#asami
2022-11-03
Mattias09:11:11

Throwing this out there - if I wanted to put Snomed into Asami, what’s a sensible approach? FWIW, I’m after a fairly shallow variant: pretty much active concepts with descriptions and is-a relations. I don’t need to evaluate logic or do anything fancy beyond searching concepts by ID or name/description.

Mattias09:11:37

First I was thinking of reading the files and putting together each concept as a map to transact into Asami. But now I don’t know: should I let each file be its own thing in Asami (though that feels more like how you would do it in relational-land…)? Any tips or pointers? 😋
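
For a concrete (if simplified) picture of that map-per-concept idea, here is a minimal Asami sketch; the attribute names (:snomed/id, :snomed/term, :snomed/is-a) and the concept values are made up for illustration:

(require '[asami.core :as d])

;; in-memory database, just for the sketch
(def db-uri "asami:mem://snomed")
(d/create-database db-uri)
(def conn (d/connect db-uri))

;; one active concept as a single entity map; the SCTID doubles as its identity
@(d/transact conn
   {:tx-data [{:db/ident      "73211009"
               :snomed/id     "73211009"
               :snomed/active true
               :snomed/term   "Diabetes mellitus (disorder)"
               :snomed/is-a   "362969004"}]})   ; parent SCTID, illustrative

;; later, look a concept up by its term
(d/q '[:find ?id
       :where [?c :snomed/term "Diabetes mellitus (disorder)"]
              [?c :snomed/id ?id]]
     (d/db conn))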

quoll10:11:17

I’m actually planning to do this myself (caught up in so much that I haven’t worked on it lately). To do it, I’ve been building a TTL parser. I was trying to build a fast one from a state machine, but that was getting complex and would take a lot longer, so I’ve switched tack and started parsing TTL with Instaparse. Basically, the issue is converting Snomed into a format that Asami can load. Right now I can do that with Python, but THAT’S no fun, and I might as well stick to Stardog. Which is why I’m working on the parser.
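
For a flavour of the Instaparse route (a toy grammar covering only bare IRIs, prefixed names and plain literals, nowhere near full Turtle, and not quoll’s actual parser):

(require '[instaparse.core :as insta])

;; a deliberately tiny subset of Turtle: subject predicate object .
(def mini-ttl
  (insta/parser
    "doc       = (triple <ws>?)*
     triple    = subject <ws> predicate <ws> object <ws>? <'.'>
     subject   = iri | pname
     predicate = iri | pname
     object    = iri | pname | literal
     iri       = <'<'> #'[^>]+' <'>'>
     pname     = #'[A-Za-z][\\w.-]*:[\\w.-]+'
     literal   = <'\"'> #'[^\"]*' <'\"'>
     ws        = #'\\s+'"))

(mini-ttl
  "<http://snomed.info/id/73211009> rdfs:subClassOf <http://snomed.info/id/362969004> .")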

quoll10:11:27

As for reading the files and building the concepts myself… no. I have no desire to do that. From what I can see, most of what is relevant is already available if you extract the OWL and convert it to TTL. The only significant thing that’s missing is the inferred types, which come from the Relationship file, and I already have something that extracts those into rdfs:subClassOf statements.
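
Not quoll’s actual extraction script, but a rough Clojure sketch of that idea, assuming the standard tab-separated RF2 Relationship snapshot layout (active is column 3, sourceId column 5, destinationId column 6, typeId column 8; 116680003 is the is-a type):

(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn is-a-triples
  "Active is-a rows from an RF2 Relationship snapshot file,
   returned as [child :rdfs/subClassOf parent] triples."
  [relationship-file]
  (with-open [rdr (io/reader relationship-file)]
    (->> (line-seq rdr)
         (drop 1)                                   ; header row
         (map #(str/split % #"\t"))
         (filter (fn [[_ _ active _ _ _ _ type-id]]
                   (and (= active "1")
                        (= type-id "116680003"))))  ; is-a
         (mapv (fn [[_ _ _ _ source dest]]
                 [source :rdfs/subClassOf dest])))))

;; (is-a-triples "sct2_Relationship_Snapshot_INT_20220930.txt")  ; example path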

Mattias10:11:29

That’s fascinating and complex and an interestingly explorative way of doing it! 😄

quoll10:11:41

I mean… I could test it by running some Python and emitting triples in edn format. I ought to be able to knock that out relatively quickly, I think

Mattias10:11:27

“If you extract the OWL”… I know some parts (?) of Snomed are distributed as OWL, but what do you mean there?

Mattias10:11:41

Or is that the TTL parsing thing?

Mattias10:11:01

Ah, TTL is short for Turtle? 😛

quoll10:11:26

Yes. The Terse RDF Triple Language

quoll10:11:55

As for the extraction, you can extract the OWL to a functional-syntax file using the snomed-owl-toolkit (https://github.com/IHTSDO/snomed-owl-toolkit). This creates a .owl functional-syntax file, which still isn’t RDF. 🙂 The next step is to convert that to TTL. There are official libraries for that, but no official tools that use them 🙄 One project that does is Robot, which has a convert command (http://robot.obolibrary.org/convert).

quoll11:11:42

Personally, I prefer to use a small tool by Andrew Bate called OWLSyntaxConverter (https://github.com/andrewdbate/OWLSyntaxConverter). It’s a very light wrapper around the same library that Robot uses, but it’s been released against a more recent version of the library, so the generated Turtle is just a little bit nicer 🙂

quoll11:11:59

The first step is something like (depending on your paths, etc):

java -Xms4g -jar libdir/snomed-owl-toolkit*executable.jar -rf2-to-owl -rf2-snapshot-archives SNOMEDIntEdition20220930.zip
The second step is:
java -jar libdir/owlconvert.jar turtle ontology-2022-03-11_07-04-35.owl > snomed.ttl

quoll11:11:16

So I guess the third step is to parse it, then emit it in edn 😄
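
A minimal sketch of what that load step could look like, assuming the parser emits a single edn vector of [entity attribute value] triples; the file name and attribute names are placeholders:

(require '[asami.core :as d]
         '[clojure.edn :as edn])

(def db-uri "asami:mem://snomed")
(d/create-database db-uri)
(def conn (d/connect db-uri))

;; load the emitted triples in one transaction
(let [triples (edn/read-string (slurp "snomed-triples.edn"))]
  @(d/transact conn {:tx-triples triples}))

;; e.g. find the direct subclasses of a concept (SCTID shown is illustrative)
(d/q '[:find ?child
       :where [?child :rdfs/subClassOf "362969004"]]
     (d/db conn))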

quoll11:11:58

Or you can wait for me to get over jetlag and return to the parser! 🤣 (I’ve also been distracted by clormat)

🙏 1
Mattias12:11:38

Somehow it’s 2022, and where we should be considering interstellar exploration, we’re still parsing text files like it’s 1982 or something. I jest (well…), but you do an amazing job. Keep it up; at some point there will be a generation for which these will be solved problems… 😁 I’ll explore different paths and keep track of your progress with interest! 😄

💖 1