#rdf
2022-10-26
Mark Wardle07:10:53

Hi all. What would the best tooling be to load up a large number of OWL expressions (OWL functional syntax) in order to run reasoning across them? In health and care we have SNOMED CT, which is an ontology. SNOMED CT contains an expression reference set that provides a number of axioms for each concept. I understand that there are a number of different reasoners available, such as ELK and Snorocket, that one can, it appears, plug in to standard OWL libraries. As you can probably guess, I am very much a beginner when it comes to semantic data, so apologies for such a basic question.
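
As a sketch of that plumbing: ELK plugs in to the OWL API as an OWLReasonerFactory, so classification looks roughly like the following (the file name snomed.owl is a placeholder for whatever ontology file you build):

import java.io.File;

import org.semanticweb.elk.owlapi.ElkReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.reasoner.InferenceType;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

public class Classify {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        // "snomed.owl" is a placeholder path
        OWLOntology ontology =
            manager.loadOntologyFromOntologyDocument(new File("snomed.owl"));
        OWLReasoner reasoner = new ElkReasonerFactory().createReasoner(ontology);
        // Compute the class hierarchy; subsumptions can then be read back
        // via reasoner.getSubClasses(...) / reasoner.getSuperClasses(...)
        reasoner.precomputeInferences(InferenceType.CLASS_HIERARCHY);
        reasoner.dispose();
    }
}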

Mark Wardle07:10:28

A related question is... do you have any recommendations for books that are relatively current and a good introduction to RDF and OWL?

Mark Wardle07:10:24

[ This is to extend https://github.com/wardle/hermes so I can do better reasoning over more complex expressions recorded in healthcare software. I can do naive reasoning using single concepts, but that gets very complicated if one tries to reason over more complex expressions. ]

Mark Wardle07:10:04

I'd like to implement something to support what is documented at https://confluence.ihtsdotools.org/display/DOCOWL/2.4.+Content+for+the+OWL+Axiom+Refset My expectation was that I could load each axiom into 'something' and then run reasoning as if by magic, but that might be naive 😉

quoll10:10:37

Unfortunately, SNOMED is entirely defined as a TBox, which makes every reasoner I've run on it die a horrible death. Also unfortunate is that OWL functional syntax cannot be loaded directly by the systems I've tried. The latter problem is OK though, as there is a tool to extract the functional syntax into a .owl file, and then you can use a tool like Robot to convert it to .ttl
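
A minimal sketch of that conversion step using the OWL API directly (this is roughly what OWLSyntaxConverter and Robot wrap; the OWL API auto-detects the input syntax, including functional syntax, and the file names here are placeholders):

import java.io.File;
import java.io.FileOutputStream;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.formats.TurtleDocumentFormat;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;

public class ToTurtle {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        // load the functional-syntax file; the parser is chosen automatically
        OWLOntology ontology =
            manager.loadOntologyFromOntologyDocument(new File("snomed.owl"));
        // write it back out as Turtle
        try (FileOutputStream out = new FileOutputStream("snomed.ttl")) {
            manager.saveOntology(ontology, new TurtleDocumentFormat(), out);
        }
    }
}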

quoll10:10:01

The reasoning can be done with rules… sort of. However, all the reasoning actually just comes down to the subsumption hierarchy (rdfs:subClassOf), and that's actually in one of the files already! The bad news is that the files are all in tab-separated-value format, and you need to extract it into RDF manually

quoll11:10:05

The subclass relationships are all in the ā€œRelationshipā€ file. For instance, in the September International release, that can be found in: Snapshot/Terminology/sct2_Relationship_Snapshot_INT_20220930.txt

quoll11:10:57

You need to filter to the active rows, meaning that column 2 (active) is 1 (I'm doing 0-indexing on the columns, so id is column 0)

quoll11:10:31

You're looking for any rows with a typeId (column 7) set to 116680003. That means "IS_A" in SNOMED
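
Putting those steps together, a rough sketch of the extraction in Java, assuming the standard RF2 layout (active is column 2 and typeId column 7 as described above; sourceId and destinationId are columns 4 and 5) and the standard SNOMED CT URI base http://snomed.info/id/:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;

public class IsAExtractor {
    public static void main(String[] args) throws IOException {
        String in = "Snapshot/Terminology/sct2_Relationship_Snapshot_INT_20220930.txt";
        try (BufferedReader r = Files.newBufferedReader(Paths.get(in));
             PrintWriter w = new PrintWriter("isa.ttl")) {
            w.println("@prefix sct: <http://snomed.info/id/> .");
            w.println("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .");
            r.readLine(); // skip the header row
            String line;
            while ((line = r.readLine()) != null) {
                String[] c = line.split("\t");
                // keep active rows (column 2) whose typeId (column 7) is IS_A
                if ("1".equals(c[2]) && "116680003".equals(c[7])) {
                    w.printf("sct:%s rdfs:subClassOf sct:%s .%n", c[4], c[5]);
                }
            }
        }
    }
}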

Mark Wardle11:10:20

Thanks @U051N6TTC - that's the approach I've used so far - but the OWL data is now also used to supply concrete values for some attributes - e.g. the dosage in numeric units of drugs in a specific product - that are only available in OWL and not in the relationships file. My tooling already handles transitive relationships, which of course includes is-a but also other relationship types such as site or pathology. I was hoping to use OWL tooling to combine these with other sources of information for inference - and to be able to do potentially more complex inference using concepts with refinements (SNOMED expressions), but perhaps it isn't worth it.

quoll11:10:38

Well, the is-a relationships actually contain all of the inferred data. All the stated data is already in the OWL file. The other reasoning (e.g. intersection between disorder, site, etc.) is all precalculated, and that's what appears in the Relationship file

quoll11:10:05

as for more complex inferences… maybe there's stuff in there? But they figured out a lot already. And most of the inference is done in the is-a space

quoll11:10:05

they've misused things like "is-a" pretty badly. So finger is-a hand, and hand is-a arm

quoll11:10:55

(We've had long discussions with the doctors at work about this… some of them contribute to SNOMED)

Mark Wardle11:10:05

Indeed - our UK drugs have the same problem.

quoll11:10:19

Part of the issue, IMO, is that SNOMED is pure TBox, with 2 outcomes:
• It's too large to effectively reason over (you can, but it needs limited reasoning)
• Most pure TBox reasoning is determining subsumption, rather than anything else: role subsumption and class subsumption.

quoll11:10:49

I don't think there's much in the way of role subsumption going on, but an enormous amount of class subsumption

quoll11:10:03

(i.e. all those is-a relationships)

quoll11:10:04

I've been looking at running rules to calculate them myself, but the intersections are tricky to do efficiently.

Mark Wardle11:10:36

When processing SNOMED in the context of a wider information model, or as part of an expression, one needs to normalise and then potentially run a DL classifier to test subsumption. I was hoping I could use existing OWL tooling to do that, and as far as I'm aware the new concrete types are not available outside of the OWL reference set. But it sounds as if it isn't going to be easy, which is helpful to understand in itself! I might stick with my current approach then! It makes sense now why I've not seen many implementations that do anything differently from what I've done in Hermes [and all my other prior SNOMED implementations].

Mark Wardle11:10:44

Thanks for your advice. Really helpful.

quoll11:10:44

Well, I'm more of an OWL expert than a SNOMED expert. I'm still learning the latter. But it's frustrating for me, because of the choices they've made

Mark Wardle11:10:15

And I thought I was going mad not being able to see how to easily import OWL functional syntax into tools... so hearing that you had the same issue is... reassuring!

Mark Wardle11:10:23

Because I had tried using OWLAPI and got very confused... 🙂

quoll11:10:09

The snomed-owl-toolkit libraries are easy to work with, but I couldn't find command-line tools. I finally used https://github.com/andrewdbate/OWLSyntaxConverter, which just trivially calls the library to load the OWL and then save the TTL, but that seemed wrong. Then one of my colleagues pointed me at Robot, which seems more standard, but it doesn't use the latest version of snomed-owl-toolkit

quoll11:10:24

If you don't know it, http://robot.obolibrary.org/ is from the OBO project (which is why I missed that it does RDF and OWL)

šŸ‘ 1
Mark Wardle11:10:51

Thanks. I will have a look. The other issue I have found is that it is sometimes not clear whether specific tools or data are for authoring or for operational use.

Mark Wardle11:10:38

I would definitely have skipped over that on first glance!

quoll11:10:33

Yes! But my colleague worked on it. I mentioned to her that there isn't anything suggesting that I can use it to convert OWL to TTL, and she said she would pass it along

quoll11:10:34

I'd have saved myself so much time if I'd known about that

quoll11:10:49

Oh, and in case you can get a reasoning engine to work over SNOMED, you'll need to create prototypical objects that are instances of each class, since reasoners will usually remove the TBox from query results, and of course SNOMED is entirely TBox

Mark Wardle11:10:00

Ahhh, Robot looks great. I could use it as a Java library to fly through all of the set-up and axioms and output to something else...

quoll11:10:05

So step 1 for us was creating a Turtle file full of:
our-domain:_10000006 a snomed:10000006
and we can now make statements about the object our-domain:_10000006
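
A rough sketch of generating such a file from the RF2 Concept snapshot (the our-domain prefix IRI and the file path are placeholders; in RF2 the concept id is column 0 and active is column 2):

import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class Prototypes {
    public static void main(String[] args) throws Exception {
        String in = "Snapshot/Terminology/sct2_Concept_Snapshot_INT_20220930.txt";
        try (Stream<String> lines = Files.lines(Paths.get(in));
             PrintWriter w = new PrintWriter("prototypes.ttl")) {
            w.println("@prefix snomed: <http://snomed.info/id/> .");
            // placeholder IRI; substitute your own namespace
            w.println("@prefix our-domain: <http://example.com/our-domain/> .");
            lines.skip(1)                          // skip the header row
                 .map(l -> l.split("\t"))
                 .filter(c -> "1".equals(c[2]))    // active concepts only
                 .forEach(c -> w.printf("our-domain:_%s a snomed:%s .%n", c[0], c[0]));
        }
    }
}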

Mark Wardle11:10:07

I've seen reports of ELK and Snorocket being used.... but now I don't know whether it will help as much as I thought it would. Super helpful though. Thank you.

Mark Wardle11:10:25

Yes that's exactly what I was hoping to do - my tooling works for SNOMED but it would be nice to reason independently - that's what I was trying to state earlier.

quoll11:10:10

Same. But it's just not been working out the way I'd hoped

Mark Wardle11:10:11

So a patient with a family history of Huntington's disease... I can reason in the context of the wider information model, but make use of the structures in SNOMED.

Mark Wardle11:10:24

Ah ok... that's disappointing to hear.

quoll11:10:02

We're mostly using pre-reasoned SNOMED to make connections for us. We just need to link to the instances that I mentioned

quoll11:10:14

So then it becomes SPARQL queries to traverse the graph

quoll11:10:48

and I've pre-generated all the transitive statements with things like:

INSERT { GRAPH our-domain:transitive { ?s rdfs:subClassOf ?sup } }
USING sct:900000000000207008
USING our-domain:snomed-inferred
WHERE { ?s rdfs:subClassOf+ ?sup MINUS { ?s rdfs:subClassOf ?sup } }

quoll11:10:42

that way I can use an extra graph in the FROM clauses and I get transitive closures, with queries returning in <1 sec instead of 40 sec 🙂
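
A sketch of that pattern using Apache Jena's API (an assumption; no particular store is named here). The od: prefix IRI stands in for our-domain: and is a placeholder, the USING clauses are dropped because the source graphs are store-specific, and the query reads the materialised graph with a GRAPH clause rather than FROM for portability; 404684003 (clinical finding) is just an example concept:

import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.update.UpdateAction;

public class TransitiveClosure {
    public static void main(String[] args) {
        Dataset ds = DatasetFactory.create(); // placeholder: load your data here

        String prefixes =
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
            "PREFIX od: <http://example.com/our-domain/> ";

        // materialise the non-trivial transitive subclass statements
        // into a separate named graph
        UpdateAction.parseExecute(prefixes
            + "INSERT { GRAPH od:transitive { ?s rdfs:subClassOf ?sup } } "
            + "WHERE { ?s rdfs:subClassOf+ ?sup "
            + "        MINUS { ?s rdfs:subClassOf ?sup } }", ds);

        // read the precomputed closure back without any property paths
        try (QueryExecution qe = QueryExecutionFactory.create(prefixes
                + "SELECT ?sup WHERE { GRAPH od:transitive "
                + "{ <http://snomed.info/id/404684003> rdfs:subClassOf ?sup } }", ds)) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                System.out.println(rs.next().get("sup"));
            }
        }
    }
}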

Mark Wardle11:10:54

Ahh that makes sense.

Mark Wardle14:10:32

I benefit from the speed of LMDB and Lucene in https://github.com/wardle/hermes with hand-optimised serialisation, so I didn't end up needing to cache transitive closure tables for any type of relationship, even 'is-a'.

rickmoynihan12:10:06

I don't know if this is of use to you @mark354, but RDFox claims support for OWL 2 functional syntax. I've never used it myself, and I don't do much with OWL - beyond a passing interest in it and reasoning/rule-engines and logic systems. However, IIRC they have made big claims about their reasoning performance in the past.

Mark Wardle14:10:59

Thanks @rickmoynihan I will have a look.

Mathias Picker18:01:51

I can say that, at least for querying, it's an order of magnitude faster than Stardog or GraphDB. I just ran the explore task of the Berlin SPARQL Benchmark with up to 4 billion triples. It takes gobs of memory, though: 4 billion triples needed 512 GB of RAM 🙂

rickmoynihan09:01:43

Did you test with Stardog running its indexes off a ramdisk?

Mathias Picker10:01:11

Nope, it just ran off fast NVMes. Thanks for the suggestion; I don't yet have much experience with Stardog.

rickmoynihan10:01:01

They dropped explicit support for ramdisks in Stardog 7 because their disk-based storage was where all their optimisation efforts were focused; instead they started recommending people do this: https://docs.stardog.com/additional-resources/migration-guide#memory-databases We don't do this ourselves though, so I don't know what difference it makes in practice.

Mathias Picker10:01:37

That explains why they didn't recommend indices on ramdisks when I asked for optimization options… I'm now testing their ICV capabilities and might need to do more optimizations; my customer is running >600 validation queries on each insert. Thanks again.