rdf

Mark Wardle 2022-10-26T07:44:53.077069Z

Hi all. What would the best tooling be to load up a large number of OWL expressions (OWL functional syntax) in order to run reasoning across them? In health and care we have SNOMED CT, which is an ontology. SNOMED CT contains an expression reference set that provides a number of axioms for each concept. I understand that there are a number of different reasoners available, such as ELK and Snorocket, that one can, it appears, plug in to standard OWL libraries. As you can probably guess, I am very much a beginner when it comes to semantic data, so apologies for such a basic question.

Mark Wardle 2022-10-26T07:45:28.669919Z

A related question is... do you have any recommendations on books that are relatively current, and a good introduction to RDF and OWL?

Mark Wardle 2022-10-26T07:47:24.050509Z

[ This is to extend https://github.com/wardle/hermes so I can do better reasoning over more complex expressions recorded in healthcare software. I can do naive reasoning using single concepts, but that gets very complicated if one tries to reason over more complex expressions. ]

Mark Wardle 2022-10-26T07:52:04.269489Z

I'd like to implement something to support what is documented at https://confluence.ihtsdotools.org/display/DOCOWL/2.4.+Content+for+the+OWL+Axiom+Refset My expectation was that I could load each axiom into 'something' and then run reasoning as if by magic, but that might be naive 😉

quoll 2022-10-26T10:14:37.394489Z

Unfortunately, SNOMED is entirely defined as a TBox, which makes every reasoner I’ve run on it die a horrible death. Also unfortunate is that OWL functional syntax cannot be loaded directly by the systems I’ve tried. The latter problem is OK though, as there is a tool to extract the functional syntax into a .owl file, and then you can use a tool like Robot to convert it to .ttl

quoll 2022-10-26T10:18:01.859519Z

The reasoning can be done with rules… sort of. However, all the reasoning actually just comes down to the subsumption hierarchy (rdfs:subClassOf) and that’s actually in one of the files already! Bad news is that the files are all in tab-separated-value format, and you need to extract it into RDF manually

quoll 2022-10-26T11:05:05.973479Z

The subclass relationships are all in the “Relationship” file. For instance, in the September International release, that can be found in: Snapshot/Terminology/sct2_Relationship_Snapshot_INT_20220930.txt

quoll 2022-10-26T11:08:57.836479Z

You need to filter by the active rows, meaning that column 2 is 1 (I’m doing 0 indexing on the columns, so id is column 0)

quoll 2022-10-26T11:13:31.995389Z

You’re looking for any rows with a typeId (column 7) set to 116680003. That means “IS_A” in Snomed
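A minimal sketch of the filtering described above, in Python. It assumes the usual RF2 column order, in which sourceId is column 4 and destinationId is column 5 (neither is stated in the thread), and it assumes concept URIs of the form http://snomed.info/id/<id>. The function names and the sample rows are made up for illustration.

```python
import csv

IS_A = "116680003"  # typeId quoted in the thread for is-a


def active_is_a_pairs(lines):
    """Yield (sourceId, destinationId) for every active is-a row.

    `lines` is any iterable of tab-separated lines whose first line is a header.
    Column positions are 0-indexed: active=2, sourceId=4, destinationId=5, typeId=7
    (positions 4 and 5 are an assumption based on the usual RF2 layout).
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) > 7 and row[2] == "1" and row[7] == IS_A:
            yield row[4], row[5]


def to_ntriple(child, parent, base="http://snomed.info/id/"):
    """Render one is-a pair as an rdfs:subClassOf triple (the base URI is an assumption)."""
    return (f"<{base}{child}> "
            "<http://www.w3.org/2000/01/rdf-schema#subClassOf> "
            f"<{base}{parent}> .")


# Made-up rows, only to show the filter; not real SNOMED content.
SAMPLE = (
    "id\teffectiveTime\tactive\tmoduleId\tsourceId\tdestinationId\t"
    "relationshipGroup\ttypeId\tcharacteristicTypeId\tmodifierId\n"
    "r1\t20220930\t1\tm\t100\t200\t0\t116680003\tc\tx\n"   # active is-a: kept
    "r2\t20220930\t0\tm\t100\t300\t0\t116680003\tc\tx\n"   # inactive: dropped
    "r3\t20220930\t1\tm\t100\t400\t0\t999\tc\tx\n"         # other typeId: dropped
)

if __name__ == "__main__":
    for child, parent in active_is_a_pairs(SAMPLE.splitlines()):
        print(to_ntriple(child, parent))
```

For the real release file, passing an open file object (opened with encoding="utf-8" and newline="") in place of SAMPLE.splitlines() should work the same way, since the reader only needs an iterable of lines.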

Mark Wardle 2022-10-26T11:14:20.918869Z

Thanks @quoll - that's the approach I've used so far - but the OWL data is now also used to supply concrete values for some attributes - e.g. the dosage in numeric units of drugs in a specific product - that are only available in OWL and not in the relationships file. My tooling already handles transitive relationships, which of course includes is-a but also other relationship types such as site or pathology. I was hoping to use OWL tooling to combine these with other sources of information for inference - and to be able to do potentially more complex inference using concepts with refinements (SNOMED expressions), but perhaps it isn't worth it.

quoll 2022-10-26T11:16:38.852979Z

Well, the is-a relationships actually contain all of the inferred data. All the stated data is already in the OWL file. The other reasoning (e.g. intersection between disorder, site, etc), is all precalculated and that’s what appears in the Relationship file

quoll 2022-10-26T11:18:05.668639Z

as for more complex inferences… maybe there’s stuff in there? But they figured out a lot already. And most of the inference is done in the is-a space

quoll 2022-10-26T11:19:05.660189Z

they’ve misused things like “is-a” pretty badly. So finger is-a hand, and hand is-a arm

quoll 2022-10-26T11:19:55.758269Z

(We’ve had long discussions with the doctors at work about this… some of them contribute to Snomed)

Mark Wardle 2022-10-26T11:26:05.617519Z

Indeed - our UK drugs have the same problem.

quoll 2022-10-26T11:29:19.368599Z

Part of the issue, IMO, is that Snomed is pure TBox, with 2 outcomes:
• It’s too large to effectively reason over (you can, but it needs limited reasoning)
• Most pure TBox reasoning is determining subsumption, rather than anything else: role subsumption and class subsumption.

quoll 2022-10-26T11:29:49.677079Z

I don’t think there’s much in the way of role-subsumption going on, but an enormous amount of class subsumption

quoll 2022-10-26T11:30:03.071029Z

(ie. all those is-a relationships)

quoll 2022-10-26T11:31:04.117959Z

I’ve been looking at running rules to calculate them myself, but the intersections are tricky to do efficiently.

Mark Wardle 2022-10-26T11:31:36.051259Z

When processing SNOMED in context of a wider information model, or as part of an expression, one needs to normalise and then potentially run a DL classifier to test subsumption, and I was hoping I could use existing OWL tooling to do that, and I'm not aware the new concrete types are available outside of the OWL reference set. But it sounds as if it isn't going to be easy to do this - which is helpful to understand in itself! I might stick with my current approach then! It makes sense why I've not seen many implementations that do any differently to what I've done in Hermes [ and all my other prior SNOMED implementations].

Mark Wardle 2022-10-26T11:31:44.648229Z

Thanks for your advice. Really helpful.

quoll 2022-10-26T11:32:44.602949Z

Well, I’m more of an OWL expert than a SNOMED expert. I’m still learning the latter. But it’s frustrating for me, because of the choices they’ve made

Mark Wardle 2022-10-26T11:33:15.236689Z

And I thought I was going mad not being able to see how to easily import OWL functional syntax into tools... so hearing that you had the same issue is... re-assuring!

Mark Wardle 2022-10-26T11:34:23.547699Z

Because I had tried using OWLAPI and got very confused.... 🙂

quoll 2022-10-26T11:36:09.452589Z

The snomed-owl-toolkit libraries are easy to work with, but I couldn’t find command line tools. I finally used https://github.com/andrewdbate/OWLSyntaxConverter that just trivially calls the library to load the owl, and then save the ttl, but that seemed wrong. Then one of my colleagues pointed me at Robot, and that seems more standard, but it doesn’t use the latest version of snomed-owl-toolkit

quoll 2022-10-26T11:37:24.101409Z

If you don’t know it, http://robot.obolibrary.org/ is from the OBO project (which is why I missed that it does RDF and OWL)

Mark Wardle 2022-10-26T11:37:51.488109Z

Thanks. I will have a look. The other issue I have found is that it is sometimes not clear whether specific tools or data are for authoring or for operational use.

Mark Wardle 2022-10-26T11:38:38.246629Z

I would definitely have skipped over that on first glance!

quoll 2022-10-26T11:39:33.273259Z

Yes! But my colleague worked on it. I mentioned to her that there isn’t anything suggesting that I can use it to convert OWL to TTL, and she said she would pass it along

quoll 2022-10-26T11:40:18.839869Z

Look at the http://robot.obolibrary.org/convert command

quoll 2022-10-26T11:40:34.924459Z

I’d have saved myself so much time if I’d known about that

quoll 2022-10-26T11:41:49.897999Z

Oh, and in case you can get a reasoning engine to work over SNOMED, you’ll need to create prototypical objects that are instances of each class, since reasoners will usually remove the TBox from query results, and of course SNOMED is entirely TBox

Mark Wardle 2022-10-26T11:43:00.772799Z

Ahhh robot looks great. I could use as a java library to fly through all of the set-up and axioms and output to something else....

quoll 2022-10-26T11:44:05.441429Z

So step 1 for us was creating a Turtle file full of: our-domain:_10000006 a snomed:10000006 and we can now make statements about the object our-domain:_10000006
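A rough sketch of how such a file of prototypical instances could be generated, in Python. The our-domain: namespace URI below is a placeholder (the thread never gives it), the snomed: prefix is assumed to expand to http://snomed.info/id/, and the function name is hypothetical.

```python
def prototype_instances(concept_ids, instance_ns="http://example.org/our-domain/"):
    """Return Turtle declaring one prototypical instance per SNOMED class.

    Each instance is named by prefixing the concept id with an underscore,
    following the pattern our-domain:_10000006 a snomed:10000006 from the thread.
    `instance_ns` is a placeholder namespace, not the one used in the thread.
    """
    lines = [
        f"@prefix our-domain: <{instance_ns}> .",
        "@prefix snomed: <http://snomed.info/id/> .",  # assumed SNOMED URI base
        "",
    ]
    lines += [f"our-domain:_{cid} a snomed:{cid} ." for cid in concept_ids]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 10000006 is the id used as the example in the thread.
    print(prototype_instances(["10000006"]), end="")
```

The concept ids would come from whichever source is already loaded, for example the sourceId values of the active rows in the Relationship file.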

Mark Wardle 2022-10-26T11:44:07.582359Z

I've seen reports of ELK and Snorocket being used.... but now I don't know whether it will help as much as I thought it would. Super helpful though. Thank you.

Mark Wardle 2022-10-26T11:45:25.978489Z

Yes that's exactly what I was hoping to do - my tooling works for SNOMED but it would be nice to reason independently - that's what I was trying to state earlier.

quoll 2022-10-26T11:46:10.537379Z

Same. But it’s just not been working out the way I’d hoped

Mark Wardle 2022-10-26T11:46:11.645869Z

So a patient with a family history of Huntington's disease... I can reason in the context of the wider information model, but make use of the structures in SNOMED.

Mark Wardle 2022-10-26T11:46:24.125079Z

Ah ok... that's disappointing to hear.

quoll 2022-10-26T11:47:02.113599Z

We’re mostly using pre-reasoned SNOMED to make connections for us. We just need to link to the instances that I mentioned

quoll 2022-10-26T11:47:14.579319Z

So then it becomes SPARQL queries to traverse the graph

quoll 2022-10-26T11:48:48.122929Z

and I’ve pre-generated all the transitive statements with things like:

insert { graph our-domain:transitive { ?s rdfs:subClassOf ?sup } }
using sct:900000000000207008
using our-domain:snomed-inferred
where {
  ?s rdfs:subClassOf+ ?sup
  minus { ?s rdfs:subClassOf ?sup }
}

quoll 2022-10-26T11:49:42.254769Z

that way I can use an extra graph in the FROM clauses and I get transitive closures, with queries returning in <1sec instead of 40 sec 🙂
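The same idea can be sketched outside a triple store. The Python below is only an illustration of what the update computes (every ancestor pair reachable over one or more subclass steps, minus the pairs already stated directly); it is not how the SPARQL engine evaluates it, and the function name and toy labels are made up.

```python
from collections import defaultdict


def inferred_ancestor_pairs(direct_edges):
    """Return (descendant, ancestor) pairs implied by transitivity only.

    `direct_edges` is an iterable of (child, parent) pairs. The result is the
    transitive closure of those pairs with the direct pairs removed, which
    mirrors the `subClassOf+ ... minus { subClassOf }` pattern above.
    """
    direct = set(direct_edges)
    parents = defaultdict(set)
    for child, parent in direct:
        parents[child].add(parent)

    closure = set()
    for start in list(parents):
        seen = set()
        stack = list(parents[start])
        while stack:  # depth-first walk up the hierarchy
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, ()))
        closure.update((start, ancestor) for ancestor in seen)
    return closure - direct


if __name__ == "__main__":
    # Toy hierarchy with placeholder labels: a -> b -> c -> d
    edges = [("a", "b"), ("b", "c"), ("c", "d")]
    for pair in sorted(inferred_ancestor_pairs(edges)):
        print(pair)
```

Storing the result separately from the direct edges matches the extra-graph approach: queries that want the full closure read both sets, while queries that only want direct parents read one.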

Mark Wardle 2022-10-26T11:50:54.890239Z

Ahh that makes sense.

Mark Wardle 2022-10-26T14:48:32.816359Z

I benefit from the speed of lmdb and lucene in https://github.com/wardle/hermes with hand-optimised serialisation so I didn't end up needing to cache transitive closure tables for any type of relationship, even 'is-a'.

2022-10-26T12:41:06.352189Z

I don’t know if this is of use to you @mark354 but RDFox claims support for OWL 2 functional syntax. I’ve never used it myself, and I don’t do much with OWL - beyond a passing interest in it and reasoning/rule-engines and logic systems. However IIRC they have made big claims about their reasoning performance in the past.

Mark Wardle 2022-10-26T14:07:59.369009Z

Thanks @rickmoynihan I will have a look.

Mathias Picker 2023-01-19T18:23:51.684879Z

I can say at least for querying it's an order of magnitude faster than stardog or graphdb. I just ran the explore task of the berlin sparql benchmark with up to 4 billion triples. Takes gobs of memory, though: 4 billion needed 512 GB of RAM 🙂

2023-01-20T09:38:43.437559Z

Did you test with stardog running its indexes off a ramdisk?

Mathias Picker 2023-01-20T10:13:11.979419Z

Nope, it ran just off fast NVMes. Thanks for the suggestion, I have not yet much experience with stardog.

2023-01-20T10:27:01.166199Z

They dropped explicit support for ramdisks in stardog 7 because their disk based storage was where all their optimisation efforts were focussed; and instead started recommending people do this: https://docs.stardog.com/additional-resources/migration-guide#memory-databases We don’t do this ourselves though so I don’t know what difference it makes in practice.

Mathias Picker 2023-01-20T10:35:37.564759Z

That explains why they didn't recommend indices on ramdisks when I asked for optimization options… I'm now testing their icv capabilities & might need to do more optimizations, my customer is running >600 validation queries on each insert. Thanks again.