This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-10-26
Hi all. What would the best tooling be to load up a large number of OWL expressions (OWL functional syntax) in order to run reasoning across them? In health and care we have SNOMED CT which is an ontology. SNOMED CT contains an expression reference set that provides a number of axioms for each concept. I understand that there are a number of different reasoners available, such as ELK and Snorocket, that one can, it appears, plug in to standard OWL libraries. As you can probably guess, I am very much a beginner when it comes to semantic data, so apologies for such a basic question.
A related question is... do you have any recommendations on books that are relatively current, and a good introduction to RDF and OWL?
[ This is to extend https://github.com/wardle/hermes so I can do better reasoning over more complex expressions recorded in healthcare software. I can do naive reasoning using single concepts, but that gets very complicated if one tries to reason over more complex expressions. ]
I'd like to implement something to support what is documented at https://confluence.ihtsdotools.org/display/DOCOWL/2.4.+Content+for+the+OWL+Axiom+Refset My expectation was that I could load each axiom into 'something' and then run reasoning as if by magic, but that might be naive.
Unfortunately, SNOMED is entirely defined as a TBox, which makes every reasoner I've run on it die a horrible death. Also unfortunate is that OWL functional syntax cannot be loaded directly by the systems I've tried. The latter problem is OK though, as there is a tool to extract the functional syntax into a .owl file, and then you can use a tool like Robot to convert it to .ttl
The reasoning can be done with rules… sort of. However, all the reasoning actually just comes down to the subsumption hierarchy (rdfs:subClassOf) and that's actually in one of the files already! Bad news is that the files are all in tab-separated-value format, and you need to extract it into RDF manually
The subclass relationships are all in the "Relationship" file. For instance, in the September International release, that can be found in:
Snapshot/Terminology/sct2_Relationship_Snapshot_INT_20220930.txt
You need to filter by the active rows, meaning that column 2 is 1 (I'm doing 0 indexing on the columns, so id is column 0)
You're looking for any rows with a typeId (column 7) set to 116680003. That means "IS_A" in Snomed
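A minimal Python sketch of the filtering described above (the thread itself shows no code for this step). It assumes the usual RF2 relationship layout, in which sourceId is column 4 and destinationId is column 5; that detail is not stated in the thread. The snomed: prefix is a placeholder and the identifiers in the sample are made up.

```python
import csv
import io

IS_A = "116680003"  # typeId meaning IS_A, as quoted in the thread


def is_a_triples(tsv_file, prefix="snomed:"):
    """Yield Turtle-style subclass statements for active IS_A rows.

    Assumed 0-indexed columns: 2 = active, 4 = sourceId,
    5 = destinationId, 7 = typeId.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 8:
            continue  # ignore short or blank lines
        if row[2] == "1" and row[7] == IS_A:
            yield f"{prefix}{row[4]} rdfs:subClassOf {prefix}{row[5]} ."


# Made-up sample: one active IS_A row, one inactive row, one other type.
SAMPLE = (
    "id\teffectiveTime\tactive\tmoduleId\tsourceId\tdestinationId\t"
    "relationshipGroup\ttypeId\tcharacteristicTypeId\tmodifierId\n"
    "1\t20220930\t1\tm\t111\t222\t0\t116680003\tc\tx\n"
    "2\t20220930\t0\tm\t333\t444\t0\t116680003\tc\tx\n"
    "3\t20220930\t1\tm\t555\t666\t0\t999\tc\tx\n"
)

if __name__ == "__main__":
    for line in is_a_triples(io.StringIO(SAMPLE)):
        print(line)
```

For a real release, the same function could be fed the opened Relationship snapshot file instead of the in-memory sample.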
Thanks @U051N6TTC - that's the approach I've used so far - but the OWL data is now also used to supply concrete values for some attributes - e.g. the dosage in numeric units of drugs in a specific product - that are only available in OWL and not in the relationships file. My tooling already handles transitive relationships, which of course includes is-a but also other relationship types such as site or pathology. I was hoping to use OWL tooling to combine these with other sources of information for inference - and to be able to do potentially more complex inference using concepts with refinements (SNOMED expressions), but perhaps it isn't worth it.
Well, the is-a relationships actually contain all of the inferred data. All the stated data is already in the OWL file. The other reasoning (e.g. intersection between disorder, site, etc), is all precalculated and that's what appears in the Relationship file
as for more complex inferences… maybe there's stuff in there? But they figured out a lot already. And most of the inference is done in the is-a space
they've misused things like "is-a" pretty badly. So finger is-a hand, and hand is-a arm
(We've had long discussions with the doctors at work about this… some of them contribute to Snomed)
Indeed - our UK drugs have the same problem.
Part of the issue, IMO, is that Snomed is pure TBox, with 2 outcomes:
- It's too large to effectively reason over (you can, but it needs limited reasoning)
- Most pure TBox reasoning is determining subsumption, rather than anything else: role subsumption and class subsumption.
I don't think there's much in the way of role-subsumption going on, but an enormous amount of class subsumption
I've been looking at running rules to calculate them myself, but the intersections are tricky to do efficiently.
When processing SNOMED in context of a wider information model, or as part of an expression, one needs to normalise and then potentially run a DL classifier to test subsumption, and I was hoping I could use existing OWL tooling to do that, and I'm not aware the new concrete types are available outside of the OWL reference set. But it sounds as if it isn't going to be easy to do this - which is helpful to understand in itself! I might stick with my current approach then! It makes sense why I've not seen many implementations that do any differently to what I've done in Hermes [ and all my other prior SNOMED implementations].
Thanks for your advice. Really helpful.
Well, I'm more of an OWL expert than a SNOMED expert. I'm still learning the latter. But it's frustrating for me, because of the choices they've made
And I thought I was going mad not being able to see how to easily import OWL functional syntax into tools... so hearing that you had the same issue is... re-assuring!
Because I had tried using OWLAPI and got very confused....
The snomed-owl-toolkit library is easy to work with, but I couldn't find command line tools. I finally used https://github.com/andrewdbate/OWLSyntaxConverter that just trivially calls the library to load the owl, and then save the ttl, but that seemed wrong. Then one of my colleagues pointed me at Robot, and that seems more standard, but it doesn't use the latest version of snomed-owl-toolkit
If you don't know it, http://robot.obolibrary.org/ is from the OBO project (which is why I missed that it does RDF and OWL)
Thanks. I will have a look. The other issue I have found is that it is sometimes not clear whether specific tools or data are for authoring or for operational use.
I would definitely have skipped over that on first glance!
Yes! But my colleague worked on it. I mentioned to her that there isn't anything suggesting that I can use it to convert OWL to TTL, and she said she would pass it along
Look at the http://robot.obolibrary.org/convert command
Oh, and in case you can get a reasoning engine to work over SNOMED, you'll need to create prototypical objects that are instances of each class, since reasoners will usually remove the TBox from query results, and of course SNOMED is entirely TBox
Ahhh robot looks great. I could use it as a Java library to fly through all of the set-up and axioms and output to something else....
So step 1 for us was creating a Turtle file full of:
our-domain:_10000006 a snomed:10000006
and we can now make statements about the object our-domain:_10000006
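A rough Python sketch of how such a file of prototypical instances might be generated, one statement per concept. The our-domain: and snomed: prefixes follow the example above; the function names are hypothetical, and the terminating " ." is added because Turtle statements need it.

```python
def prototype_line(concept_id, instance_prefix="our-domain:", class_prefix="snomed:"):
    """Return one Turtle statement declaring a prototypical instance of a class."""
    return f"{instance_prefix}_{concept_id} a {class_prefix}{concept_id} ."


def prototype_lines(concept_ids):
    """Return statements for a collection of ids, de-duplicated and sorted."""
    return [prototype_line(cid) for cid in sorted(set(concept_ids))]


if __name__ == "__main__":
    # 10000006 is the id used in the example above.
    for line in prototype_lines(["10000006"]):
        print(line)
```

Prefix declarations (@prefix lines) would still need to be written at the top of the Turtle file; they are left out here because the thread does not give the namespace IRIs.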
I've seen reports of ELK and Snorocket being used.... but now I don't know whether it will help as much as I thought it would. Super helpful though. Thank you.
Yes that's exactly what I was hoping to do - my tooling works for SNOMED but it would be nice to reason independently - that's what I was trying to state earlier.
So a patient with a family history of Huntington's disease... I can reason in the context of the wider information model, but make use of the structures in SNOMED.
Ah ok... that's disappointing to hear.
We're mostly using pre-reasoned SNOMED to make connections for us. We just need to link to the instances that I mentioned
and I've pre-generated all the transitive statements with things like:
insert { graph our-domain:transitive { ?s rdfs:subClassOf ?sup } }
using sct:900000000000207008
using our-domain:snomed-inferred
where { ?s rdfs:subClassOf+ ?sup minus { ?s rdfs:subClassOf ?sup } }
that way I can use an extra graph in the FROM clauses and I get transitive closures, with queries returning in <1 sec instead of 40 sec
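To illustrate what that update computes, here is a small Python sketch of the same idea outside a triple store: every (class, ancestor) pair reachable in one or more subclass steps, minus the pairs that are already stated directly. It is not how the store evaluates the query, only the set arithmetic behind it, and the class names in the example are made up.

```python
def transitive_only(direct):
    """direct maps each class to the set of its directly stated superclasses.

    Returns the (class, ancestor) pairs reachable via one or more steps
    that are NOT already direct statements, mirroring
    `?s rdfs:subClassOf+ ?sup minus { ?s rdfs:subClassOf ?sup }`.
    """
    result = set()
    for child, parents in direct.items():
        seen = set()           # everything reachable in 1+ steps
        stack = list(parents)
        while stack:
            node = stack.pop()
            if node in seen:
                continue       # also guards against cycles
            seen.add(node)
            stack.extend(direct.get(node, ()))
        result.update((child, anc) for anc in seen - set(parents))
    return result


if __name__ == "__main__":
    edges = {"a": {"b"}, "b": {"c"}, "c": {"d"}}
    print(sorted(transitive_only(edges)))
```

Storing only the non-direct pairs in a separate graph, as described above, means the direct graph plus the extra graph together give the full closure without duplicating statements.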
Ahh that makes sense.
I benefit from the speed of lmdb and lucene in https://github.com/wardle/hermes with hand-optimised serialisation so I didn't end up needing to cache transitive closure tables for any type of relationship, even 'is-a'.
I don't know if this is of use to you @mark354 but RDFox claims support for OWL 2 functional syntax. I've never used it myself, and I don't do much with OWL - beyond a passing interest in it and reasoning/rule-engines and logic systems. However IIRC they have made big claims about their reasoning performance in the past.
https://www.oxfordsemantic.tech/ and docs here: https://docs.oxfordsemantic.tech/introduction.html
Thanks @rickmoynihan I will have a look.
I can say at least for querying it's an order of magnitude faster than stardog or graphdb. I just ran the explore task of the Berlin SPARQL Benchmark with up to 4 billion triples. Takes gobs of memory, though: 4 billion needed 512 GB of RAM
Did you test with stardog running its indexes off a ramdisk?
Nope, it ran just off fast NVMes. Thanks for the suggestion, I have not yet much experience with stardog.
They dropped explicit support for ramdisks in stardog 7 because their disk based storage was where all their optimisation efforts were focussed; and instead started recommending people do this: https://docs.stardog.com/additional-resources/migration-guide#memory-databases We don't do this ourselves though so I don't know what difference it makes in practice.
That explains why they didn't recommend indices on ramdisks when I asked for optimization options… I'm now testing their icv capabilities & might need to do more optimizations, my customer is running >600 validation queries on each insert. Thanks again.