Returning to the topic of Luke’s talk (RDF ∩ LLMs): intuitively, the arguments for why augmenting LLMs with RDF in particular is promising make sense to me. A colleague’s been implementing RAG and said they’d tried https://github.com/microsoft/graphrag (I think with neo4j, though) and found it better than using just a vector store. I need to look at the details of exactly what they’re doing, but I can understand why, theoretically, that may be the case. I also mentioned @luke’s talk and why RDF may be a better foundation than neo4j. In practice, though, aside from being able to use the reasoner to calculate logical implications (which will still require special attention), I’m curious whether anyone has found any quantitative research showing it is better at particular tasks. I think we need evaluations to truly prove the benefits. Are there any benchmarks people are using for this? I know about RAGAS and CRAG; are there others? I completely agree with the theoretical arguments about interpretability, and that RDF being at the intersection of language, data, logic (and computation) is a compelling argument, but has this been convincingly proven or demonstrated anywhere yet?
BTW @rickmoynihan… exams are over. I don't know if you'll have any free time in December, but if not, we can catch up in January?
🎉 Congratulations, that must be a relief! Will DM you to see if we can arrange something
Well, I'm anxious because I don't know how I did for one of them
Yeah, I’ve had similar experiences on the generation and reading side, though for me most of that success on generation is in repeating boilerplate conversions. The thing is, I don’t know if it’s objectively better at it than with some tables and a simple schema specified in something else, e.g. SQL or json-schema. I see two (to my knowledge untested) positions in this argument: 1) RDF will be worse than alternative X because there is more data in alternative X’s format to train on. 2) RDF will be better than alternative X because it has deep foundations in logic and language, LLMs have read all that too, and those patterns are innate in human language, so RDF should be able to generalise/scaffold/bridge these worlds better. There’s a big debate at the moment about whether LLMs and DL models can really generalise, or whether they just interpolate between high-dimensional data points. To my eyes there’s convincing evidence on both sides of that debate. I’ve not tested anything on the reasoning side, but this is where, intuitively, you’d hope that bridging the soft and hard reasoning between language and classes/sets/FOPL might have some success, because LLMs know quite a lot about FOPL and so carry some intuitions for interpreting it. I’d love to see some quantitative experiments in this area.
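One way to test the two positions might be to hold the information constant and vary only its representation. A minimal Python sketch, assuming a toy `parent` relation, a made-up `ex:` namespace and a hypothetical `parent` table; it renders the same facts both as Turtle and as SQL so that both prompt conditions carry identical content:

```python
# Hypothetical toy facts shared by both prompt conditions: (child, parent).
FACTS = [("alice", "bob"), ("bob", "carol")]


def as_turtle(facts):
    """Render the facts as Turtle, using a made-up ex: namespace."""
    lines = ["@prefix ex: <http://example.org/> ."]
    for child, parent in facts:
        lines.append(f"ex:{child} ex:hasParent ex:{parent} .")
    return "\n".join(lines)


def as_sql(facts):
    """Render the same facts as SQL DDL plus INSERTs (no escaping: toy data only)."""
    lines = ["CREATE TABLE parent (child TEXT, parent TEXT);"]
    for child, parent in facts:
        lines.append(f"INSERT INTO parent VALUES ('{child}', '{parent}');")
    return "\n".join(lines)


if __name__ == "__main__":
    print(as_turtle(FACTS))
    print()
    print(as_sql(FACTS))
```

A json-schema condition could be added the same way; the point of the design is that any difference in answers is attributable to the format, not the facts.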
I intend to run some. The problem is it's non-trivial, since a lot of the benchmarks are set up assuming text/text retrieval as opposed to data/text. The SQL generation benchmarks are probably the most applicable, though those require a whole RDF/SQL adapter layer. I happen to be working on one, but won't be able to run good benchmarks until it's in a usable state.
I was wondering if RAGAS could be used as an end-to-end measure, and whether you could test some of the assumptions you raise in the talk without having to rely on complex SPARQL generation… for example, wrapping some tools over DESCRIBE, and potentially DESCRIBE filtered to a subset of predicates and applied recursively down a transitive/property path, e.g. :hasParent.
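A tool along these lines might look like the following sketch: a filtered DESCRIBE that follows a transitive predicate such as `:hasParent`. It works over an in-memory list of triples purely for illustration; the `ex:` terms and the function name `describe_along_path` are hypothetical, and a real tool would more likely issue a SPARQL query against the store (a roughly comparable query, assuming an `ex:` prefix declaration, is in the docstring).

```python
from collections import deque

# Toy data with made-up ex: terms; terms are plain strings for simplicity.
TRIPLES = [
    ("ex:alice", "ex:hasParent", "ex:bob"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:email", '"alice@example.org"'),
    ("ex:bob", "ex:hasParent", "ex:carol"),
    ("ex:bob", "ex:name", '"Bob"'),
    ("ex:carol", "ex:name", '"Carol"'),
    ("ex:dave", "ex:name", '"Dave"'),
]


def describe_along_path(triples, start, keep_predicates, path_predicate):
    """A filtered DESCRIBE: triples about `start` whose predicate is in
    `keep_predicates` (or is `path_predicate`), repeated for every node reachable
    from `start` via `path_predicate` zero or more times (like `p*`).

    Roughly comparable SPARQL 1.1 for the example below:
      CONSTRUCT { ?s ?p ?o }
      WHERE { ex:alice ex:hasParent* ?s . ?s ?p ?o .
              VALUES ?p { ex:hasParent ex:name } }
    """
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))
    keep = set(keep_predicates) | {path_predicate}
    result, seen, queue = [], {start}, deque([start])
    while queue:  # breadth-first walk; `seen` guards against cycles
        node = queue.popleft()
        for s, p, o in by_subject.get(node, []):
            if p in keep:
                result.append((s, p, o))
            if p == path_predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return result


if __name__ == "__main__":
    for triple in describe_along_path(TRIPLES, "ex:alice", ["ex:name"], "ex:hasParent"):
        print(*triple, ".")
```

The email triple and the unrelated `ex:dave` are excluded, which keeps the context small while still exposing the whole ancestor chain.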
The I/O of LLMs, or of the transformer architecture, is typically text. Google has been doing interesting work with multimodal systems, but this is a non-trivial adaptation. Sure, we can serialize a graph into TTL or JSON-LD, and we can ask an LLM to produce data in those formats, but it's inherently lossy, since it flattens the graph into a linear syntax, and every transformation of information like this is an extra step for an LLM to get wrong (and they get so many wrong). I'm starting to look at the various graph embeddings, as I'm curious whether it's possible to have a lossless representation and use that as a modality for a multimodal transformer.
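To illustrate the flattening point: a graph is a set of triples, but any text serialisation has to pick an order, so the same graph reaches the model as many different token sequences. A minimal sketch with made-up IRIs and a deliberately naive N-Triples-style parser (it only handles IRI-only lines):

```python
from itertools import permutations

# A graph is a *set* of triples (made-up IRIs); it has no inherent order.
EX = "http://example.org/"
GRAPH = {
    (f"<{EX}alice>", f"<{EX}knows>", f"<{EX}bob>"),
    (f"<{EX}bob>", f"<{EX}knows>", f"<{EX}carol>"),
    (f"<{EX}carol>", f"<{EX}knows>", f"<{EX}alice>"),
}


def to_ntriples(triples):
    """Serialise triples, in the order given, as N-Triples-style lines."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


def parse_ntriples(text):
    """Naive parser that only handles IRI-only lines like those above."""
    return {tuple(line[:-2].split(" ")) for line in text.splitlines() if line}


if __name__ == "__main__":
    texts = {to_ntriples(order) for order in permutations(GRAPH)}
    print(len(texts), "distinct texts")
    print(all(parse_ntriples(t) == GRAPH for t in texts))
```

Each of the 3! = 6 orderings produces a different string, yet all of them parse back to the same set; the model has to learn that equivalence from text alone.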
@rickmoynihan One thing I've been working with is a set of functions that query OWL to get a full class description (a non-trivial thing in SNOMED-CT) along with its parent tree, to create a subgraph that accompanies LLM queries.
I also make that function available to the LLM to call, though the LLM can be inconsistent about calling it, which is why I've been proactive in attaching the subgraph that I know is relevant
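A rough sketch of that pattern: collect a class's triples plus its named ancestors via `rdfs:subClassOf`, then attach them to the prompt proactively rather than relying on the model to call a tool. This assumes a simplified ontology with made-up `ex:` classes (not real SNOMED CT identifiers) and ignores OWL restrictions on blank nodes, which is where a full SNOMED CT class description gets non-trivial; `class_context` and `build_prompt` are hypothetical names.

```python
SUBCLASS = "rdfs:subClassOf"

# Made-up ontology fragment (not real SNOMED CT identifiers).
ONTOLOGY = [
    ("ex:ViralPneumonia", SUBCLASS, "ex:Pneumonia"),
    ("ex:ViralPneumonia", "rdfs:label", '"viral pneumonia"'),
    ("ex:Pneumonia", SUBCLASS, "ex:LungDisease"),
    ("ex:Pneumonia", "rdfs:label", '"pneumonia"'),
    ("ex:LungDisease", SUBCLASS, "ex:Disease"),
    ("ex:Disease", "rdfs:label", '"disease"'),
    ("ex:Fracture", SUBCLASS, "ex:Injury"),
]


def class_context(triples, cls):
    """Triples about `cls` and every ancestor reachable via rdfs:subClassOf.
    Only named superclasses are followed; OWL restrictions (blank nodes) are ignored."""
    by_subject = {}
    for t in triples:
        by_subject.setdefault(t[0], []).append(t)
    seen, stack, out = {cls}, [cls], []
    while stack:
        node = stack.pop()
        for s, p, o in by_subject.get(node, []):
            out.append((s, p, o))
            if p == SUBCLASS and o not in seen:
                seen.add(o)
                stack.append(o)
    return out


def build_prompt(question, triples, cls):
    """Attach the class subgraph to the question, whether or not the model asks for it."""
    context = "\n".join(f"{s} {p} {o} ." for s, p, o in class_context(triples, cls))
    return f"Relevant ontology fragment (Turtle, prefixes omitted):\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("Is viral pneumonia a lung disease?", ONTOLOGY, "ex:ViralPneumonia"))
```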
Yeah, I was thinking that augmenting the context with relevant classes/properties (ontology snippets) for the query would be important: basically bridging entities in the query to RDF concepts. I was thinking a good evaluation for this sort of thing might be generating stories from generated family trees (which are represented in the graph) and asking questions about the story in terms of relationships that may or may not be explicitly stated in the text. Obviously all of that can be generated synthetically too.
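A sketch of the synthetic side of that evaluation, assuming a toy model where each person has exactly one parent and placeholder names: generate a random family tree as `hasParent` triples, then derive grandparent relations as ground truth. An LLM would write the story from the `hasParent` facts only (not shown here), so the grandparent questions probe relationships never stated in the text. All function names here are hypothetical.

```python
import random


def make_family(n, seed=0):
    """Random family forest of n people: person i (i > 0) gets one parent among 0..i-1."""
    rng = random.Random(seed)
    people = [f"person{i}" for i in range(n)]
    triples = [(people[i], "hasParent", people[rng.randrange(i)]) for i in range(1, n)]
    return people, triples


def grandparents(triples):
    """Ground truth for an unstated relation: X hasGrandparent Z
    iff X hasParent Y and Y hasParent Z."""
    parent = {c: p for c, _, p in triples}  # one parent each in this toy model
    return {(c, parent[p]) for c, p in parent.items() if p in parent}


def questions(triples):
    """Pair each derived fact with a natural-language question and its expected answer."""
    return [(f"Who is {c}'s grandparent?", g) for c, g in sorted(grandparents(triples))]


if __name__ == "__main__":
    _, family = make_family(8, seed=42)
    for question, answer in questions(family):
        print(question, "->", answer)
```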
If you want to get together on a call about this, I would be very interested. I'm picking a PhD topic, and it's in this space. The more I can talk with people about it, the more informed I'll be
I’d certainly be interested in doing something like that, though timezones, kids and christmas may be a challenge for me 😂
Christmas actually makes it easier for me, since I'm finishing up work on December 19, and my time gets a little more flexible for the following 2 weeks 🙂
I'm busy for the next few days, since postgraduate study also requires subjects, and I have exams on Monday
Convincingly? No. But there are some interesting possibilities. LLMs (well, the main ones at least) all know OWL. This lets them work quite effectively with RDF data. An OWL description gives an LLM enough info to write SPARQL queries (https://arxiv.org/pdf/2311.07509), more effectively than they can use DDL to write SQL. They’re also good at reading RDF (I typically provide Turtle) and OWL written in TTL, so I can provide subgraphs along with queries for context. The part that intrigues me most is that because they understand OWL, they can do OWL reasoning (of course, I wouldn’t expect 100% accuracy from an LLM). However, because they’re not using a reasoning algorithm, they can handle operations that go beyond any of the established profiles, even veering into OWL Full on occasion.
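Since LLM OWL reasoning won't be 100% accurate, one way to quantify it might be to score the model's proposed inferences against a small rule-based reference. A minimal sketch, assuming only two RDFS entailment rules (rdfs9 and rdfs11), made-up `ex:` terms and a hypothetical list of model output; it can't judge inferences beyond those rules, such as the OWL Full cases mentioned above.

```python
TYPE, SUB = "rdf:type", "rdfs:subClassOf"


def rdfs_closure(triples):
    """Naive forward chaining of two RDFS rules until nothing new appears:
    rdfs11 (subClassOf is transitive) and rdfs9 (instances inherit superclasses)."""
    g = set(triples)
    while True:
        new = set()
        for s, p, o in g:
            if p == SUB:
                new |= {(s, SUB, o2) for s2, p2, o2 in g if p2 == SUB and s2 == o}
                new |= {(x, TYPE, o) for x, p2, c in g if p2 == TYPE and c == s}
        if new <= g:
            return g
        g |= new


def score(proposed, asserted):
    """Precision/recall of model-proposed inferences against the closure,
    ignoring anything that was already asserted."""
    entailed = rdfs_closure(asserted) - set(asserted)
    proposed = set(proposed) - set(asserted)
    hits = proposed & entailed
    precision = len(hits) / len(proposed) if proposed else 1.0
    recall = len(hits) / len(entailed) if entailed else 1.0
    return precision, recall


ASSERTED = [("ex:A", SUB, "ex:B"), ("ex:B", SUB, "ex:C"), ("ex:x", TYPE, "ex:A")]
# Hypothetical model output: two correct inferences and one unsupported one.
PROPOSED = [("ex:x", TYPE, "ex:B"), ("ex:x", TYPE, "ex:C"), ("ex:x", TYPE, "ex:D")]

if __name__ == "__main__":
    print(score(PROPOSED, ASSERTED))
```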
I’ve had some interesting successes by embedding all the labels of a graph and storing them in a vector database along with the resource’s IRI. Then I could embed natural language queries, look them up in the vector DB to get the IRI, and use this as a starting point in the graph to select out a subgraph and provide this as context along with the original text for an LLM. Then, the LLM can ask for clarifications by issuing SPARQL queries as it formulates its answer.
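The lookup pipeline described here can be sketched as below. Assumptions, all hypothetical: a bag-of-words vector stands in for a real embedding model, a Python list stands in for the vector database, and the labels and triples are made up. The step where the LLM issues follow-up SPARQL queries is not shown.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(labels):
    """labels maps IRI -> label; the list stands in for a vector DB of (vector, IRI) rows."""
    return [(embed(label), iri) for iri, label in labels.items()]


def lookup(index, query):
    """Return the IRI whose label vector is most similar to the query's."""
    q = embed(query)
    return max(index, key=lambda row: cosine(q, row[0]))[1]


def neighbourhood(triples, iri):
    """One-hop subgraph around the entry-point IRI, to send as LLM context."""
    return [t for t in triples if iri in (t[0], t[2])]


LABELS = {"ex:alice": "Alice Smith", "ex:acme": "Acme Widget Company", "ex:widget": "blue widget"}
TRIPLES = [("ex:alice", "ex:worksFor", "ex:acme"), ("ex:acme", "ex:makes", "ex:widget")]

if __name__ == "__main__":
    start = lookup(build_index(LABELS), "Where does Alice Smith work?")
    print(start, neighbourhood(TRIPLES, start))
```

Storing the IRI alongside each vector is what lets the similarity search hand off to the graph: the match is only an entry point, and the subgraph does the rest.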
As Luke said, LLMs seem to have a graph-like internal representation of information, so providing information in a graph structure appears to be very effective.
I’ve also found that they’re quite good at generating a graph representation of the information that they’re trying to convey (e.g. asking them to extract an RDF representation of data in provided text).
Starting a new thread about TypeDB, the klezmer archive project, defeasible reasoning, and any other miscellany coming out of https://clojurians.slack.com/archives/C09GHBXRC/p1732883842021819
Not hugely, though we had some contrived examples that would, for example, reason about a person’s availability and communication preferences, or reason about stock trading investment decisions; there was also some work looking at health care and clinical trials. One of the ideas there was agentive representation: software agents representing people and their interests in the digital world, like a digital twin, with agents mediating communication/interactions between people by interacting with other agents. This was at the dawn of web services, mobile and social networks. One of our demos was essentially Facebook, but with a mobile communications focus; there were group phone chats/communities and things orchestrated via agents and 3rd-party call control… so basically every person in the group (if they were still available/present) would be called by the network and a group chat established. We had an agent communication language (ACL) where all communication was grounded logically in terms of “mutual belief”, i.e. one of the purposes of communication was to establish a “belief in mutual belief”. Hence the defeasibility: you wouldn’t want to credulously believe another agent, but would want to reason over your perception of their beliefs and effectively keep them partitioned from your own so you could control consistency. It also came with all the well-known problems of logic-based approaches. It was loads of fun, and perhaps ahead of its time given the recent focus on agents, but the gulf between practical application and solving real problems efficiently was huge, and there were lots of unsolved problems beneath the surface. It was way too elaborate, and too early, to be needed, as industry was only just starting to get its head around the internet; but I still have a fondness for the ideas, and part of me hopes they’ll resurface somewhere one day with a healthier dose of pragmatism.
Sorry, the link to the full paper is here: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=163c90ded06c11bb92a9028780009ed50fb8f3cf
I always thought there were more potential applications for this in computational law… but I think that’s a tough sell.
That’s so cool, what an incredible experience
@rickmoynihan I’m curious to learn more about this defeasible reasoning/non-monotonic logic thing, and/or whether I’m accidentally getting myself into trouble with the conflicting data/ranking conclusions approach I described. Thoughts?
> or if I’m accidentally getting myself into trouble with the conflicting data/ranking conclusions approach I described
Yes, I think there’s a risk of that, though I can’t speak to the details.
> The contradictory facts are in ground data not reasoner rules
If you have two contradictory facts you can conclude anything: https://en.wikipedia.org/wiki/Principle_of_explosion
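The semantic reason is quick to check by brute force: no truth assignment makes both P and not-P true, so any implication with that antecedent is vacuously true, whatever Q is. A minimal Python sketch (function names are illustrative); the Wikipedia page gives the proof-theoretic version via disjunction introduction and disjunctive syllogism.

```python
from itertools import product


def implies(a, b):
    """Material implication."""
    return (not a) or b


def explosion_holds():
    """(P and not P) -> Q is true under every assignment of P and Q,
    because the antecedent is never true."""
    return all(implies(p and not p, q) for p, q in product([False, True], repeat=2))


def has_model(formula, n_vars):
    """Is there any truth assignment that makes `formula` true?"""
    return any(formula(*vals) for vals in product([False, True], repeat=n_vars))


if __name__ == "__main__":
    print(explosion_holds())                    # the implication is a tautology
    print(has_model(lambda p: p and not p, 1))  # {P, not P} has no model
```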
See also “Truth Maintenance Systems”; I’ve been told TMSs were quite popular in the Lisp GOFAI community back in the day.
I can’t speak to TypeDB
TypeDB isn’t an issue since we’re not currently planning to use it 😆 Our domain is history and culture, which is not exactly an axiomatic system. Am I understanding correctly that it’s possible to avoid explosion in practice by limiting the types of inference you do, the rules you add, etc.? Like in their example: you only conclude that unicorns both do and do not exist because of the disjunctive rule, and in practice someone would have to add that rule. And in my particular use case, if someone did add that rule, concluding both the fact and its contradiction is our desired result, hence the need for ranking.
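A sketch of that idea, assuming the contradictions stay in the data as competing claims rather than being asserted as logical negations: keep every value with its provenance and rank instead of concluding. The claims, sources and the summed-confidence scoring rule are all hypothetical; max or a noisy-or combination would be equally plausible choices.

```python
from collections import defaultdict

# Hypothetical claims: (subject, predicate, value, source, confidence).
CLAIMS = [
    ("song42", "composedBy", "Composer A", "archive card", 0.6),
    ("song42", "composedBy", "Composer B", "1923 songbook", 0.8),
    ("song42", "composedBy", "Composer A", "oral history", 0.5),
]


def ranked_answers(claims, subject, predicate):
    """Keep every competing value (nothing asserts a logical negation, so nothing
    explodes) and rank the values by the summed confidence of their sources."""
    scores, sources = defaultdict(float), defaultdict(list)
    for s, p, value, source, confidence in claims:
        if (s, p) == (subject, predicate):
            scores[value] += confidence
            sources[value].append(source)
    return sorted(((v, scores[v], sources[v]) for v in scores), key=lambda row: -row[1])


if __name__ == "__main__":
    for value, score, srcs in ranked_answers(CLAIMS, "song42", "composedBy"):
        print(f"{value}: {score:.2f} from {', '.join(srcs)}")
```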
I’m familiar with truth maintenance, though I think there’s a fairly wide range of what that entails (hah) in the specifics (from validating axiomatic systems to just pruning old cached inferences)
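At the “pruning cached inferences” end of that range, the core idea can be sketched as: a belief is held only while some justification for it is still supported, so retracting a premise drops everything that depended on it. This minimal version just recomputes support from scratch; real justification-based TMSs propagate changes incrementally and handle more (e.g. non-monotonic justifications). Names and data are hypothetical.

```python
def supported(premises, justifications):
    """Beliefs with well-founded support: the premises plus everything derivable
    from them. justifications is a list of (antecedents, consequent) pairs."""
    believed = set(premises)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in justifications:
            if consequent not in believed and set(antecedents) <= believed:
                believed.add(consequent)
                changed = True
    return believed


JUSTIFICATIONS = [({"a"}, "c"), ({"b", "c"}, "d")]

if __name__ == "__main__":
    print(sorted(supported({"a", "b"}, JUSTIFICATIONS)))  # c and d are derived
    print(sorted(supported({"b"}, JUSTIFICATIONS)))       # retracting a prunes c and d
```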
> Our domain is history and culture, which is not exactly an axiomatic system.
Indeed, which is why in the other thread I said:
> I’m assuming the contradictions weren’t expressed as logical inconsistencies in that logic system?
i.e. if you don’t model the contradictions in the logic, you don’t have to handle them.
> Am I understanding correctly that it’s possible to avoid explosion in practice by limiting the types of inference you do, rules you add, etc?
Yes, see for example the Wikipedia page on https://en.wikipedia.org/wiki/Paraconsistent_logic, which mentions a few options. You can always write arbitrary logic programs in datalog/prolog/minikanren etc., and maybe that is sufficient. The difference between that and a formal logic isn’t always clear to me, but if you’re to have clear semantics (and avoid tying yourself in knots) you probably want to know where you stand with regard to soundness/completeness etc. If, for example, you’re writing it in datalog you’ll have a termination guarantee, and your logic will therefore necessarily be constrained by that (but it may be a good constraint to have). You’re pushing my limits already, so take what I’m saying with a grain of salt; I am far from an expert, but I’d suggest familiarising yourself with something that has an existing formalism, and ideally an implementation, if only so you can explain why the system has certain properties/behaviours, and so you’ll have some guard rails against doing anything that may have negative consequences.
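To make the termination point concrete: a naive bottom-up evaluation of the classic `ancestor` program. Because datalog rules never invent new constants, there are only finitely many possible facts, so the loop must reach a fixpoint even on cyclic data. A minimal Python sketch, not any particular datalog engine's algorithm (real engines typically use semi-naive evaluation).

```python
def ancestors(parent_facts):
    """Naive bottom-up evaluation of
         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    Rules never invent new constants, so at most |consts|^2 ancestor facts
    can exist and the loop must reach a fixpoint."""
    ancestor = set(parent_facts)
    while True:
        new = {(x, z) for x, y in parent_facts for y2, z in ancestor if y == y2}
        if new <= ancestor:  # nothing new: fixpoint reached
            return ancestor
        ancestor |= new


if __name__ == "__main__":
    print(sorted(ancestors({("a", "b"), ("b", "c")})))
    print(len(ancestors({("a", "b"), ("b", "c"), ("c", "a")})))  # cyclic data still terminates
```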
Thanks, that’s helpful, I’ll read up on this stuff. What was the system you worked on, and how did reasoning play into it?
We were a university spinout, building a multi-agent systems framework. The defeasible reasoning engine was an implementation of the https://link.springer.com/chapter/10.1007/978-94-017-0456-4_3, and we later added defeasible priorities too. Our CTO did most of the reasoning implementation, but it was fun to be the annoying kid in the room whilst the professors filled whiteboards with dense logic 😂 Unfortunately the work was all proprietary and died with the company 😞 though it was hard to see how it would have been more fully commercialised.
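For anyone following along, the general shape of defeasible rules with priorities can be sketched with the classic Tweety example: birds normally fly, penguins normally don't, and the penguin rule is superior. This is a toy with fixed priorities and a simplified defeat check, not the proof theory from the linked chapter, and it doesn't model defeasible priorities; all names are hypothetical.

```python
def neg(lit):
    """Complement of a literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def defeasible_closure(facts, rules, superior):
    """Toy defeasible inference. rules: {name: (body_set, head)};
    superior: set of (stronger_rule, weaker_rule) pairs. A head is concluded when
    its rule is applicable, its complement isn't already concluded, and every
    applicable rule for the complement is beaten by some applicable rule for the head."""
    concluded = set(facts)
    changed = True
    while changed:
        changed = False
        applicable = {n for n, (body, _) in rules.items() if body <= concluded}
        for name in applicable:
            head = rules[name][1]
            if head in concluded or neg(head) in concluded:
                continue
            attackers = [a for a in applicable if rules[a][1] == neg(head)]
            supporters = [s for s in applicable if rules[s][1] == head]
            if all(any((s, a) in superior for s in supporters) for a in attackers):
                concluded.add(head)
                changed = True
    return concluded


RULES = {"r1": ({"bird"}, "flies"), "r2": ({"penguin"}, "~flies")}
SUPERIOR = {("r2", "r1")}

if __name__ == "__main__":
    print(sorted(defeasible_closure({"bird", "penguin"}, RULES, SUPERIOR)))
```

With an empty superiority relation neither `flies` nor `~flies` is concluded, which is the sceptical behaviour that distinguishes this from simply picking a winner.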
That sounds really interesting. You don’t happen to have access to the rest of the chapter, do you?
What kinds of applications were they aiming for?
We had a contract with a mobile network operator research lab, to apply it to mobile games and telecom services (call control/presence). The demos were quite cool for their time, but never really required the technology. I think the customers just liked that our demos would make their phones ring 😂 We pivoted to other things, but kept the reasoning thing going too long, probably because it was fun 😂
It sounds fun! How’d reasoning figure into that?