This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-03-15
The idea of connecting LLMs to knowledge bases does seem like a good PhD research topic :thinking_face:
Indeed… Though I suspect it’d be “relatively straightforward”* to train or fine-tune a “supervisor NN” that checks a knowledge base as part of the reward function, acting as a mesa-optimiser, and then use that to train the base model, or take something like an adversarial-learning approach. I think the bigger problem is likely the quality and coverage of existing knowledge bases. There are also approaches where you train it to fact-check its own output… I think Bing’s GPT had the ability to take actions like searching the web, so you could reasonably teach it to fact-check against an oracle, then use RLHF to verify that it fact-checked the right things, in the right context, and correctly interpreted the results coming back from its prompt to the fact-checker. * Obviously by “relatively straightforward” I mean straightforward for experts to do, not for me (a total layman).
I'm very confident that an interesting approach will be developed with the release of GPT-4, which has a technical report (https://cdn.openai.com/papers/gpt-4.pdf) with lots of detail on (checks notes): absolutely nothing about how it fits together: > "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
That right there is “cannot be safely used for any purpose other than (arguably) research”.
coming back to the channel topic though… 🙂 @quoll I was reminded again of your kiara adapter for datomic … can you say a bit about how it informed your later approaches? or put another way — if you needed something like that today would you still use the same broad approach?
I would probably work more with IRIs (URIs on the JVM) and not try to use keywords as much as I did. That’s for scalability reasons.
I would also be more willing to create entities for each IRI, and not try to eliminate those as much as I did. Partly because I don’t think it would hurt scalability as much as I feared, and also because I needed too many exceptions when checking for raw IRIs (permissible in the “value” or object position, but not in the “entity” or subject position).
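Purely for illustration (this is not the Kiara code, and all names here are made up): the asymmetric position rule described above can be sketched as a validator that accepts a raw IRI in the object position but requires the subject to resolve to an entity. Creating an entity for every IRI, the approach preferred in hindsight, makes the special case disappear.

```python
# Hypothetical sketch of the subject/object position rule described above.
# A raw IRI is allowed as an object (value), but a subject must be an
# entity known to the store.

def is_iri(x):
    """Very rough IRI check: a string containing a scheme separator."""
    return isinstance(x, str) and "://" in x

def validate_triple(subject, predicate, obj, entities):
    """Subjects must be known entities; objects may be entities,
    raw IRIs, or plain literals."""
    if subject not in entities:
        raise ValueError(f"subject {subject!r} must be an entity")
    return (subject, predicate, obj)

# If an entity is created for *every* IRI up front, this asymmetric
# check is no longer needed:
entities = {"http://example.org/alice", "http://example.org/bob"}
triple = validate_triple("http://example.org/alice",
                         "http://xmlns.com/foaf/0.1/knows",
                         "http://example.org/bob",
                         entities)
```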
I would also probably work more with RDFS than I did. I work more with models now, while in the past I was happier with raw RDF. These days I’m thinking that it’s not such a terrible thing to request that a schema be available (one can always be derived, of course)
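A minimal sketch of “a schema can always be derived”: one pass over raw triples, recording for each predicate whether its values are IRIs or literals, already yields a rough RDFS-like description. This is an illustrative toy, not any particular library’s API; the data is invented.

```python
from collections import defaultdict

# Illustrative only: derive a rough schema from raw triples by noting,
# per predicate, whether the object position holds IRIs or literals.

triples = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/bob"),
]

def derive_schema(triples):
    schema = defaultdict(set)
    for _s, p, o in triples:
        kind = "iri" if isinstance(o, str) and "://" in o else "literal"
        schema[p].add(kind)
    return dict(schema)

schema = derive_schema(triples)
# e.g. foaf:name -> {'literal'}, foaf:knows -> {'iri'}
```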
It’s been a few years since I looked at it though! I’m operating on memory that is like… a decade old at this point
then again, my current top challenge is “graph == neo4j” so I might just be tilting at windmills. not the first time I’d be described that way, to be fair.
From an efficiency POV, connecting objects in a graph makes sense. Datomic does this too, insofar as those objects are “entities”. I like the homogeneity of RDF though. Maybe people prefer having properties on objects be separate from the edge labels, but I like the way RDF allows greater flexibility here. We see it with things like SKOS notations, where the value can be structured or a literal.
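To illustrate the flexibility mentioned above, here is a hedged plain-Python sketch of the same skos:notation property pointing at either a plain literal or a datatyped (structured) value. The skos:notation IRI is real; the example concept and datatype IRIs and the helper are hypothetical.

```python
# Sketch: one property, two value shapes. A value is modelled as a
# (lexical-form, datatype-IRI-or-None) pair.

SKOS_NOTATION = "http://www.w3.org/2004/02/skos/core#notation"

# Plain literal value:
t1 = ("http://example.org/concept/42", SKOS_NOTATION, ("42", None))

# Typed value: the literal carries its own (made-up) datatype IRI,
# giving it structure beyond a bare string.
t2 = ("http://example.org/concept/42", SKOS_NOTATION,
      ("42", "http://example.org/ns/hexCode"))

def value_datatype(triple):
    """Return the datatype IRI of a triple's value, or None if plain."""
    _s, _p, (_lex, dtype) = triple
    return dtype
```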