This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-02-15
Channels
- # announcements (27)
- # architecture (15)
- # aws (2)
- # babashka (5)
- # beginners (77)
- # calva (42)
- # cider (22)
- # clj-kondo (47)
- # cljfx (7)
- # clojure (66)
- # clojure-australia (1)
- # clojure-europe (32)
- # clojure-france (10)
- # clojure-israel (2)
- # clojure-italy (1)
- # clojure-nl (8)
- # clojure-uk (53)
- # clojurescript (29)
- # conjure (28)
- # core-async (2)
- # cursive (26)
- # data-science (1)
- # datascript (11)
- # datomic (33)
- # emacs (4)
- # fulcro (5)
- # girouette (3)
- # helix (1)
- # jobs (2)
- # leiningen (17)
- # luminus (2)
- # malli (15)
- # music (2)
- # off-topic (51)
- # pathom (3)
- # rdf (91)
- # remote-jobs (1)
- # reveal (7)
- # sci (6)
- # shadow-cljs (10)
- # spacemacs (3)
- # sql (23)
- # tools-deps (52)
- # uncomplicate (2)
- # vim (3)
- # xtdb (9)
@simongray Naga has a data syntax for rules; it’s not the Datomic syntax but instead looks to be a direct translation of the Datalog Horn-clause syntax Naga supports: https://github.com/threatgrid/naga/blob/964a71e300c947ead28f1b344ed9bccdbf97ff9b/test/naga/test_rules.cljc#L21 I’m not familiar with Datomic rules, and have only played with Naga briefly, but after looking at them again it looks like the main differences are: 1. (I think) Naga rules must be either unary or binary predicates. 2. Naga rules are materialised once at startup; Datomic’s are, AFAICT, materialised at query time. I’m guessing Datomic’s rules just expand into Datomic query fragments through a process similar to macro-expansion… presumably with gensyms etc. to prevent unintended bindings.
I was asking about Zach Oakes' O'Doyle Rules, though, which uses Datomic-style triples/facts, but otherwise doesn't have anything to do with Datomic.
ah sorry — I don’t know… IIRC Clara is based on the RETE algorithm… so you might learn more by looking at the differences between RETE and Datalog. Not sure about O’Doyle, I only heard of it the other day… Looking at the README, it implies it’s possible to create infinite loops in it, though; which means it may be Turing-complete. Datalog guarantees termination (and consequently so should Naga).
I wouldn’t doubt that for a second, given that RETE is more powerful than Datalog; but consequently it misses Datalog’s safety properties (re: termination guarantees etc.)
Definitely. It’s essentially a rule scheduler. Datalog imposes policies that ensure boundary conditions on that scheduling
Actually, that was an interesting thing for me to learn about. I implemented RETE, and I needed to configure what each rule did, and what the scheduling dependencies between the rules were. At that point, I was describing all of that using an RDF graph. But the rule descriptions were in Horn clauses (because that’s how everyone wrote them at the time, and I was used to writing them that way too). I was thinking that instead of writing Horn clauses on paper, then translating from Horn to the graph representation and entering all the graph data, I should write a parser that could read Horn clauses and convert them into the graph representation automatically. The only problem was figuring out the connections between them. But then I had my epiphany… I knew how to automatically find those dependencies! (I can even tell you which street in Chicago I was walking along when I realized this.)
At this point, I realized that, due to some of the things I’d done in the implementation of RETE, I had a fully compliant Datalog engine
Sorry, not following what the epiphany was?
ok by querying for them?
No. The form of a Horn clause is:
pred_y(v1,v2) :- pred_x(v3,v4), ...
If the head of any clause matches any expression in the body of any clause, then this forms a dependency of the latter on the former (I did a bit of Prolog many moons ago)
“Matching” means that for each of the 3 values in `pred(v1,v2)`, either they contain identical concrete values (atoms, numbers, etc.), or either one is a variable
Note that `pred` is included in that match, which means that 2nd-order expressions are also allowed
and the simple translation of `pred(v1,v2)` to `[v1 pred v2]` simplifies this significantly
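The matching and dependency rule described above can be sketched roughly like this (a hypothetical Python illustration, not Naga’s actual code; variable terms are modeled as strings starting with `?`):

```python
# Hypothetical sketch of rule-dependency detection: rule B depends on
# rule A when any head triple of A matches any body triple of B.
# Two terms "match" when either is a variable or they are equal;
# the predicate position participates too, so second-order
# (variable-predicate) patterns are allowed.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def terms_match(a, b):
    return is_var(a) or is_var(b) or a == b

def triples_match(t1, t2):
    # compare subject, predicate, and object positions
    return all(terms_match(a, b) for a, b in zip(t1, t2))

def dependencies(rules):
    # rules: {name: {"head": [triple, ...], "body": [triple, ...]}}
    deps = set()
    for a_name, a in rules.items():
        for b_name, b in rules.items():
            if any(triples_match(h, t)
                   for h in a["head"] for t in b["body"]):
                deps.add((b_name, a_name))  # b depends on a
    return deps

rules = {
    "r1": {"head": [("?x", "ancestor", "?y")],
           "body": [("?x", "parent", "?y")]},
    "r2": {"head": [("?x", "ancestor", "?z")],
           "body": [("?x", "ancestor", "?y"), ("?y", "ancestor", "?z")]},
}
# r2 depends on r1 (and on itself, since its head matches its own body)
print(dependencies(rules))
```

The self-dependency of `r2` is what makes recursive rules (like transitive closure here) work under semi-naive evaluation.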
yes I saw that in naga
and if you really want it…
`pred(v1,v2,v3...)` => `[v1 pred ?gen2] [?gen2 :tg/first v2] [?gen2 :tg/rest ?gen3] [?gen3 :tg/first v3]` …
rdf lists 😬
but yeah makes sense
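The n-ary-to-triples translation shown above could be sketched like this (a hypothetical Python illustration; the `:tg/first`/`:tg/rest` keywords come from the message above, and the gensym counter is purely illustrative):

```python
# Hypothetical sketch: translate pred(v1, v2, v3, ...) into triples.
# Binary predicates become a single [v1 pred v2] triple; higher
# arities chain the remaining arguments through an RDF-list-style
# structure of :tg/first / :tg/rest links.
from itertools import count

_gen = count(2)

def gensym():
    # fresh ?genN variable for each intermediate list node
    return "?gen%d" % next(_gen)

def nary_to_triples(pred, args):
    if len(args) == 2:
        return [(args[0], pred, args[1])]
    head, rest = args[0], args[1:]
    node = gensym()
    triples = [(head, pred, node)]
    for i, arg in enumerate(rest):
        triples.append((node, ":tg/first", arg))
        if i < len(rest) - 1:
            nxt = gensym()
            triples.append((node, ":tg/rest", nxt))
            node = nxt
    return triples

print(nary_to_triples("pred", ["v1", "v2", "v3"]))
```

Under these assumptions, the ternary case expands to `[v1 pred ?gen2] [?gen2 :tg/first v2] [?gen2 :tg/rest ?gen3] [?gen3 :tg/first v3]`, matching the expansion above.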
Is that the epiphany, that n-ary predicate matching is simplified by reducing everything to triples?
oh ok 🙂
Because everything was unary and binary predicates (expressed in triples), it was easy to see this
With n-ary predicates it wasn’t clear. But since I was working with the triples architecture, it became very obvious
I guess that’s kind of what I meant by querying for them
weeeeeeell… yes, but… this was while parsing and creating an internal representation. It wasn’t yet in a graph, and therefore wasn’t queryable
Naga is my 3rd implementation of this (my first was in Java, and my second was in Clojure).
ok yeah — I guess you’ll want to unify all the clauses to the same identity when they match.
The dependency identification in Naga is done here: https://github.com/threatgrid/naga/blob/main/src/naga/engine.cljc#L149
Recursive work through the body is done in the function https://github.com/threatgrid/naga/blob/main/src/naga/rules.cljc#L174
thanks for the explanation 🙇
but now the body can be:
[?a :attr ?b] [?b :attr2 "x"] (not (or [?b :attr3 "y"] [?a :attr3 "z"]))
Also, the head can contain multiple statements as well. So that’s an extra complexity 🙂
But yeah… you’ll see that the code I just gave you can process rules that have bodies with expressions like this. No “graph” is created to describe the rule, just standard Clojure structures.
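That recursive walk through a body containing `not`/`or` forms might look something like this (hypothetical Python, not the linked Clojure code; operator forms are modeled as tuples, triple patterns as lists):

```python
# Hypothetical sketch: collect every triple pattern from a nested
# rule body, recursing through ("not", ...) and ("or", ...) forms,
# the way dependency analysis has to look inside such expressions
# rather than only scanning top-level triples.
def collect_patterns(body):
    patterns = []
    for form in body:
        if isinstance(form, tuple) and form and form[0] in ("not", "or"):
            # operator form: recurse into its arguments
            patterns.extend(collect_patterns(form[1:]))
        else:
            # plain triple pattern
            patterns.append(form)
    return patterns

# the body from the message above, in this sketch's representation
body = [["?a", ":attr", "?b"],
        ["?b", ":attr2", "x"],
        ("not", ("or", ["?b", ":attr3", "y"], ["?a", ":attr3", "z"]))]
print(collect_patterns(body))
```

All four triple patterns surface, including the two nested inside `(not (or …))`.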
ah ok so the rules aren’t actually stored in the graph
you just use the representation to materialise the RETE network
But it did the whole analysis of the dependencies first, and only when it had all of the information, then it inserted it all into the graph in one step
Strictly speaking, it could have done it in 2 steps… insert the basic rules first, then use queries to extract what was needed to determine dependencies, and then insert those dependencies. But it would actually put more work on the system to do that
I’d imagine it’d be trivial to implement a subset of OWL/RDFS in Naga
fantastic
ah I forgot you can use predicates as variables
yeah that’s handy here
I thought those horn clauses looked a little strange at first!
does this perform well?
cut is a mind melter
My first attempt at building the rules structures was for Mulgara (that SKOS code was actually from the Mulgara project). The directory containing it all is: https://github.com/quoll/mulgara/tree/master/src/jar/krule/java/org/mulgara/krule
ah thanks for sharing
what is the license for those rules?
ok OSL
I want to point out something else, since I’m here. Naga includes adapters for different databases, has a priority-queue implementation, and various things in other files. But if you keep it to JUST the structure definitions used for rules, and the code that executes the engine, then:
$ wc -l rules.cljc engine.cljc schema/structs.cljc
215 rules.cljc
176 engine.cljc
76 schema/structs.cljc
467 total
Meanwhile, the code that defines the structures and executes the engine (with far fewer features) in Java is:
$ wc -l *.java
139 ConsistencyCheck.java
1079 KruleLoader.java
52 KruleStructureException.java
243 QueryStruct.java
389 Rule.java
360 RuleStructure.java
2262 total
I had to trace through the Mulgara code to figure out how rules get executed, how they’re loaded, etc., and it took me over 10 minutes. (It’s spread over a whole lot of other directories.) But that’s all glue to incorporate it into the database system. The meat of it is in that single directory. But the whole thing is embarrassingly complex!
AFAIK you can do the same with RETE (create infinite recursions)
Naga was Datalog, with the restriction of binary/unary predicates. (Higher arity actually isn’t hard to do, but I never needed it :woman-shrugging:)
Including `not`, `or`, filters, bindings, etc. The head can also have multiple triples to be asserted for a single match (allowing multiple attributes for a single generated entity)
Also, Naga rules are materialized as needed. If you’re using a macro, then yes, that has to be at startup in ClojureScript (because ClojureScript macros are evaluated as Clojure code during compilation), but not in Clojure. But the `r` macro is just a convenience around the `rule` function, which takes:
• head: a seq of triple patterns
• body: a query `:where` clause expression
• name: optional
This function can be called at runtime, even in Cljs.