in datomic, and i imagine other graph databases, it's possible to express a set join from one id to a collection of ids. e.g. in the query below user-id is a single value, but the :runners value ?user-id (to me) represents a binding to a collection of ids, which essentially equates to (filter (fn [runner-id] (= runner-id user-id)) runners)
Or in the :where clause of datomic it might look like:
[user-id :name name]
[_ :runners ?user-id]
For another example of a join, see the datomic docs: https://docs.datomic.com/pro/query/query.html
I'm fairly sure, looking at the clara docs, that i can do a join from one id to one id, e.g. in a datomic where clause:
[?user-id :dog ?dog-id]
[?dog-id :name ?name]
But i'm not sure i can join across a collection and have it filter so that it returns (matches) only the one. Now, it could match all of them and then i could write the filter function, but i'm wondering why the rules engine wouldn't have semantics for expressing that. Would it be just to make sure the user understands it's not a direct lookup?
I’m not sure I understand the question, but will try to hit on the points I think are relevant:
1) If you have a rule that does something like:
[User (= ?user-id user-id)]
[Dog (= ?user-id owner-user-id)]
You will get a RHS activation for every pair of facts that satisfies this. Essentially this is a “cartesian product” restricted to the pairs that agree on ?user-id. Conceptually this should be the same as a WHERE in relational query langs (or similar).
2) If you instead want to aggregate “one side” of this join, perhaps to get a single RHS activation to work with, you can use an accumulator, e.g.:
[User (= ?user-id user-id)]
[?dogs <- (acc/all) :from [Dog (= ?user-id owner-user-id)]]
Now you’ll have ?user-id and ?dogs available on the RHS (or, if you are writing a query, as bindings you can pull from the session).
This will give you a RHS activation for each user-id to their collection of dogs.
If a user-id has no dog, it’ll also get a match with an empty coll. You can stop that with something like [:test (seq ?dogs)] if you need.
I’m not sure this addresses your question. It seemed to me like perhaps what you were wanting to see was an “accumulator” concept though.
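To make the two points above concrete, here is a minimal sketch written as clara-rules queries. The `User` and `Dog` records, their sample values, and the namespace name `join-sketch` are assumptions made up for illustration; only the conditions themselves come from the discussion.

```clojure
(ns join-sketch
  (:require [clara.rules :refer [defquery mk-session insert fire-rules query]]
            [clara.rules.accumulators :as acc]))

;; Hypothetical fact types; field names follow the conditions in the discussion.
(defrecord User [user-id name])
(defrecord Dog [owner-user-id name])

;; 1) Join on ?user-id: one result per matching User/Dog pair.
(defquery user-dog-pairs
  []
  [User (= ?user-id user-id)]
  [Dog (= ?user-id owner-user-id) (= ?dog-name name)])

;; 2) Accumulate the Dog side: one result per User, carrying all of that user's dogs.
(defquery user-with-dogs
  []
  [User (= ?user-id user-id)]
  [?dogs <- (acc/all) :from [Dog (= ?user-id owner-user-id)]])

;; 3) Same as 2, but the :test condition drops users whose dog collection is empty.
(defquery dog-owners-only
  []
  [User (= ?user-id user-id)]
  [?dogs <- (acc/all) :from [Dog (= ?user-id owner-user-id)]]
  [:test (seq ?dogs)])

;; Made-up sample data: Ann owns two dogs, Bob owns none.
(def session
  (-> (mk-session 'join-sketch)
      (insert (->User 1 "Ann") (->User 2 "Bob")
              (->Dog 1 "Rex") (->Dog 1 "Fido"))
      (fire-rules)))

(count (query session user-dog-pairs))  ; 2 (Ann/Rex and Ann/Fido)
(count (query session user-with-dogs))  ; 2 (Ann with two dogs, Bob with an empty coll)
(count (query session dog-owners-only)) ; 1 (Ann only)
```

Each query result is a map keyed by the bound variables, e.g. `:?user-id` and `:?dogs`, so the accumulated collection can be read straight out of the result.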
I was asking about how to achieve what equates to a where clause join. Put another way, i was asking about how to find the intersection between two sets (in my example users and dogs). I would also be curious if there were docs or examples on how to do all the joins: left join, right join, inner join, outer join, etc. Will the joins be done at runtime/on-demand, or does the rules engine allow you to build a materialized view/pre-computed joined table, which would allow for faster access at the cost of more space being held on to?
One of the nice parts of sql is the very terse, well understood semantics for gathering data, so ideally i would like to leverage that whenever possible. As an aside, i wish there was some resource that talked about the key differences between graph query systems and rules engines. They seem very closely related, with some fundamental trade-offs between them. I feel like if i understood that relationship better, it would make it easier for me to communicate what should go where and why 🙂
When you match 2 rule conditions on a constraint - it’s like an inner join
There aren’t a bunch of relational join constructs present. You use accumulators to do things like that.
The accumulator I gave you before was basically a “left outer join” if the “left” is the User and the “right” is the Dog. However, you can have more than 2 conditions in a rule, so there isn’t really a “left” and “right” in general.
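For readers coming from SQL, the two shapes can be compared on plain Clojure data. This is only an illustration of the join semantics, not clara-rules code; the `users` and `dogs` sets are made-up sample data.

```clojure
(require '[clojure.set :as set])

;; Made-up sample relations: Ann owns two dogs, Bob owns none.
(def users #{{:user-id 1 :user-name "Ann"}
             {:user-id 2 :user-name "Bob"}})
(def dogs  #{{:owner-user-id 1 :dog-name "Rex"}
             {:owner-user-id 1 :dog-name "Fido"}})

;; Inner join: one merged row per matching user/dog pair.
;; This mirrors two rule conditions joined on the same variable.
(def inner-join
  (set/join users dogs {:user-id :owner-user-id}))

;; Left outer join, grouped: one row per user, with a possibly empty :dogs coll.
;; This mirrors the accumulator form, where a user with no dogs still matches.
(def dogs-by-owner (group-by :owner-user-id dogs))

(def left-outer-join
  (for [u users]
    (assoc u :dogs (get dogs-by-owner (:user-id u) []))))

(count inner-join)      ; 2 (Bob does not appear)
(count left-outer-join) ; 2 (Bob appears with :dogs [])
```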
The rules do essentially “cache” matches and trade off space usage for runtime performance.
When new facts are added, or the truth maintenance system (aka TMS) finds invalidated logic, the rules working memory is “updated efficiently”, using stored internal state to minimize runtime processing.
This really is just all about the “rete algorithm” design of the rules.
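A minimal sketch of the truth maintenance behavior, assuming hypothetical `User`, `Dog` and `DogOwner` records and a made-up namespace name `tms-sketch`. A fact inserted with `insert!` on the RHS is tied to the facts that supported the activation, so retracting a supporting fact and firing the rules again should also remove the derived fact.

```clojure
(ns tms-sketch
  (:require [clara.rules :refer [defrule defquery mk-session insert insert!
                                 retract fire-rules query]]))

;; Hypothetical fact types for illustration.
(defrecord User [user-id name])
(defrecord Dog [owner-user-id name])
(defrecord DogOwner [user-id])

;; Derive a DogOwner fact for every User that has at least one Dog.
(defrule mark-dog-owner
  [User (= ?user-id user-id)]
  [Dog (= ?user-id owner-user-id)]
  =>
  (insert! (->DogOwner ?user-id)))

(defquery dog-owners
  []
  [DogOwner (= ?user-id user-id)])

(def rex (->Dog 1 "Rex"))

;; Sessions are immutable values, so `before` stays usable after we derive `after`.
(def before
  (-> (mk-session 'tms-sketch)
      (insert (->User 1 "Ann") rex)
      (fire-rules)))

;; Retract the only supporting Dog; the derived DogOwner loses its support.
(def after
  (-> before
      (retract rex)
      (fire-rules)))

(count (query before dog-owners)) ; 1
(count (query after dog-owners))  ; 0
```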
There are a ton of docs on this subject in general - since it has been around a long time. There are a lot of modern adaptations to the original “rete systems”, but the fundamentals remain. I gave a talk and then turned it into a blog post as a high-level overview utilizing clara-rules for example here: https://www.metasimple.org/2017/02/28/clarifying-rules-engines.html
There is another mention of this “rete algo” inspiration with an older paper on the subject linked here https://github.com/oracle-samples/clara-rules/wiki/Introduction#the-rules-engine
thanks for the help mike. How were you able to read through the Doorenbos paper? Like, i recall trying to pick that up once, i got like 3 pages in, life interrupted, and 2 days later i had to start all over again. Is it one of those things that you have to dedicate a month to? Was there some incremental way to check your progress?
It is an old paper. I think the core sections on the algorithm are the only thing I find relevant from a modern perspective. So it ends up only being a few chapters.
It doesn’t discuss modern rete features like TMS, batched fact/activation propagation, or accumulators either. But the foundations are good and help you think about those things later.