Hi, maybe you all can help me work through a question of balancing performance and abstraction in datalog rules based on which lvars are bound (inputs) or not bound (outputs)? 🧵
Just one more follow up - have/would you consider slightly expanding the syntax for required binding guards to support routing between impls? Imagining something like this:
[[(a-relates-to-b [?a] ?b)
[?e :person/favorite-color ?a]
[?e :person/best-friend ?b]]]
[[(a-relates-to-b ?a [?b])
[?e :person/best-friend ?b]
[?e :person/favorite-color ?a]]]
I think that could be backwards compatible syntax and it operates in the engine at a level where it's aware which lvars have bindings?it changes the semantics. Today the required-binding is an assertion, and if it can't run it that way it blows up. This would turn it into a filter, introducing polymorphism on a state which doesn't belong to any datalog formalism
the backwards incompatible part: the first a-relates-to-b today always tries to run, would blow up. with your proposal it would silently not run
• By default lvars in the head of a datalog rule can be both "input" or "output" depending on the bindings you supply. I'm aware you can also mark leading lvars as "must be bound" by wrapping them in a vector. • Datalog rules, like queries, encode an ordering of clauses. That order determines how the indices are accessed and the relations that need to be joined when the rule is processed. • The "optimal" ordering of clauses depends on which lvars were bound when the rule is used. As an example, consider this contrived rule that relates people who like a particular color to their best friend.
[[(a-relates-to-b ?a ?b)
[?e :person/favorite-color ?a]
[?e :person/best-friend ?b]]]
Clearly this rule is efficient when you already have a bound value for ?a. However, what if you only know ?b and you want to know ?a? Sure, the original implementation will still work (eventually), but the efficient ordering when you only know ?b is actually the reverse:
[[(a-relates-to-b ?a ?b)
[?e :person/best-friend ?b]
[?e :person/favorite-color ?a]]]
I know I can deal with this simply by creating two different rules and using the right ones in the right cases. However, I'd really like to consider datalog rules as a tool for encoding bidirectional relationships between things that just "does the most efficient thing" depending on what is known. Is there a way I can check within my rules whether a lvar is bound and use that to "route" to an efficient ordering of clauses? Am I missing anything? In the general case this could require up to 2^N different implementations where N is the number of lvars in the head of your rule.Said another way, I feel like a tension exists between rules promising bidirectional abstraction / indirection and yet having a singular concrete ordering that implicitly favors some usage patterns over others and I'm wondering if there's a way to have my cake and eat it too.
There is nothing you are missing. You have diagnosed the situation accurately
The syntax to require binding exists as a guardrail against falling off performance cliffs caused by using naively using rules exactly the way you are desiring to use them
Absent an optimizing query planner, you just have to write a specialized rule for each pattern of bound inputs
And there's no way to check if a lvar is bound or not? If there was I could do things like this:
[[(a-relates-to-b ?a ?b)
[(bound? ?a)]
[?e :person/favorite-color ?a]
[?e :person/best-friend ?b]]
[(a-relates-to-b ?a ?b)
[(bound? ?b)]
[?e :person/best-friend ?b]
[?e :person/favorite-color ?a]]]A way to check if it's bound would be "outside" datalog formalisms, digging in to the runtime state of the query engine.
That's what I figured just double checking. Thanks!
clauses never "see" an unbound thing; the engine is always binding everything for you
that's the point 🙂
the mechanism you describe is something like: here's a state, return a new AST for me
changing the query at runtime