datomic

rutledgepaulv 2025-03-26T12:11:32.722739Z

Hi, maybe you all can help me work through a question of balancing performance and abstraction in datalog rules based on which lvars are bound (inputs) or not bound (outputs)? 🧵

rutledgepaulv 2025-04-01T23:37:04.577349Z

Just one more follow up - have/would you consider slightly expanding the syntax for required binding guards to support routing between impls? Imagining something like this:

[[(a-relates-to-b [?a] ?b)
  [?e :person/favorite-color ?a]
  [?e :person/best-friend ?b]]]
[[(a-relates-to-b ?a [?b])
  [?e :person/best-friend ?b]
  [?e :person/favorite-color ?a]]]
I think that could be backwards compatible syntax and it operates in the engine at a level where it's aware which lvars have bindings?

favila 2025-04-01T23:49:19.576539Z

it changes the semantics. Today the required-binding is an assertion, and if it can't run it that way it blows up. This would turn it into a filter, introducing polymorphism on a state which doesn't belong to any datalog formalism

favila 2025-04-01T23:50:10.948819Z

the backwards incompatible part: the first a-relates-to-b today always tries to run, would blow up. with your proposal it would silently not run

rutledgepaulv 2025-03-26T12:12:05.725489Z

• By default lvars in the head of a datalog rule can be both "input" or "output" depending on the bindings you supply. I'm aware you can also mark leading lvars as "must be bound" by wrapping them in a vector. • Datalog rules, like queries, encode an ordering of clauses. That order determines how the indices are accessed and the relations that need to be joined when the rule is processed. • The "optimal" ordering of clauses depends on which lvars were bound when the rule is used. As an example, consider this contrived rule that relates people who like a particular color to their best friend.

[[(a-relates-to-b ?a ?b)
  [?e :person/favorite-color ?a]
  [?e :person/best-friend ?b]]]
Clearly this rule is efficient when you already have a bound value for ?a. However, what if you only know ?b and you want to know ?a? Sure, the original implementation will still work (eventually), but the efficient ordering when you only know ?b is actually the reverse:
[[(a-relates-to-b ?a ?b)
  [?e :person/best-friend ?b]
  [?e :person/favorite-color ?a]]]
I know I can deal with this simply by creating two different rules and using the right ones in the right cases. However, I'd really like to consider datalog rules as a tool for encoding bidirectional relationships between things that just "does the most efficient thing" depending on what is known. Is there a way I can check within my rules whether a lvar is bound and use that to "route" to an efficient ordering of clauses? Am I missing anything? In the general case this could require up to 2^N different implementations where N is the number of lvars in the head of your rule.

rutledgepaulv 2025-03-26T12:18:38.234349Z

Said another way, I feel like a tension exists between rules promising bidirectional abstraction / indirection and yet having a singular concrete ordering that implicitly favors some usage patterns over others and I'm wondering if there's a way to have my cake and eat it too.

favila 2025-03-26T13:09:14.451819Z

There is nothing you are missing. You have diagnosed the situation accurately

favila 2025-03-26T13:11:19.048709Z

The syntax to require binding exists as a guardrail against falling off performance cliffs caused by using naively using rules exactly the way you are desiring to use them

favila 2025-03-26T13:11:56.914099Z

Absent an optimizing query planner, you just have to write a specialized rule for each pattern of bound inputs

1
🤔 1
rutledgepaulv 2025-03-26T13:55:32.468429Z

And there's no way to check if a lvar is bound or not? If there was I could do things like this:

[[(a-relates-to-b ?a ?b)
  [(bound? ?a)]
  [?e :person/favorite-color ?a]
  [?e :person/best-friend ?b]]
 [(a-relates-to-b ?a ?b)
  [(bound? ?b)]
  [?e :person/best-friend ?b]
  [?e :person/favorite-color ?a]]]

favila 2025-03-26T13:56:22.484929Z

A way to check if it's bound would be "outside" datalog formalisms, digging in to the runtime state of the query engine.

rutledgepaulv 2025-03-26T13:56:40.278319Z

That's what I figured just double checking. Thanks!

favila 2025-03-26T13:57:21.077459Z

clauses never "see" an unbound thing; the engine is always binding everything for you

favila 2025-03-26T13:57:29.941719Z

that's the point 🙂

favila 2025-03-26T13:58:02.265459Z

the mechanism you describe is something like: here's a state, return a new AST for me

favila 2025-03-26T13:58:15.953109Z

changing the query at runtime