I'm trying to better understand the query-stats map, and there are a few things I'm trying to wrap my head around:
• Does the :sched key on a phase contain information not also present in the :clauses? It seems like in some cases it has additional grouping information, but I'm not really sure how to read it. An example: :sched (((parent? ?x ?y)) ((parent? ?x ?a) (ancestor? ?a ?y))). The parens establish two groupings here, one with (parent? ?x ?y) and one with (parent? ?x ?a) (ancestor? ?a ?y). What do those groupings mean?
• How should I think about the meaning of a phase? It seems to be basically an extent for variable bindings: any time bindings are discarded (like at the end of an or-join/rule body), the query goes into a new phase. Is that right?
Clauses are given sequentially, but with recursive rules they are a flattened representation of a non-sequential operation. One consequence is that rows are propagated from one phase to another, which looks like duplication.
Interesting. Part of what I'm doing in this visualization is trying to reconstruct the tree of rule invocations from the sequential list of phases. For that purpose, it sounds like I should keep only the first occurrence of a repeated sequence of phases.
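A minimal sketch of "keep only the first occurrence of a repeated sequence of phases", assuming phases are compared with plain `=` (so two phases match only if their maps are identical, row counts included). The names `drop-repeated-tail` and `collapse-adjacent-repeats` are hypothetical helpers, not part of Datomic. It only collapses a block that immediately repeats the block before it, so it would also merge legitimately consecutive identical phases, and, as noted later in the thread, order-based deduplication may misrepresent the direction in which rows propagate.

```clojure
(defn- drop-repeated-tail
  "If vector v ends with a block that exactly repeats the block just
  before it, drop the second copy. Longer blocks are tried first.
  Returns v unchanged when there is no such repeat."
  [v]
  (let [n (count v)]
    (or (some (fn [k]
                (when (= (subvec v (- n (* 2 k)) (- n k))
                         (subvec v (- n k) n))
                  (subvec v 0 (- n k))))
              (range (quot n 2) 0 -1))
        v)))

(defn collapse-adjacent-repeats
  "Hypothetical helper: walks phases in order and removes any block that
  immediately repeats the preceding block, keeping the first copy."
  [phases]
  (reduce (fn [out phase] (drop-repeated-tail (conj out phase)))
          []
          phases))

;; Usage against the example below (assumes query-stats is defined):
;; (collapse-adjacent-repeats (:phases query-stats))
```

Because the check runs after every appended phase, the result never contains two identical adjacent blocks.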
Thanks for the explanation, Francis; this was helpful. I'm now trying to understand some behavior I'm seeing specifically around query-stats for recursive rules. With recursive rules (and only with recursive rules, AFAICT), many phases of the query appear to be doubled (i.e. the same phases appear twice in the list, with identical row counts). A contrived minimal example:
(ns test-case
  (:require [datomic.api :as d]))
(def uri "datomic:")
(d/create-database uri)
(def conn (d/connect uri))
;; Schema: each node has a name and at most one child
@(d/transact
  conn
  [{:db/ident       :node/name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident       :node/child
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one}])
;; Data: a chain root -> a -> b -> c
@(d/transact
  conn
  [{:node/name  "root"
    :node/child {:node/name  "a"
                 :node/child {:node/name  "b"
                              :node/child {:node/name "c"}}}}])
(def db (d/db conn))
;; Run the query with :query-stats enabled and keep only the stats
(def query-stats
  (:query-stats
   (d/query {:query '[:find ?x ?y
                      :in $ % ?ancestor-name
                      :where
                      [?x :node/name ?ancestor-name]
                      (ancestor ?x ?y)]
             :args [db
                    ;; ancestor: base case (direct child) and recursive case
                    '[[(ancestor ?x ?y)
                       [?x :node/child ?y]]
                      [(ancestor ?x ?y)
                       [?x :node/child ?z]
                       (ancestor ?z ?y)]]
                    "root"]
             :query-stats true})))
test-case> (= (take 5 (:phases query-stats))
              (drop 5 (:phases query-stats)))
true
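The REPL check above hardcodes 5, which only fits this particular query. A small sketch that generalizes it, assuming the phases should be compared with plain `=`; `doubled?` is a hypothetical helper name, not a Datomic function. It returns true only when the whole list is one block repeated exactly twice.

```clojure
(defn doubled?
  "True when phases consist of one non-empty block repeated exactly twice."
  [phases]
  (let [v    (vec phases)
        n    (count v)
        half (quot n 2)]
    (and (pos? n)
         (even? n)
         (= (subvec v 0 half) (subvec v half)))))

;; Usage against the example above (assumes query-stats is defined):
;; (doubled? (:phases query-stats))
```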
Should I interpret this as an indication that Datomic is actually doing the same work twice (unnecessarily)? And if so, could I be doing something differently to avoid that? Or is this just an artifact of the way query-stats is constructed?
And following on that question: the context is that I'm working on a visual rendering of query-stats to make it a little easier to comprehend. Do you think it makes sense to try to collapse this duplication in such a visualization?
That visualization sounds cool. I'm not sure whether deduplicating based on order could get the direction of the propagation wrong, though. In case it helps to compare, this repo has a query-stats ns with my exploration of a slightly less minimal recursive rule: https://github.com/daveliepmann/examples-from-what-you-always-wanted-to-know-about-datalog/
Each :sched entry holds all the clauses for one "rule" in the Datalog sense. The top-level :in + :where + :find is regarded internally as a rule, and or-join/not-join also expand to rules. :sched shows you the full expansion.
A phase (one entry in :phases) is the execution of one rule, i.e. one group in :sched.
"basically an extent for variable bindings" -- not exactly: a rule may not need every binding that is available, so phases may seem to shed variables. But the rule being executed is what drives the phases, not the variable-binding extents.
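To see the "one group per rule" reading applied to the :sched example from the first question, here is a small sketch that lists the logic variables each group uses. It assumes, per the explanation above, that each top-level element of :sched is one rule's clauses, and it relies only on Datomic's convention that logic variables are `?`-prefixed symbols. `logic-vars` and `sched-summary` are hypothetical helpers operating on plain data, not Datomic APIs. The two groups use different variable sets, which fits the point that a rule need not use every binding available.

```clojure
(def example-sched
  ;; The :sched value quoted in the first question
  '(((parent? ?x ?y))
    ((parent? ?x ?a) (ancestor? ?a ?y))))

(defn logic-vars
  "Set of ?-prefixed symbols appearing anywhere in form."
  [form]
  (->> (tree-seq coll? seq form)
       (filter #(and (symbol? %) (= \? (first (name %)))))
       set))

(defn sched-summary
  "One map per :sched group: its index, clause count, and logic variables."
  [sched]
  (map-indexed (fn [i group]
                 {:group   i
                  :clauses (count group)
                  :vars    (logic-vars group)})
               sched))

;; (sched-summary example-sched)
;; first group uses ?x ?y; second group uses ?x ?a ?y
```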
I see, thank you