I’m trying to get more insight into the query planner. I’ve read the docs on it (which are very informative). This query runs out of heap space:
(d/q '[:find (count ?acc) .
       :in $ %
       :where
       (was-part-of-the-registered-drop ?acc)
       [?tr :transfer/source ?acc]
       (not-a-jup-fee ?tr)
       (not-an-lp-interaction ?tr)]
     db rules) ;; rules: the rule set defining the three rules above (not shown)
However, if I pull part of the query into a rule:
(def has-sold
  ;; a single rule; it has to be added to the rule set passed in as %
  '[(has-sold ?acc)
    [?tr :transfer/source ?acc]
    (not-a-jup-fee ?tr)
    (not-an-lp-interaction ?tr)])
and then call it:
(d/q '[:find (count ?acc) .
       :in $ %
       :where
       (was-part-of-the-registered-drop ?acc)
       (has-sold ?acc)]
     db (conj rules has-sold)) ;; rules: the existing rule set (not shown)
The query runs in 758.25926 msecs. ⚡
Are rules handled differently, or is it just because they limit the scope of unification? Also, is there a way to limit the scope of unification without a rule? E.g.:
(d/q '[:find (count ?acc) .
       :where
       (was-part-of-the-registered-drop ?acc)
       (let [?acc] ;; <- something like this?
         [?tr :transfer/source ?acc]
         (not-a-jup-fee ?tr)
         (not-an-lp-interaction ?tr))])
Really impressed with how fast Datalevin is, and I’m not using a small dataset (1,183,349 entities). The range queries are also blazing fast!
Rules are not handled by the query planner.
A rule engine rewrite is the next major piece of work.
Ahh, so is my query fast because I’m bypassing the query planner (in this particular case, where the clauses reduce the result set)?
So it’s the query planner that’s running out of heap space? That also explains why the query :timeout wasn’t working.
No.
There is only one triple pattern in your query, so the planner is not engaged at all.
The current rule engine is prone to running out of memory. It is a top-down system, and we need to rewrite it. It runs out of memory easily with recursive rules, e.g. in our math genealogy benchmark.
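To make the memory problem concrete, here is a toy model of naive goal-directed (top-down) evaluation of a recursive ancestor rule without memoization. It is not Datalevin's rule engine, whose internals are not reproduced here; the genealogy data and the helper names (`diamond-chain`, `advisors-of`, `ancestors-top-down`) are hypothetical. Every generation has two people, and each person was advised by both people in the previous generation, so each ancestor is reachable along many paths.

```clojure
;; Toy model of naive top-down evaluation of a recursive rule:
;;   ancestor(a, s) :- advises(a, s).
;;   ancestor(a, s) :- advises(a, m), ancestor(m, s).
;; This is NOT Datalevin's rule engine; it only shows how expanding every
;; derivation path separately, without memoization, multiplies the work.

(defn diamond-chain
  "Hypothetical genealogy: generations 0..depth with two people each
  ([g 0] and [g 1]). Both people in generation g+1 were advised by both
  people in generation g. Returns a set of [advisor student] pairs."
  [depth]
  (set (for [g (range depth)
             a [0 1]
             b [0 1]]
         [[g a] [(inc g) b]])))

(defn advisors-of [edges student]
  (for [[adv stu] edges :when (= stu student)] adv))

(defn ancestors-top-down
  "Expands the ancestor goal for `person` recursively. Every path to an
  ancestor yields its own answer, so duplicates are never collapsed."
  [edges person]
  (mapcat (fn [adv] (cons adv (ancestors-top-down edges adv)))
          (advisors-of edges person)))

(def edges (diamond-chain 10))
(count (ancestors-top-down edges [10 0]))        ;; => 2046 answers, one per path
(count (set (ancestors-top-down edges [10 0])))  ;; => 20 distinct ancestors
```

For a person in generation k this model produces 2^(k+1) - 2 answers but only 2k distinct ancestors, so each extra generation roughly doubles the work. That kind of growth is one way a deep, highly connected recursive query can exhaust the heap.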
I don’t know what your rules do, but it’s the rules that are running out of memory.
Basically, we need to rewrite the rule engine to be bottom-up, so that it can take advantage of the query planner, and to turn recursive rules into loops, so that it won’t use too much memory.
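The idea of evaluating a recursive rule bottom-up as a loop can be illustrated with textbook semi-naive evaluation. This is a sketch of that general technique under simple assumptions, not the design of Datalevin's planned rewrite; the `:advises` attribute, the `ancestor-closure` function and the data are hypothetical. Each round joins only the facts that were new in the previous round (the delta) against the base relation, and the loop stops at a fixpoint.

```clojure
(require '[clojure.set :as set])

;; Semi-naive bottom-up evaluation of the (hypothetical) recursive rule set
;;   [[(ancestor ?a ?s) [?a :advises ?s]]
;;    [(ancestor ?a ?s) [?a :advises ?m] (ancestor ?m ?s)]]
;; written as a loop instead of recursion.

(defn ancestor-closure
  "Returns the set of all [ancestor student] pairs derivable from
  `advises`, a collection of [advisor student] pairs."
  [advises]
  (let [base        (set advises)
        advisors-of (group-by second base)] ; student -> [[advisor student] ...]
    (loop [total base    ; every fact derived so far (base facts included)
           delta base]   ; facts that were new in the previous round
      (if (empty? delta)
        total            ; fixpoint: the last round produced nothing new
        (let [;; join only the new facts against the base relation
              derived   (set (for [[m s] delta
                                   [a _] (get advisors-of m)]
                               [a s]))
              new-facts (set/difference derived total)]
          (recur (set/union total new-facts) new-facts))))))

;; Hypothetical data with a "diamond": :d has two advisors, :b and :c,
;; who share the advisor :a.
(ancestor-closure #{[:a :b] [:a :c] [:b :d] [:c :d] [:d :e]})
;; returns 9 pairs: the 5 base facts plus [:a :d] [:b :e] [:c :e] [:a :e]
```

The pair `[:a :d]` is derived along two paths but kept only once in the set, and because the recursion has become `loop`/`recur`, stack depth stays constant however deep the genealogy is.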
This is the next major milestone.
This is also relatively new research. Most older work on Datalog evaluation (e.g. magic sets) does not perform very well, which has limited the applicability of Datalog. Only very recent work (published this year) gets to the core issues of recursive Datalog rule processing and becomes competitive with RDBMSs.
So, really, science advances slowly.
In retrospect, a lot of things seem obvious, but they take a long, long time to figure out.
The current rule engine processes rules in the order you write them, so rearranging them can make a big difference.
The rewritten engine will be order independent.
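Why order matters for an engine that evaluates clauses strictly left to right can be shown with a toy model. It is not Datalevin's implementation; the names (`registered`, `transfers`, `eval-in-order`) and the sizes (3 registered accounts, 1,000 transfers spread over 100 accounts) are hypothetical. Both orders return the same bindings, but starting with the selective clause keeps the intermediate result small.

```clojure
;; Toy model of an engine that evaluates :where clauses strictly left to right.
;; A "clause" here is a function from one binding map to a seq of extended
;; binding maps. All names and data are hypothetical.

(def registered #{:acc-1 :acc-2 :acc-3})       ; 3 registered accounts

(def transfers                                  ; 1,000 transfers over 100 accounts
  (vec (for [i (range 1000)]
         {:tr i :source (keyword (str "acc-" (mod i 100)))})))

(defn registered-clause [b]
  (if-let [acc (:acc b)]
    (if (contains? registered acc) [b] [])      ; ?acc bound: filter
    (for [acc registered] (assoc b :acc acc)))) ; ?acc unbound: enumerate

(defn transfer-clause [b]
  (for [{:keys [tr source]} transfers
        :when (or (nil? (:acc b)) (= source (:acc b)))]
    (assoc b :tr tr :acc source)))

(defn eval-in-order
  "Applies clauses in the given order. Returns [bindings sizes], where
  sizes holds the number of bindings after each clause."
  [clauses]
  (reduce (fn [[bindings sizes] clause]
            (let [next-bindings (vec (mapcat clause bindings))]
              [next-bindings (conj sizes (count next-bindings))]))
          [[{}] []]
          clauses))

(second (eval-in-order [registered-clause transfer-clause])) ;; => [3 30]
(second (eval-in-order [transfer-clause registered-clause])) ;; => [1000 30]
```

Both orders end with the same 30 bindings; only the intermediate result differs (3 versus 1,000). With the current engine, choosing the cheap order is up to whoever writes the rules.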
Only after the rule engine rewrite is done will the full potential of Datalog be realized. E.g. a lot of machine learning algorithms can be expressed with a few rules, so one could do ML work in Datalevin.
PageRank, k-means, regressions, even deep learning: most ML algorithms are really just a matter of writing a few rules.
Software will eat the world, database will eat all software. 😀
That is how I see the future
As to your question about the query planner: it currently handles only triple patterns and function calls. Everything else is handled by the old Datascript code.
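To see how little of the first query in this thread the planner can touch, here is a rough classifier for Datascript-style `:where` clauses based only on the description above. `clause-kind` and `planner-handles?` are hypothetical helpers, not Datalevin functions, and the classification simplifies the real `:where` grammar.

```clojure
;; Rough, hypothetical classifier for Datascript-style :where clauses, based
;; only on the description above: the planner takes triple patterns and
;; function calls; rule calls and not/or forms go to the older code path.
;; Neither function is part of Datalevin.

(defn clause-kind [clause]
  (cond
    (and (vector? clause) (seq? (first clause))) :function-call  ; [(f ?x) ?y], [(pred ?x)]
    (vector? clause)                             :triple-pattern ; [?e :attr ?v]
    (and (seq? clause)
         (contains? '#{not not-join or or-join} (first clause)))
    :not-or-form
    (seq? clause)                                :rule-call      ; (rule-name ?x ...)
    :else                                        :unknown))

(def planner-handles? (comp #{:triple-pattern :function-call} clause-kind))

;; :where clauses of the first query in this thread
(def where-clauses
  '[(was-part-of-the-registered-drop ?acc)
    [?tr :transfer/source ?acc]
    (not-a-jup-fee ?tr)
    (not-an-lp-interaction ?tr)])

(frequencies (map clause-kind where-clauses))   ;; => {:rule-call 3, :triple-pattern 1}
(count (filter planner-handles? where-clauses)) ;; => 1
```

Only one clause is a triple pattern; the other three are rule calls, which go through the old rule engine.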
Thanks for the detailed explanation; it all makes a lot more sense now. I’ll keep it in mind in future. I’m a SQL veteran, and it’s incredible how much Datalog simplifies writing complex queries. The schema is also much simpler, as join tables mostly disappear. Not having to manage indexes is also a massive boon. ⚡