2024-09-06 datalevin | Clojure Slack Archive

datalevin 2024-09-06

2024-09-06T15:01:21.640399Z

I’m trying to get some more insights into the query planner. I’ve read the docs on it (which are very informative). So this query runs out of heap space.

(d/q '[:find (count ?acc).
       :where
       (was-part-of-the-registered-drop ?acc)
       [?tr :transfer/source ?acc]
       (not-a-jup-fee ?tr)
       (not-an-lp-interaction ?tr)])

However, if I pull some of the query into a rule

(def has-sold
  '[(has-sold ?acc)
    [?tr :transfer/source ?acc]
    (not-a-jup-fee ?tr)
    (not-an-lp-interaction ?tr)])

And then call it.

(d/q '[:find (count ?acc).
       :where
       (was-part-of-the-registered-drop ?acc)
       (has-sold ?acc)])

The query runs in 758.25926 msecs. ⚡ Are rules handled differently? Or is it just because they limit the scope of unification? Also is there a way to limit the scope of unification without a rule? eg:

(d/q '[:find (count ?acc).
       :where
       (was-part-of-the-registered-drop ?acc)
       (let [?acc] ;; <- something like this?
         [?tr :transfer/source ?acc]
         (not-a-jup-fee ?tr)
         (not-an-lp-interaction ?tr)]))

Really impressed with how fast datalevin is, and I’m not using a small dataset (1183349 entities). The range queries are also blazing fast!

Huahai 2024-09-06T15:04:55.399479Z

Rules are not handled by the query planner.

Huahai 2024-09-06T15:05:47.666829Z

The rule engine rewrite is the next major piece of work.

2024-09-06T15:06:10.250859Z

Ahh so is my query fast because I’m bypassing the query planner?

2024-09-06T15:06:19.784119Z

(in this particular case where the clauses reduce the result set).

2024-09-06T15:12:46.971719Z

So it’s the query planner that’s running out of heap space? That also explains why the query :timeout wasn’t working.

Huahai 2024-09-06T16:13:21.888839Z

There is only one triple pattern in your query, so the planner is not engaged at all.

👍 1

Huahai 2024-09-06T16:16:33.638709Z

The current rule engine is prone to running out of memory. It is a top-down system, and we need to rewrite it. It runs out of memory easily with a recursive rule, e.g. in our math genealogy benchmark.

Huahai 2024-09-06T16:19:39.190399Z

I don’t know what your rules do, but it is the rules that are running out of memory.

Huahai 2024-09-06T16:25:55.238769Z

Basically, we need to rewrite the rule engine to be bottom-up, so it can take advantage of the query planner. We also need to turn recursive rules into loops, so they won’t use too much memory.
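
To make the "bottom-up, recursion as a loop" idea concrete, here is a minimal sketch in plain Clojure of semi-naive evaluation for a recursive ancestor rule. It is only an illustration of the general technique, not Datalevin's actual or planned implementation; `parent-facts` and `ancestors-bottom-up` are hypothetical names invented for this example.

```clojure
(require '[clojure.set :as set])

;; Hypothetical base facts: [parent child] pairs.
(def parent-facts
  #{[:ann :bob] [:bob :cat] [:cat :dan]})

(defn ancestors-bottom-up
  "Computes the transitive closure of `edges` (a set of [a b] pairs).
  Equivalent to the recursive rules
    ancestor(a, b) :- parent(a, b).
    ancestor(a, c) :- ancestor(a, b), parent(b, c).
  but evaluated as a loop: each round joins only the facts derived in
  the previous round (`delta`) against the base edges, and the loop
  stops when a round derives nothing new."
  [edges]
  (loop [known edges
         delta edges]
    (if (empty? delta)
      known
      (let [derived (set (for [[a b]  delta
                               [b2 c] edges
                               :when (= b b2)]
                           [a c]))
            ;; Keep only facts not seen before, so cyclic data terminates.
            fresh   (set/difference derived known)]
        (recur (set/union known fresh) fresh)))))

(ancestors-bottom-up parent-facts)
```

Because the loop only carries the accumulated facts and the latest delta, memory use is bounded by the size of the result rather than by the depth of a recursive call tree.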

Huahai 2024-09-06T16:28:44.402879Z

This is the next major milestone.

Huahai 2024-09-06T16:40:51.124799Z

This is also relatively new research. Most older work on Datalog evaluation (e.g. magic sets) does not perform very well, which limits the applicability of Datalog. Only very recent work (published this year) is getting to the core issues of recursive Datalog rule processing and becoming competitive with RDBMSs.

Huahai 2024-09-06T16:41:27.794209Z

So, really, science advances slowly

💯 1

Huahai 2024-09-06T16:43:51.249659Z

In retrospect, a lot of things seem obvious, but it takes a long long time to figure out

Huahai 2024-09-06T16:48:44.831899Z

The current rule engine processes rule clauses in the order you write them, so rearranging them can make a big difference.
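
As a rough illustration of why clause order matters when clauses are evaluated strictly left to right, the plain-Clojure sketch below compares the size of the intermediate result for two orderings of the same two-clause join. The data (`transfers`, `drop-accounts`) and both functions are hypothetical and only model the effect; they do not reflect how Datalevin's rule engine is implemented.

```clojure
;; Hypothetical data: 1000 transfers spread over 100 accounts,
;; of which only 3 accounts satisfy the selective clause.
(def transfers
  (vec (for [i (range 1000)] {:tr i :acc (mod i 100)})))

(def drop-accounts #{1 2 3})

(defn broad-first
  "Evaluates the broad clause first: every transfer becomes an
  intermediate binding, and the selective clause filters afterwards."
  []
  (let [step1 transfers
        step2 (filterv #(contains? drop-accounts (:acc %)) step1)]
    {:intermediate (count step1) :result (count step2)}))

(defn selective-first
  "Evaluates the selective clause first: only 3 accounts are bound,
  then their transfers are looked up (group-by stands in for an index)."
  []
  (let [by-acc (group-by :acc transfers)
        step1  (vec drop-accounts)
        step2  (vec (mapcat by-acc step1))]
    {:intermediate (count step1) :result (count step2)}))

[(broad-first) (selective-first)]
```

Both orderings return the same 30 matching transfers, but the first materializes 1000 intermediate bindings while the second materializes only 3.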

Huahai 2024-09-06T16:49:34.625949Z

The rewritten engine will be order-independent.

Huahai 2024-09-06T16:53:07.811439Z

Only after the rule engine rewrite is done will the potential of Datalog be realized. E.g. a lot of machine learning algorithms can be expressed with a few rules, so one can do ML work in Datalevin.

Huahai 2024-09-06T16:55:44.456249Z

PageRank, k-means, regression, even deep learning: most ML algorithms are just a matter of writing a few rules, really.

Huahai 2024-09-06T16:58:21.059349Z

Software will eat the world, databases will eat all software. 😀

💯 1

😮 1

Huahai 2024-09-06T16:58:40.200909Z

That is how I see the future

Huahai 2024-09-06T17:09:52.062649Z

As to your question about the query planner, it currently only handles triple patterns and function calls. The rest is handled by the old DataScript code.

2024-09-06T17:41:24.753509Z

Thanks for the detailed explanation, all of that makes a lot more sense. I’ll keep it all in mind in future. I’m a SQL veteran and it’s just incredible how much Datalog simplifies writing complex queries. The schema is also much simpler, as join tables mostly disappear. Not having to manage indexes is also a massive boon. ⚡

👍 2

👍🏻 1
