datalevin 2022-08-09 | Slack Archive

roklenarcic16:08:18

it is possible to make a query where I fetch entities which have current timestamp between from-date to-date? This is a pretty common query in most commercial software. I tried like this:

(d/q '[:find ?e :in $ ?current-ts :where
       [?e :from ?from]
       [?e :to ?to]
       [(<= ?from ?current-ts)]
       [(<= ?current-ts ?to)]]
     (d/db conn) #inst "2022-10-10")
Execution error (ExceptionInfo) at datalevin.client/normal-request (client.clj:338).
Request to Datalevin server failed: "java.util.Date cannot be cast to java.lang.Number"

roklenarcic16:08:31

on a related note: do predicates like <= use some sort of an index operation or do they run the actual clojure.core/<=? In Datomic the compare predicates are special and use indexes

roklenarcic19:08:01

Judging by this callstack it runs clojure.core function on all results:

[clojure.lang.Numbers lt "Numbers.java" 253]
               [clojure.core$_LT_ invokeStatic "core.clj" 909]
               [clojure.core$_LT_ invoke "core.clj" 902]
               [clojure.lang.AFn applyToHelper "AFn.java" 156]
               [clojure.lang.RestFn applyTo "RestFn.java" 132]
               [clojure.core$apply invokeStatic "core.clj" 667]
               [clojure.core$apply invoke "core.clj" 662]
               [datalevin.query$make_call invokeStatic "query.cljc" 618]
               [datalevin.query$make_call invoke "query.cljc" 615]
               [datalevin.query$_call_fn$fn__13114 invoke "query.cljc" 654]

Huahai16:08:39

Dates should be compared with date specific operations, i.e. something like (.after ?from ?current)

Huahai16:08:58

Java interops works, use it instead for working with java.util.Date

roklenarcic16:08:38

Wouldnt that be slow, as it doesnt use an index of some kind?

Huahai16:08:13

There’s only index in triple store, there’s nothing else

Huahai16:08:25

everything is index

roklenarcic16:08:12

By that I mean .after operation would have to be applied to each datom in turn

roklenarcic16:08:20

If i am not mistsken

Huahai16:08:34

what else do you expect?

Huahai17:08:24

Everything is indexed in triple store. So it always use index. I am not sure what you mean by “it doesn’t use an index of some kind”?

Huahai17:08:11

In term of querying, Datomic, like Datascript, doesn’t have query optimization of any kind, whereas Datalevin does have some simple optimizations, e.g. we reorder where clauses according to true cardinality, and we optimize the hash joins a bit. These are far from enough. The query engine rewrite I am working on will introduce new indices and do many more optimizations.

Huahai17:08:56

Stay tuned. Hopefully, I can reach my goal of bringing query performance close to relational DBs. I hope to beat Postgres for complex queries (there’s no hope to beat it in simple queries, triple stores will always be slower in these cases).

roklenarcic19:08:59

I am referring to this chapter of datomic docs: https://docs.datomic.com/cloud/query/query-data-reference.html#range-predicates Basically what it says is that ordering predicates don’t actually invoke clojure.core functions on data, but use index directly. These probably apply to inst type as well, as that is a native type in datomic, so no need for .after or such. What using an index directly means is that usually the indexes also represent a natural sort, so a query (< 3 x) is trivial to answer in a typical SQL style index, because entries are sorted. Datomic uses that fact to optimize these predicates. Finally I would like to say that an entity having two timestamps valid-from valid-to and then having to do queries for entities that are valid at a certain time, is such a common need in software you really need to have a good user story around it. XTDB builds their whole value proposition on supporting these kind of queries as built-in on every entity.

Huahai19:08:11

There’s no difference between Datomic and Datalevin in how range predicates are handled. In this case, the system automatically uses the :ave index because the clause only specifies attribute. There’s no magic, it’s pretty straightforward.

Huahai20:08:37

As I already mentioned several times, there’s only index in triple store. There’s nothing else. What is an index? It’s a B+ tree. A look up in the index is of course doing binary search. But each comparison in that binary search has to run the range predicate. That’s what I mean by “What else do you expect?“. You will have to run the predicate on the datom. Fortunately, you only run it a few times to get to where you want, it’s a B+ tree after all. Got it?

Huahai20:08:36

There are two options to do that comparison, you can either compare them as raw bytes, or compare them as Date objects, the speed will be similar, because the data are stored as bytes and you need to do a conversion in either method. Datalevin offers you both options. For the former, you can write your own predicate that takes raw byte buffer as argument to work with. For the later, you use .after method on the Date object. The logic will be same in either case, so the speed will be the same. Maybe the later is faster, because it is likely cached whereas your customized function is unlikely to be. Is it worth it to overload the comparison operators to handle the special cases of Dates or others? I doubt it. That extra logic probably slow it down more than it worth. I probably will not bother with it.

Huahai20:08:30

I prefer that time logic is handled by user themselves. Any built-in time logic in a database is going to suck and complicate things unnecessarily. This is just my opinion.

Huahai20:08:04

Finally, Datalevin is not an immutable DB, it’s a regular DB like most other DBs, so the fields of “validate from” and “validate to” don’t make sense. Everything in the DB is valid at the moment, so one can optionally turn on :auto-entity-time , the system will then maintain :db/created-at and :db/updated-at fields for your entities. For anything extra, you will need to do yourself, as the logic can be arbitrary and a database cannot possibly anticipate all use cases.

Huahai21:08:33

In the end of the day, you will need to measure database performance yourself on your own data. Go ahead and measure. Don’t trust documentation of a database vendor. Database vendors are notorious, for they invented this abomination called DeWitt Clause. Unfortunately, Datomic does have a DeWitt clause, so you can only measure privately and keep results to yourself. All I can say is, measure away!

roklenarcic19:08:46

I guess that confusion comes because I thought that you couldn’t navigate a B+ tree index with a generic true/false predicate. I assumed that you used the result of a predicate like (< ?ts ?to) or (even? ?num) or (< 3 ?num 10) as a filter operation on the final set of Datoms, which would be linear complexity with the number of Datoms before predicates were applied. I must confess that I still don’t know how to use a general predicate to navigate B+ tree. I’ll just have to trust that you do. Thanks for the explanation.

roklenarcic20:08:52

Tried running it with:

(def a (atom 0))

(defn <=date [d1 d2]
  (swap! a inc)
  (.after d2 d1))

(d/q '[:find ?e :in $ ?current-ts :where
       [?e :from ?from]
       [?e :to ?to]
       [(<=date ?from ?current-ts)]
       [(<=date ?current-ts ?to)]]
     (d/db conn) #inst "2022-10-10")

And 100000 :from and :to datoms. <=date was called 200000 times.

Huahai20:08:51

In the current query engine, <=date will only run after index scan. It will be difficult to do otherwise.

Huahai21:08:49

You can try the same in datomic

Huahai21:08:04

The next version of Datalevin will try push down such predicate. This particular case should not be too hard.

Huahai21:08:52

Right now, none of the datalog stores do such optimizations.

Huahai21:08:22

Lots of low hanging fruits indeed.

Huahai21:08:43

As to your confusion, you can navigate B+ tree with any comparison function. We expose this in the KV API already. The Datalog query language can be enhanced to expose this too, after I finish the query engine rewrite.

2022-08-09

Channels