#datomic
2021-10-08
Ivar Refsdal13:10:27

I have a query like this:

(d/query
    {:query {:find '[?e]
             :in '[$ ?v [?i ...]]
             :where '[[?e :e/v ?v]
                      [?e :e/i ?i]]}
     :args [(d/db conn)
            true
            (vec (take 500 (:i params)))]})
And it produces an OutOfMemoryError. But if I change (take 500 ... to (take 10 ..., it works. But that makes it a less restrictive query, right? How does that make sense? Why does this happen? I know (now) that swapping the order of the where clauses solves this problem. Could it be possible that something is wrong in the Datomic query planner? Or is there something I'm not getting yet?

Lennart Buit13:10:55

So the first clause binds all ?e for which :e/v is true. As far as I understand it, the where clauses are executed for each unique combination of in’s, and the results are union’ed. Therefore, with [?i …] being 500 items long, you execute that large binding 500 times, whereas with [?i …] being 10 items, you only do it 10 times.
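
A rough sketch of that mental model (my own illustration of per-value execution, not Datomic internals; d/q is the positional variant of d/query):

(->> (take 500 (:i params))
     (mapcat (fn [i]
               ;; conceptually: one run of the same clauses per ?i value
               (d/q '[:find ?e
                      :in $ ?v ?i
                      :where
                      [?e :e/v ?v]
                      [?e :e/i ?i]]
                    (d/db conn) true i)))
     (into #{}))   ; ...and the per-value results union'ed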

Lennart Buit13:10:05

There is no query planner in Datomic, btw; you need to make sure yourself that the most restrictive clause comes first.

Lennart Buit13:10:38

In this case you have (empirically) found out that that is [?e :e/i ?i] 😛

😁 1
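
For reference, this is the reordered version of the query from the top of the thread (the fix Ivar describes, with [?e :e/i ?i] moved first):

(d/query
    {:query {:find '[?e]
             :in '[$ ?v [?i ...]]
             :where '[[?e :e/i ?i]   ; most selective clause first
                      [?e :e/v ?v]]}
     :args [(d/db conn)
            true
            (vec (take 500 (:i params)))]})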
Ivar Refsdal15:10:12

Hehe thank you Lennart 🙂

Ivar Refsdal16:10:11

How is "the most restrictive clause" calculated? The total number of datoms matched by that clause?

Ivar Refsdal16:10:28

Or should the collection matches be considered special because of union/or?

Ivar Refsdal16:10:05

I guess I am probably not getting something... Here is a gist that demonstrates the OOM though: https://gist.github.com/ivarref/0d3d34eeeffbc4625d6120727368e405

Ivar Refsdal16:10:30

I am at least surprised to get an OOM in this case

Lennart Buit18:10:19

Datomic isn’t necessarily determining whether a clause is most selective beforehand. It just finds all datoms satisfying the clause. So as a programmer you need to choose which clause is most selective. Say you look for all ‘males named Lennart’; the more selective clause is ‘named Lennart’. You know, around half of the population is male, and way, way fewer people are named Lennart ^^.

Lennart Buit18:10:18

So the query would be most efficient if it first finds all Lennarts and then filters them on being male.
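
A hedged illustration of that ordering advice (:person/name and :person/sex are made-up attributes):

(d/q '[:find ?p
       :where
       [?p :person/name "Lennart"]   ; matches few datoms, so it goes first
       [?p :person/sex :male]]       ; matches roughly half the population
     db)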

Ivar Refsdal19:10:32

I get that ... But what makes a clause the most selective? It matches the fewest datoms?

Ivar Refsdal19:10:31

From my gist example: when the most selective clause (matching 1k datoms) is first, it OOMs.

Lennart Buit19:10:21

Matches fewest datoms yeah

Ivar Refsdal19:10:06

OK --- well, I still don't understand then why Datomic would OOM when the most selective clause is indeed first?

Ivar Refsdal19:10:32

Or will this list binding create a cross product somehow?

Lennart Buit19:10:25

Yes, I like to think that it iterates through all tuples of in values
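
A small sketch of that reading (my interpretation of the "tuples of in values" idea, not confirmed internals): with one ?v value and 50000 ?i values, the engine conceptually sees 1 × 50000 input tuples.

(count (for [v [true]                 ; the single ?v input
             i (take 50000 uuids)]    ; the ?i collection from the gist
         [v i]))
;; => 50000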

Ivar Refsdal19:10:09

OK, but why would a loop of 50k iterations OOM?

Ivar Refsdal19:10:44

I just don't quite see the "reason" for the OOM

Ivar Refsdal19:10:04

I probably need to play more with datalog ...

Ivar Refsdal19:10:48

https://docs.datomic.com/on-prem/best-practices.html#collections-as-inputs I'd like a warning here if indeed you should treat collection inputs as potential cross-product OOM producers

Lennart Buit19:10:48

This is outside of my knowledge. I don’t know why Datomic would struggle binding a thousand entities 10000 times.

Ivar Refsdal19:10:21

OK, well thank you for all your input either way 🙂 Not sure what timezone you are in, but here in Norway it's getting late, so I'm off for the weekend. Have a nice weekend 😎

Lennart Buit19:10:23

Especially because it happens to be the same set for every iteration

Ivar Refsdal19:10:40

Yeah... I get the feeling this is a bug, no?

Lennart Buit19:10:17

Nah, it is often Datomic outsmarting us in its simplicity

clj 1
Lennart Buit19:10:21

As in, in a good way

pyry06:10:16

@UGJE0MM0W Just to add to what others have already said, the clause ordering will obviously have an impact on the query performance and size of intermediate results collected by the query. But if the total number of items ultimately returned by your query is too large to fit into the available memory, no reordering of clauses will work.

pyry06:10:33

I'm wondering if you could test your original query using qseq instead of query or q? As qseq is lazy, I think you might be able to make some progress this way... at least, if the original problem was that the query result was just too large to fit into memory AND you can process the results bit by bit.
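
A hedged sketch of that suggestion, reusing the query from the top of the thread and assuming the arg-map form of d/qseq (process-result stands in for whatever consumes each row):

(->> (d/qseq {:query {:find  '[?e]
                      :in    '[$ ?v [?i ...]]
                      :where '[[?e :e/i ?i]
                               [?e :e/v ?v]]}
              :args  [(d/db conn)
                      true
                      (vec (take 500 (:i params)))]})
     (run! process-result))   ; consume lazily instead of realizing everything at once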

Ivar Refsdal10:10:42

How is OOMing simplicity? qseq is also OOMing. I've updated the gist with a qseq example.

Ivar Refsdal10:10:34

@UCYS6T599 Please see the gist for a reproducible case: https://gist.github.com/ivarref/0d3d34eeeffbc4625d6120727368e405 The result set is 1000 entities long. A manual unrolling of the where clauses worked (it's super slow, but at least it does not OOM):

(let [ids (vec (take 50000 uuids))
      db  (d/db conn)]
  (->> (for [e (d/query
                 {:query {:find  '[[?e ...]]
                          :in    '[$ ?b]
                          :where '[[?e :e/b ?b]]}
                  :args  [db true]})]
         (set (d/query {:query {:find  '[?e ?u]
                                :in    '[$ ?e [?u ...]]
                                :where '[[?e :e/u ?u]]}
                        :args  [db e ids]})))
       (reduce set/union #{})))

sebastian22:10:07

Hi. Absolute Datomic beginner here. I am running Postgres in a Docker container and am trying to connect the transactor to it. For this I am using the sql-transactor.properties sample file as a base. How do I configure this so the transactor on the host can reach Postgres inside Docker? The transactor throws an error that the hostname (the service name from the docker-compose file) is unknown.

thumbnail08:10:31

How are you running the transactor, and what is your platform (Mac OS / Linux / … ?)

sebastian09:10:19

I'm on Linux. I am running Postgres via docker-compose; the service is called pg, the ports are the usual ones, and I am running the Clojure application as well as the transactor on my host machine. So the transactor is not running inside Docker.

sebastian09:10:20

The transactor config file is derived from the sql sample, only adding the license-key and changing the following:

protocol=sql
host=pg
port=5432

sql-url=jdbc:
sql-user=datomic
sql-password=datomic

sebastian09:10:27

I also created the DB, the datomic_kvs table and a datomic user per the instructions on their website using the SQL statements I've found in the Datomic folder.

sebastian09:10:26

Postgres is running; I can connect to it via Adminer, which I am also running with the same docker-compose.

sebastian09:10:20

Upon starting the transactor, it says system started, but then I get a stack trace with the "interesting part" being Caused by: java.net.UnknownHostException: pg
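
For context, a hedged sketch of how these keys can be laid out (placeholder URL and credentials, not the verbatim sample): as I understand the transactor properties, host= is the address the transactor itself binds to and advertises to peers, while the Postgres hostname belongs inside sql-url.

protocol=sql
host=localhost
port=4334
# the storage hostname goes in the JDBC URL, not in host=
sql-url=jdbc:postgresql://localhost:5432/datomic
sql-user=datomic
sql-password=datomic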