2021-10-08
# datomic
I have a query like this:
(d/query
 {:query {:find '[?e]
          :in '[$ ?v [?i ...]]
          :where '[[?e :e/v ?v]
                   [?e :e/i ?i]]}
  :args [(d/db conn)
         true
         (vec (take 500 (:i params)))]})
And it produces an OutOfMemoryError.
But if I change (take 500 ...) to (take 10 ...), it works.
But that makes it a less restrictive query, right? How does that make sense?
Why does this happen?
I know (now) that swapping the order of the where clauses solves this problem.
Could it be possible that something is wrong in the Datomic query planner? Or is there something I'm not getting yet?
So the first clause binds all ?e for which :e/v is true.
As far as I understand it, the where clauses are executed for each unique combination of :in bindings, and the results are unioned.
Therefore, with [?i …] being 500 items long, you execute that large binding 500 times, whereas with [?i …] being 10 items, you only do it 10 times.
There is no query planner in Datomic btw, you need to make sure yourself that the most restrictive clause is first.
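A sketch of the reordering mentioned above, reusing the attribute names and arguments from the original snippet: with the collection-bound clause first, each ?i binding starts from the few datoms matching that ?i instead of re-running the broad :e/v scan.

;; sketch of the swapped clause order (assumes [datomic.api :as d]
;; and the same conn/params as the original snippet)
(d/query
 {:query {:find '[?e]
          :in '[$ ?v [?i ...]]
          :where '[[?e :e/i ?i]    ; collection-bound clause first
                   [?e :e/v ?v]]}  ; broad clause now filters a small set
  :args [(d/db conn)
         true
         (vec (take 500 (:i params)))]})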
Hehe thank you Lennart 🙂
How is "the most restrictive clause" calculated? The total number of datoms matched by that clause?
Or should the collection matches be considered special because of union/or?
I guess I am probably not getting something... Here is a gist that demonstrates the OOM though: https://gist.github.com/ivarref/0d3d34eeeffbc4625d6120727368e405
I am at least surprised to get an OOM in this case
An example demonstrating optimizing for more selective where clause ordering https://github.com/cognitect-labs/day-of-datomic-cloud/blob/master/tutorial/query_perf.clj
Datomic isn’t necessarily determining whether a clause is most selective beforehand. It just finds all datoms satisfying the clause. So as a programmer you need to choose which clause is most selective. Say you look for all ‘males named Lennart’: the more selective clause is ‘named Lennart’. You know, around half of the population is male, and way fewer people are named Lennart ^^.
So the query is most efficient if it first finds all Lennarts and then filters them on being male.
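As a sketch of that example in Datalog (the :person/name and :person/sex attributes are hypothetical, not from the thread):

;; put the clause matching the fewest datoms first
'[:find ?e
  :where
  [?e :person/name "Lennart"]   ; selective: few people share the name
  [?e :person/sex :male]]       ; broad: roughly half the population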
I get that ... But what makes a clause the most selective? It matches the fewest datoms?
From my gist example: when the most selective clause (matching 1k datoms) is first, it OOMs.
Matches fewest datoms yeah
OK --- well, I still don't understand then why Datomic would OOM when the most selective clause is indeed first?
Or will this list binding create a cross product somehow?
Yes, I like to think that it iterates through all tuples of in values
OK, but why would a loop of 50k iterations OOM?
I just don't quite see the "reason" for the OOM
I probably need to play more with datalog ...
https://docs.datomic.com/on-prem/best-practices.html#collections-as-inputs I'd like a warning here if indeed you should treat collection inputs as potential cross product OOM-producers
This is outside of my knowledge. I don’t know why datomic would struggle binding a thousand entities 10000 times
OK, well thank you for all your input either way 🙂 Not sure what timezone you are in, but here in Norway it's getting late, so I'm off for the weekend. Have a nice weekend 😎
Especially because it happens to be the same set for every iteration
Yeah... I get the feeling this is a bug, no?
As in, in a good way
@UGJE0MM0W Just to add to what others have already said, the clause ordering will obviously have an impact on the query performance and size of intermediate results collected by the query. But if the total number of items ultimately returned by your query is too large to fit into the available memory, no reordering of clauses will work.
I'm wondering if you could test your original query using qseq instead of query or q? As qseq is lazy, I think you might be able to make some progress this way.. at least, if the original problem was that the query result was just too large to fit into memory AND you can process the results bit by bit.
How is OOMing simplicity?
qseq is also OOMing. I've updated the gist with a qseq example.
@UCYS6T599 Please see the gist for a reproducible case: https://gist.github.com/ivarref/0d3d34eeeffbc4625d6120727368e405 The result set is 1000 entities long. A manual unrolling of the where worked (it's super slow, but at least it does not OOM):
(let [ids (vec (take 50000 uuids))
      db  (d/db conn)]
  (->> (for [e (d/query                    ; all entities with :e/b true
                {:query {:find '[[?e ...]]
                         :in '[$ ?b]
                         :where '[[?e :e/b ?b]]}
                 :args [db true]})]
         ;; one query per entity, matched against the id collection
         (set (d/query {:query {:find '[?e ?u]
                                :in '[$ ?e [?u ...]]
                                :where '[[?e :e/u ?u]]}
                        :args [db e ids]})))
       (reduce set/union #{})))
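An alternative sketch under the same setup as the gist (conn, uuids, and the :e/b and :e/u attributes): chunking the id collection keeps each collection binding small without issuing one query per entity. The chunk size of 1000 is an arbitrary guess.

;; assumes [datomic.api :as d] and [clojure.set :as set]
(let [ids (vec (take 50000 uuids))
      db  (d/db conn)]
  (->> (partition-all 1000 ids)            ; arbitrary chunk size
       (map (fn [chunk]
              (set (d/query
                    {:query {:find '[?e ?u]
                             :in '[$ ?b [?u ...]]
                             :where '[[?e :e/u ?u]  ; collection clause first
                                      [?e :e/b ?b]]}
                     :args [db true (vec chunk)]}))))
       (reduce set/union #{})))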
Hi. Absolute Datomic beginner here.
I am running Postgres in a Docker container and am trying to connect the transactor.
For this I am using the sql-transactor.properties sample file as a base.
How do I configure this to connect the transactor from the host to postgres inside Docker? The transactor throws an error that the hostname (the service name from the docker-compose file) is unknown.
How are you running the transactor, and what is your platform (Mac OS / Linux / … ?)
I'm on Linux.
I am running postgres via docker-compose; the service is called pg, ports are per usual, and I am running the Clojure application as well as the transactor on my host machine. So the transactor is not running inside Docker.
The transactor config file is derived from the sql sample, only adding the license-key and changing the following:
protocol=sql
host=pg
port=5432
sql-url=jdbc:
sql-user=datomic
sql-password=datomic
I also created the DB, the datomic_kvs table, and a datomic user per the instructions on their website, using the SQL statements I've found in the Datomic folder.
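One likely cause: a compose service name like pg resolves only inside the Docker network, and in the transactor properties the host=/port= lines are the transactor's own endpoint (what peers connect to), not the storage address; the Postgres address belongs in sql-url. A sketch, assuming the container publishes 5432 to the host and the database is named datomic:

protocol=sql
# host/port are the transactor's own endpoint (what peers connect
# to), not the Postgres address
host=localhost
port=4334
# the Postgres address goes in the JDBC URL; with the container's
# 5432 published to the host, localhost reaches it
sql-url=jdbc:postgresql://localhost:5432/datomic
sql-user=datomic
sql-password=datomic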