
I am wondering: does Datomic do efficient queries using <, >, >= and <=, or does it end up doing full scans? We are seeing some bad query performance compared to an SQL database, so I am wondering what can be expected.


The query in question is something like this, which uses coordinates to get addresses inside a bounding box:

'[:find [?a ...]
  :in $ ?xmin ?ymin ?xmax ?ymax
  :where
  [?a :adresse/etrs89koordinat-oest ?x]
  [(< ?xmin ?x)]
  [(>= ?xmax ?x)]
  [?a :adresse/etrs89koordinat-nord ?y]
  [(< ?ymin ?y)]
  [(>= ?ymax ?y)]]
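Read as plain predicates, the clauses above keep an address when xmin < x <= xmax and ymin < y <= ymax (strict on the min bound, inclusive on the max). A minimal sketch of that test in Python, using coordinate values borrowed from later in this thread:

```python
def in_bbox(x, y, xmin, ymin, xmax, ymax):
    # Mirrors the datalog range clauses: strict on the min bound,
    # inclusive on the max bound.
    return xmin < x <= xmax and ymin < y <= ymax

print(in_bbox(720000.0, 6175000.0, 718333.6, 6170381.5, 731542.4, 6181147.0))  # True
```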


@casperc Release 0.9.5130 and later include an optimization of range predicates in query. Are you running that version or newer?


@marshall: We are running 0.9.5394, so we should.


Basically, I am comparing a query against a Datomic database with 3M entities of the type we are looking for, and an SQL query against a database with the same data.


The result is about 500k entities and takes about 5s in Datomic versus 1.2s in the SQL database. I am just wondering why there is such a big difference.


And a count in the SQL database takes 60ms, but still 5s in Datomic.


The :adresse/etrs89koordinat-oest field is a float, if that matters.


You might want to review the Day of Datomic example on clause ordering. In general, you may be able to reorder the datalog clauses to improve efficiency.


Without knowing more about your dataset specifically it’s hard to recommend which clauses should be moved, but I would start with the two non-range clauses at the top


Well it is basically coordinates and the query is for a bounding box.


So I don’t think there are any clauses that reduce the result more than the others.


Looking again, since you supply min and max values to the query, you might actually see better performance by moving the non-range statements down. Again, it depends heavily on your dataset and the best approach may simply be a bit of testing.


@marshall: It's about the same with all permutations I can think of, including moving them up and down 😞


Is there any way to debug what the time is being spent on?


That example I provided from Day of Datomic shows a way to do that


To see cache hits/misses and that sort?


you omit clauses from the query and look at how many results are returned


you could also time individual sections/clauses and groups of clauses


the peer metrics report cache and memcached hit/miss data


How would I go about timing individual clauses?


time the query (with whatever tooling you want) with only individual clauses included


that may or may not be particularly enlightening, but you can also add additional clauses and groups of clauses and determine which sets and joins are expensive

Ben Kamphaus 13:10:15

@casperc you might also test the results of the intersection of the two equivalent calls to index-range:


OK, well first off, removing the y-coordinate clauses reduces the time it takes somewhat.


Does that mean it is doing a full scan for the remaining, and not the index?

Ben Kamphaus 13:10:40

to some extent there will just be limits on the performance of this shape of query against Datomic’s indexes vs. an R-tree or something.
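To make that concrete: with one index per attribute, each 1-D range scan can match far more entities than the final box contains, and you pay for all of those candidates before intersecting. A toy Python illustration with hypothetical grid data (not Datomic):

```python
# 100x100 grid of points; the bounding box covers only a 10x10 corner.
grid = [(x, y) for x in range(100) for y in range(100)]

x_hits = {p for p in grid if 0 <= p[0] <= 9}  # 1,000 candidates from the x "index"
y_hits = {p for p in grid if 0 <= p[1] <= 9}  # 1,000 candidates from the y "index"
box = x_hits & y_hits                         # only 100 points are actually inside

print(len(x_hits), len(y_hits), len(box))  # 1000 1000 100
```

An R-tree or similar spatial index prunes on both dimensions at once, so it never materializes the large per-dimension candidate sets.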


What would be the best way to make the intersection from an index-range call? Is it possible (and performant) to do inside the query?


Each is very fast (sub-millisecond).


Is something like this possible:

'[:find ?a
  :in $ ?xmin ?ymin ?xmax ?ymax
  :where
  [(datomic.api/index-range $ :adresse/etrs89koordinat-nord ?ymin ?ymax) [?a]]
  [(datomic.api/index-range $ :adresse/etrs89koordinat-oest ?xmin ?xmax) [?a]]]

Ben Kamphaus 13:10:54

Didn’t end up testing that myself; you may just be encountering laziness (re: timing test), so make sure to realize the result, e.g. with into. I probably wouldn’t make the API call in the query, since it’s all going to be realized in memory anyway; just (into #{} (map :e (seq results))) the results or something and clojure.set/intersection the two calls.


(count (time (clojure.set/intersection (into #{} (map :e (d/index-range (d/db (get-conn :kildedata)) :adresse/etrs89koordinat-oest 718333.6321944933 731542.4349335412)))
                                       (into #{} (map :e (d/index-range (d/db (get-conn :kildedata)) :adresse/etrs89koordinat-nord 6170381.489927892 6181147.04591752))))))
"Elapsed time: 1656.835585 msecs"


Which is an improvement for sure
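The shape of that index-range-and-intersect approach can be sketched outside Datomic with plain sorted data, where `bisect` stands in for a range scan over a sorted (value, entity-id) index; an illustrative Python sketch with made-up data, not the Datomic API (and the real `d/index-range` bound semantics may differ from the inclusive bounds used here):

```python
from bisect import bisect_left, bisect_right

def index_range(sorted_pairs, lo, hi):
    """Return entity ids whose value falls in [lo, hi], given
    (value, entity-id) pairs sorted by value -- a stand-in for
    scanning a sorted per-attribute index."""
    i = bisect_left(sorted_pairs, (lo,))
    j = bisect_right(sorted_pairs, (hi, float("inf")))
    return {e for _, e in sorted_pairs[i:j]}

# Toy data: entity id -> (x, y)
points = {1: (2.0, 3.0), 2: (8.0, 9.0), 3: (2.5, 20.0)}
x_index = sorted((x, e) for e, (x, _) in points.items())
y_index = sorted((y, e) for e, (_, y) in points.items())

# Entities inside the box x in [0, 5], y in [0, 5]:
hits = index_range(x_index, 0.0, 5.0) & index_range(y_index, 0.0, 5.0)
print(hits)  # {1}: entity 3 matches on x but not on y
```

Each scan is one ordered pass per constraint, and the set intersection at the end is what the `clojure.set/intersection` call above does.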


I tried this

'[:find ?a
  :in $ ?xmin ?ymin ?xmax ?ymax
  :where
  [(datomic.api/index-range $ :adresse/etrs89koordinat-nord ?ymin ?ymax) [[?a]]]
  [(datomic.api/index-range $ :adresse/etrs89koordinat-oest ?xmin ?xmax) [[?a]]]]
But it never finished.

Ben Kamphaus 14:10:08

Can’t comment on the never-finishing query, apart from saying that I’d in general keep index-range, datoms, etc. calls out of query (basically you’re doing something from the primitives instead of querying). My guess is the time difference with index-range comes from the necessary structure of the query, which probably starts with an AEVT lookup to limit to only entities with x and y values of interest, whereas the index-range call makes one pass per constraint using just the AVET index.


Alright, cool. Well again it is an improvement for sure, so I hope it will be enough for our use case.


Thanks a lot to both of you @bkamphaus and @marshall 👍


There are options in the transactor config to push metrics to CloudWatch; is there any way of doing this for peers?

Ben Kamphaus 14:10:25

you can wire up whatever reporting you want on the peer with the metrics callback


I see, but then I have to implement the AWS PutMetricDataRequest, MetricDatum, units, etc. myself. I was wondering, since this is implemented in the transactor, whether there might be a shortcut. 😉


Is there a way to tell whether clojure.lang.ExceptionInfo: Error communicating with HOST is due to not being able to reach the host or to going over your process limit?


Also, what is the easiest way to update a license file on a running Datomic transactor deployed with CF?


is it possible to use pull against the entire db instead of a specific entity?