Fork me on GitHub
#datomic
<
2016-10-12
>
casperc11:10:40

I am wondering, does Datomic do efficient queries using <, >, >= and <= or does it end up doing full scans? We are seeing some bad query performance, compared to an SQL base, so wondering what can be expected.

casperc11:10:28

The query in question is something like this, which uses coordinates to get addresses inside a bounding box:

'[:find [?a ...]
:in $ ?xmin ?ymin ?xmax ?ymax
 :where
[?a :adresse/etrs89koordinat-oest ?x]
[(< ?xmin ?x)]
[(>= ?xmax ?x)]
[?a :adresse/etrs89koordinat-nord ?y]
[(< ?ymin ?y)]
[(>= ?ymax ?y)]]

marshall12:10:07

@casperc Release 0.9.5130 and later include optimization of range predicates in query (see http://blog.datomic.com/2015/01/datalog-enhancements.html). Are you running on that version or newer?

casperc13:10:09

@marshall: We are running 0.9.5394, so we should.

casperc13:10:24

Basically, I am comparing a query against a datomic base with 3M entities of the type we are looking for and an sql query against a base with the same data.

casperc13:10:21

The result is about 500k entities and takes about 5s in Datomic and 1.2s in the SQL base. I am just wondering why the big difference.

casperc13:10:51

And a count in the SQL base is 60ms and still 5s in datomic

casperc13:10:25

The :adresse/etrs89koordinat-oest field is a float if that matters

marshall13:10:47

You might want to review https://github.com/Datomic/day-of-datomic/blob/master/tutorial/decomposing_a_query.clj In general, you may be able to reorder the datalog clauses to improve the efficiency

marshall13:10:29

Without knowing more about your dataset specifically it’s hard to recommend which clauses should be moved, but I would start with the two non-range clauses at the top

casperc13:10:42

Well it is basically coordinates and the query is for a bounding box.

casperc13:10:00

So I don’t think there are any clause that reduces the result more than other

marshall13:10:47

Looking again, since you supply min and max values to the query, you might actually see better performance by moving the non-range statements down. Again, it depends heavily on your dataset and the best approach may simply be a bit of testing.

casperc13:10:36

@marshall: Its about the same with all permutations i can think of, including moving them up and down 😞

casperc13:10:41

Is there any way to debug what the time is being spent on?

marshall13:10:05

That example I provided from Day of Datomic shows a way to do that

casperc13:10:11

To see cache hits/misses and that sort?

marshall13:10:23

you omit clauses from the query and look at how many results are returned

marshall13:10:41

you could also time individual sections/clauses and groups of clauses

marshall13:10:04

the peer metrics report cache and memcached hit/miss data

casperc13:10:51

How would I go about timing individual clauses?

marshall13:10:28

time the query (with whatever tooling you want) with only individual clauses included

marshall13:10:01

that may or may not be particularly enlightening, but you can also add additional clauses and groups of clauses and determine which sets and joins are expensive

Ben Kamphaus13:10:15

@casperc you might also test the results of the intersection of the two equivalent calls to index range: http://docs.datomic.com/clojure/#datomic.api/index-range

casperc13:10:21

ok, well first off removing the y-coordinate reduces the time it takes somewhat

casperc13:10:33

Does that mean it is doing a full scan for the remaining, and not the index?

Ben Kamphaus13:10:40

to some extent there will just be limits on the performance of this shape of query against Datomic’s indexes vs. an R-tree or something.

casperc13:10:15

What would be the best way to make the intersection from an index-range call? Is it possible (and performant) to do inside the query?

casperc13:10:41

Each are very fast (sub-milisecond)

casperc13:10:25

Is something like this possible:

'[:find ?a
                            :in $ ?xmin ?ymin ?xmax ?ymax
                            :where
                            [(datomic.api/index-range $ :adresse/etrs89koordinat-nord ?ymin ?ymax) [?a]]
                            [(datomic.api/index-range $ :adresse/etrs89koordinat-oest ?xmin ?xmax) [?a]]
                            ]

Ben Kamphaus13:10:54

Didn’t end up testing that myself, you may just be encountering the laziness (re: timing test), so make sure to realize, e.g. with into. I probably wouldn’t make the API call in the query, it’s all going to be realized in memory anyways, just (into #{) (map :e (seq results))) the results or something and clojure.set/intersection the two calls.

casperc14:10:13

(count (time (clojure.set/intersection (into #{} (map :e (d/index-range (d/db (get-conn :kildedata)) :adresse/etrs89koordinat-oest 718333.6321944933 731542.4349335412)))
                                       (into #{} (map :e (d/index-range (d/db (get-conn :kildedata)) :adresse/etrs89koordinat-nord 6170381.489927892 6181147.04591752))))))
"Elapsed time: 1656.835585 msecs"

casperc14:10:37

Which is an improvement for sure

casperc14:10:08

I tried this

'[:find ?a
                            :in $ ?xmin ?ymin ?xmax ?ymax
                            :where
                            [(datomic.api/index-range $ :adresse/etrs89koordinat-nord ?ymin ?ymax) [[?a]]]
                            [(datomic.api/index-range $ :adresse/etrs89koordinat-oest ?xmin ?xmax) [[?a]]]
                            ]
But it never finished.

Ben Kamphaus14:10:08

Can’t comment on the never finishing query, apart from saying that I’d in general keep index-range, datoms etc. calls out of query (basically you’re doing something from the primitives instead of querying). My guess is the time difference w/index range comes from the necessary structure of the query which starts with what’s probably an aevt lookup to limit to only entities with x and y values of interest, whereas the index-range call makes one pass per constraint just with the avet query.

casperc14:10:33

Alright, cool. Well again it is an improvement for sure, so I hope it will be enough for our use case.

casperc14:10:06

Thanks alot to both of you @bkamphaus and @marshall 👍:skin-tone-2:

drankard14:10:12

There is options in the transactor config to push metrics to CloudWatch, is there any way of doing this for peers ?

Ben Kamphaus14:10:25

you can wire up w/e reporting to peer w/the metrics callback: http://docs.datomic.com/monitoring.html#sec-2-2

drankard14:10:41

I see, but I then have to implement the AWS - PutMetricDataRequest MetricDatum units etc. myself, i was wondering as this is implemented in the transactor there might be a shortcut. 😉

kenny22:10:01

Is there a way to tell if clojure.lang.ExceptionInfo: Error communicating with HOST is due to not being able to communicate with the host or going over your process limit?

kenny22:10:32

Also, what is the easiest way to update a license file on a running Datomic transactor deployed with CF?

afhammad23:10:25

is it possible to use pull against entire db instead of a specific entity?