This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-10-12
Channels
- # beginners (34)
- # boot (210)
- # cider (16)
- # cljs-dev (65)
- # cljsrn (3)
- # clojars (2)
- # clojure (107)
- # clojure-austin (8)
- # clojure-berlin (10)
- # clojure-brasil (1)
- # clojure-canada (1)
- # clojure-dev (1)
- # clojure-fr (1)
- # clojure-italy (22)
- # clojure-new-zealand (12)
- # clojure-nl (28)
- # clojure-russia (13)
- # clojure-spec (25)
- # clojure-uk (10)
- # clojurescript (109)
- # cursive (18)
- # datomic (44)
- # defnpodcast (1)
- # dirac (4)
- # emacs (2)
- # funcool (1)
- # hoplon (16)
- # jobs (14)
- # lambdaisland (23)
- # leiningen (2)
- # luminus (3)
- # off-topic (7)
- # om (58)
- # onyx (16)
- # proton (6)
- # re-frame (42)
- # reagent (55)
- # ring-swagger (5)
- # untangled (47)
- # vim (9)
I am wondering: does Datomic do efficient queries using <, >, >= and <=, or does it end up doing full scans? We are seeing some bad query performance compared to an SQL database, so I am wondering what can be expected.
The query in question is something like this, which uses coordinates to get addresses inside a bounding box:
'[:find [?a ...]
:in $ ?xmin ?ymin ?xmax ?ymax
:where
[?a :adresse/etrs89koordinat-oest ?x]
[(< ?xmin ?x)]
[(>= ?xmax ?x)]
[?a :adresse/etrs89koordinat-nord ?y]
[(< ?ymin ?y)]
[(>= ?ymax ?y)]]
@casperc Release 0.9.5130 and later include optimization of range predicates in query (see http://blog.datomic.com/2015/01/datalog-enhancements.html). Are you running on that version or newer?
Basically, I am comparing a query against a Datomic database with 3M entities of the type we are looking for and an SQL query against a database with the same data.
The result is about 500k entities and takes about 5s in Datomic and 1.2s in the SQL database. I am just wondering why the big difference.
You might want to review https://github.com/Datomic/day-of-datomic/blob/master/tutorial/decomposing_a_query.clj In general, you may be able to reorder the datalog clauses to improve the efficiency
Without knowing more about your dataset specifically it’s hard to recommend which clauses should be moved, but I would start with the two non-range clauses at the top
Looking again, since you supply min and max values to the query, you might actually see better performance by moving the non-range statements down. Again, it depends heavily on your dataset and the best approach may simply be a bit of testing.
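For illustration, one such hypothetical reordering (untested, using the attribute names from the query above) binds the north coordinate first and keeps each range predicate right after the clause that binds its variable; whether this helps depends entirely on the data distribution:

```clojure
'[:find [?a ...]
  :in $ ?xmin ?ymin ?xmax ?ymax
  :where
  [?a :adresse/etrs89koordinat-nord ?y]
  [(< ?ymin ?y)]
  [(>= ?ymax ?y)]
  [?a :adresse/etrs89koordinat-oest ?x]
  [(< ?xmin ?x)]
  [(>= ?xmax ?x)]]
```

Note the predicates still come after the datom clauses that bind ?x and ?y, since predicate clauses can only filter already-bound variables.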
@marshall: It's about the same with all permutations I can think of, including moving them up and down 😞
time the query (with whatever tooling you want) with only individual clauses included
that may or may not be particularly enlightening, but you can also add additional clauses and groups of clauses and determine which sets and joins are expensive
@casperc you might also test the results of the intersection of the two equivalent calls to index range: http://docs.datomic.com/clojure/#datomic.api/index-range
@bkamphaus: Ah, will do.
to some extent there will just be limits on the performance of this shape of query against Datomic’s indexes vs. an R-tree or something.
What would be the best way to make the intersection from an index-range call? Is it possible (and performant) to do inside the query?
Is something like this possible:
'[:find ?a
:in $ ?xmin ?ymin ?xmax ?ymax
:where
[(datomic.api/index-range $ :adresse/etrs89koordinat-nord ?ymin ?ymax) [?a]]
[(datomic.api/index-range $ :adresse/etrs89koordinat-oest ?xmin ?xmax) [?a]]
]
Didn't end up testing that myself. You may just be encountering laziness (re: the timing test), so make sure to realize the results, e.g. with into
I probably wouldn't make the API call in the query; it's all going to be realized in memory anyway. Just (into #{} (map :e (seq results))) the results or something and clojure.set/intersection the two calls.
(count (time (clojure.set/intersection (into #{} (map :e (d/index-range (d/db (get-conn :kildedata)) :adresse/etrs89koordinat-oest 718333.6321944933 731542.4349335412)))
(into #{} (map :e (d/index-range (d/db (get-conn :kildedata)) :adresse/etrs89koordinat-nord 6170381.489927892 6181147.04591752))))))
"Elapsed time: 1656.835585 msecs"
I tried this
'[:find ?a
:in $ ?xmin ?ymin ?xmax ?ymax
:where
[(datomic.api/index-range $ :adresse/etrs89koordinat-nord ?ymin ?ymax) [[?a]]]
[(datomic.api/index-range $ :adresse/etrs89koordinat-oest ?xmin ?xmax) [[?a]]]
]
But it never finished.
Can't comment on the never-finishing query, apart from saying that I'd in general keep index-range, datoms, etc. calls out of query (basically you're doing something from the primitives instead of querying). My guess is the time difference with index-range comes from the necessary structure of the query, which starts with what's probably an aevt lookup to limit to only entities with x and y values of interest, whereas the index-range call makes one pass per constraint directly against the avet index.
Alright, cool. Well again it is an improvement for sure, so I hope it will be enough for our use case.
Thanks a lot to both of you @bkamphaus and @marshall 👍:skin-tone-2:
There are options in the transactor config to push metrics to CloudWatch; is there any way of doing this for peers?
you can wire up whatever reporting on the peer with the metrics callback: http://docs.datomic.com/monitoring.html#sec-2-2
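A minimal sketch of such a callback (the namespace and reporting target are hypothetical; per the monitoring docs linked above, the peer periodically calls the named function with a map of metrics):

```clojure
;; Hypothetical namespace. Wire it up by starting the peer JVM with:
;;   -Ddatomic.metricsCallback=myapp.metrics/report
(ns myapp.metrics)

(defn report
  "Called periodically by the peer library with a map of metric
  names to values. Forward each entry to CloudWatch (e.g. via the
  AWS Java SDK) or any other reporting system; here we just print."
  [metrics]
  (doseq [[metric value] metrics]
    (println metric value)))
```

So there is no built-in CloudWatch shortcut on the peer side; the callback hands you the raw metrics and the PutMetricData plumbing is up to you.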
I see, but then I have to implement the AWS PutMetricDataRequest, MetricDatum, units, etc. myself. Since this is already implemented in the transactor, I was wondering whether there might be a shortcut. 😉
Is there a way to tell if clojure.lang.ExceptionInfo: Error communicating with HOST
is due to not being able to communicate with the host or going over your process limit?
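One way to dig further (a sketch, not from the thread) is to catch the exception and inspect its ex-data map; the message string is generic, but the attached data may carry more detail, and its contents vary by Datomic version:

```clojure
;; Sketch: uri is your Datomic connection URI.
(require '[datomic.api :as d])

(try
  (d/connect uri)
  (catch clojure.lang.ExceptionInfo e
    ;; ex-data often carries details beyond the message string
    (println (.getMessage e))
    (println (ex-data e))))
```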