This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-08-15
I’m curious how folks are dealing with slow queries, more specifically slow queries that eagerly load a ton of results for further processing. I’ve done as much as I can to speed up queries (liberally adding indexes, isolating huge swaths of data in separate partitions to help the peer load fewer segments), and now I’m starting to hit limits due to the size of the data…
My next experiment is to simplify the queries so they only give me a starting point, and then try various combinations of core.async and transducers to walk the graph and see if that speeds things up
Just curious if anyone else has gone down this path and has any advice to offer
I've had the same problem; my approach has been to offload the work to ElasticSearch. I think Datomic is just not well suited for low-latency analytical queries that span a lot of data; fortunately, thanks to the txReportQueue and the Log API, it's very well suited to be a source for derived data systems.
Also note that the current Datalog engine is not completely immune to the N+1 problem; I've observed that running a Datalog query which only needs one index access is still 100x slower than using the raw index API, as if there were some fixed startup cost associated with the Datalog engine
Of course I encourage you to profile and draw your own conclusions
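For readers following along, the derived-data approach mentioned above might look roughly like the sketch below. It assumes the Datomic peer library (`datomic.api`) and an existing connection `conn`; `index-fn` is a hypothetical callback (for example, something that writes documents to ElasticSearch) and is not part of Datomic.

```clojure
(require '[datomic.api :as d])

(defn replay-log
  "Catch-up pass: feeds the datoms of every transaction from start-t
  onward to index-fn (hypothetical callback), using the Log API."
  [conn start-t index-fn]
  (doseq [{:keys [data]} (d/tx-range (d/log conn) start-t nil)]
    (index-fn data)))

(defn follow-tx-reports
  "Live pass: blocks on the connection's tx report queue in a background
  thread and feeds each new transaction's datoms to index-fn."
  [conn index-fn]
  (let [^java.util.concurrent.BlockingQueue queue (d/tx-report-queue conn)]
    (future
      (loop []
        ;; Each report is a map with :db-before, :db-after, :tx-data, :tempids.
        (let [{:keys [tx-data]} (.take queue)]
          (index-fn tx-data)
          (recur))))))
```

A real pipeline would also need to track the last indexed `t` and handle failures in `index-fn`; both are left out here.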
thanks, these are very useful insights! I’m already working off derived data (source data is also in datomic but in different partitions), but there are just some conditions that datalog seems to fall flat under
I’m also trying to keep the stack very flat and simple; it is not a huge app, just a lot of varied data (but not “big data” either)
you should use Datalog when you need to model joins. if you know you’ll only be using one index, you can almost certainly do it faster with d/datoms, because Datalog will always produce two result sets: one for the clause, and one for the find expressions. reductions over d/datoms produce only one.
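A minimal sketch of that comparison, assuming the Datomic peer library and a database value `db`. The attribute `:order/total` is hypothetical (any numeric attribute would do), and no timings are claimed here; profile against your own data.

```clojure
(require '[datomic.api :as d])

(defn total-via-datalog
  "Sums :order/total (hypothetical attribute) through the Datalog engine.
  d/q materializes the full set of [?e ?total] tuples before we reduce it."
  [db]
  (->> (d/q '[:find ?e ?total
              :where [?e :order/total ?total]]
            db)
       (transduce (map second) + 0)))

(defn total-via-datoms
  "Sums the same attribute by reducing directly over the AEVT index,
  reading each datom's value without building a query result set."
  [db]
  (transduce (map :v) + 0 (d/datoms db :aevt :order/total)))
```

Both functions should return the same number for the same `db`; the second one only works because a single index (AEVT on one attribute) is enough to answer the question.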
Thanks Rob, I’ll have a look. I’m really sure I can do it with one index; if not, it won’t be a big jump to restructure the derivatives to make this possible
ZOMG Rob! You just opened my eyes to something amazing