#datomic
2023-09-06
notid15:09:30

I have a composite tuple that is indexed (2 attrs), but index-range seems fairly slow to me: pulling around 700k datoms takes around 700ms, even when all are in the object cache. I ran a profile and found almost all of the time is being spent in datomic.common$compare_ex. Is there any way to get insight into what I might be able to do to speed this query up?

favila16:09:00

How do you know all are in the object cache? index-range doesn’t provide io-stats. How does the index-range compare to just d/datoms over the same attr? e.g. (time (count (d/datoms db :avet attr)))

favila16:09:25

and when you say “all of the time”--is this cpu time or wallclock time?

favila16:09:18

do you use index-range with an “end” parameter? Does it make a difference if you don’t?

notid16:09:43

Great questions.
• Object cache: my actual query is something like [:find ?foo :in $ ?bar :where [(function-that-uses-index-range $ ?bar) ?foo]] and, to my pleasant surprise, this seems to report io-stats accurately. The underlying function only uses index-range. I am also running the profiling after the first execution of the query.
• I should've said that I'm sampling, not profiling, so it can be imprecise, but it is reported as CPU time.
• For my use case, not using end would fetch way too much data. The composite tuple's schema is [:db.type/ref :db.type/instant], so my range would typically look something like this: start: [some-ent-id #inst "2020-01-01"] end: [some-ent-id #inst "2023-01-01"]

favila16:09:56

I’m asking about “end” and about d/datoms because maybe comparing values until the end of the range is what is slow. Both of these would remove that.

favila17:09:13

it could also just be allocations

favila17:09:28

query is going to realize every ?foo

favila17:09:49

and going to do set arithmetic on those values

favila17:09:21

d/index-range is lazy--is it also 700ms when you just run that function by itself with e.g. count to realize it without allocating all of it?

favila17:09:40

This is also where a comparison with d/datoms is instructive

notid17:09:41

Ah, interesting, (time (count (d/datoms db :avet attr))) takes 400ms

favila17:09:11

another comparison you can make, if function-that-uses-index-range is simple to port to a query, is to see how the query does when expressed directly with [?e :attr ?v][(<= ?start ?v)][(< ?v ?end)]

notid17:09:06

Yeah, I introduced function-that-uses-index-range because the query performance was much worse. Raw usage of index-range without realizing every ?foo is much faster (200ms). I think that's the clue I needed.

favila17:09:50

If your output set is significantly smaller than the input, consider reducing over chunks of your index-range input and running the query for each chunk

favila17:09:25

I don’t really understand why this is better; I theorize that it’s all about memory pressure from large sets

notid17:09:22

Got it, okay this is helpful, thank you.

favila17:09:52

e.g. (into #{} (comp (map :e) (partition-all 10000) (mapcat #(d/q query db %))) (d/index-range db attr start end))

favila17:09:46

map/reduce style
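
The chunked map/reduce shape suggested above can be tried without a database. The following is only a sketch under stated assumptions: `fake-datoms` is a hypothetical stand-in for the lazy result of `(d/index-range db attr start end)`, and `run-query-on-chunk` is a hypothetical stand-in for `#(d/q query db %)`. Neither is part of Datomic. The transducer pipeline itself matches the one in the message above.

```clojure
;; Hypothetical stand-in for (d/index-range db attr start end):
;; a lazy seq of datom-like maps with an :e key.
(def fake-datoms
  (map (fn [i] {:e i :v i}) (range 100000)))

;; Hypothetical stand-in for #(d/q query db %): receives one chunk of
;; entity ids and returns a (much smaller) set of results for that chunk.
(defn run-query-on-chunk [eids]
  (into #{} (filter #(zero? (mod % 1000))) eids))

;; Map/reduce style: extract entity ids, batch them 10000 at a time,
;; run the "query" once per batch, and merge all results into one set.
(def result
  (into #{}
        (comp (map :e)
              (partition-all 10000)
              (mapcat run-query-on-chunk))
        fake-datoms))

(println (count result))
```

Because `into` with a transducer consumes the input one chunk at a time, only one batch of entity ids and the accumulated output set need to be held at once, which fits the memory-pressure theory mentioned above.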