#datomic
2023-01-01
cl_j 15:01:57

For a large d/q on Datomic Cloud that returns a large amount of data and takes almost a minute to finish, does it make sense to split the large query into many small queries, run them in parallel, and combine the results? I think this could help reduce the query time, but I am not sure whether it can also reduce the memory requirement.

favila 18:01:09

Sorry, I thought I was replying in a thread; see channel (I'm on a phone rn)

cl_j 01:01:43

Thanks @U09R86PA4, very useful information. Another technique I'd like to try is using custom query/aggregate functions to filter and reduce the result set inside Datomic. For a large data set, I think this might perform better than pulling all the results and doing the computation and filtering in Clojure, since it would require less data to be transferred from Datomic to Clojure.
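(A minimal sketch of what that could look like, assuming a hypothetical `:order/amount` attribute and an existing client connection `conn`; the predicate and aggregate run inside the query engine, so only the summary row crosses the wire:)

```clojure
(require '[datomic.client.api :as d])

;; Hypothetical schema: orders carrying an :order/amount long.
;; The [(> ?amount 1000)] predicate filters server-side, and
;; (sum ?amount) collapses the matches to a single value, so the
;; client never receives the per-order rows.
(d/q '[:find (sum ?amount)
       :with ?order                     ; keep per-order duplicates in the sum
       :where
       [?order :order/amount ?amount]
       [(> ?amount 1000)]]
     (d/db conn))
```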

favila 07:01:32

You can test whether this is worth it by doing a simple aggregate with zero compute cost and maximum result-set-size compression (e.g. count) and measuring the difference with and without it.
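(One way to run that measurement, assuming the same hypothetical `:order/amount` attribute and a `db` value in hand; any time the second form saves is roughly the IO and de/serialization cost of shipping the full rows:)

```clojure
;; Full result set: every row is serialized and sent to the client.
(time (count (d/q '[:find ?e ?amount
                    :where [?e :order/amount ?amount]]
                  db)))

;; Same work inside the query engine, but only one row comes back.
(time (d/q '[:find (count ?e)
             :where [?e :order/amount ?amount]]
           db))
```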

favila 07:01:01

Aggregates still realize the entire result set, so you really only save IO and de/serialization time.

šŸ‘ 2
šŸ™ 2
favila 18:01:48

I have had very good results with this technique

favila 18:01:12

Where the first :where clause has a very large result set

favila 18:01:28

Divide it up into chunks, run the query with just enough parallelism to pipeline (like n=2, or even n=1), and merge the results

favila 18:01:03

Reducing the intermediate result set size makes a huge difference; IME it often runs faster than a single query on an instance with a much larger heap
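(A rough sketch of this chunk-and-merge technique under the same hypothetical schema; `all-orders` and the chunk size of 10000 are illustrative, not from the thread. The large first clause is fetched once as a set of ids, and the rest of the query is re-run per chunk so each intermediate result set stays small:)

```clojure
;; Fetch the large first-clause binding once, then re-run the rest of
;; the query against one chunk of ids at a time.
(def all-orders
  (map first
       (d/q '[:find ?order :where [?order :order/amount]] db)))

(defn amounts-for [db orders]
  (d/q '[:find ?order ?amount
         :in $ [?order ...]             ; collection binding for one chunk
         :where [?order :order/amount ?amount]]
       db orders))

;; pmap keeps only a small window of chunks in flight, which loosely
;; approximates the "just enough parallelism to pipeline" idea; the
;; per-chunk results are merged with concat at the end.
(->> (partition-all 10000 all-orders)
     (pmap #(amounts-for db %))
     (apply concat)
     (into #{}))
```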