
Hello! We're running Datomic Pro 1.0.6165 on-prem and are a bit puzzled as to why certain queries we think ought to be fast, well, aren't.


As part of an admin view of our system, we provide a page listing the number of entities per domain type stored in our Datomic instance. There are roughly 10 million such entities at the moment, spread quite unevenly across 80+ domain types.


Rendering this page is currently slow, because getting the counts from Datomic typically takes tens of seconds.


What we're doing at the moment to get the counts is roughly the following:

;; Count entities per type value by scanning the AEVT index for ::object/type
(->> (d/datoms db :aevt ::object/type)
     (map :v)
     frequencies)


As mentioned, this doesn't perform well on our data. A natural alternative would of course be to query the counts instead, which I think is what we did earlier. If memory serves, however, that didn't perform well either.


Nonetheless, I think it makes sense to try writing it as a query again, something like:

(d/q '[:find ?t (count ?e)
       :in $
       :where [?e ::object/type ?t]]
     db)


Question: Should Datomic be able to get the (count ?e) above efficiently from, e.g., some index metadata (if that's a thing), or will it essentially have to traverse the entire index to calculate the counts?


Additionally, should I expect qseq to perform better than q for the above query?


I expect your datoms version to be the fastest. The query versions will retain all records in memory at once. There's no way in Datomic to avoid visiting every datom in that index: Datomic doesn't keep index metadata such as cardinality information or set membership.


You need more caching (a larger object cache, Valcache, or a memcached secondary cache) or faster storage. The first query will be slow, but subsequent ones will be faster (assuming the same peer runs the query each time). If that's still not good enough, consider keeping the counts precomputed: run the query once, then have something listen to transactions via tx-report-queue or tx-range polling to keep the counts up to date.
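The precomputed-counts idea can be sketched as two pure functions: one seeds the counts from a full scan, the other folds each transaction's datoms into them. This is a minimal sketch, not Datomic's API: `initial-counts`, `apply-tx-data`, and `type-attr-id` are hypothetical names. It assumes `::object/type` is the attribute being counted and that its entity id is looked up once (e.g. with `d/entid`), because datoms in a transaction's `:tx-data` carry attribute ids rather than idents. Datoms are read only via keyword lookup (`:a`, `:v`, `:added`), so plain maps can stand in for them when testing.

```clojure
;; Hypothetical helpers for keeping per-type entity counts precomputed.

(defn initial-counts
  "Seed a {type-value count} map from a full scan, e.g. the seq returned
  by (d/datoms db :aevt ::object/type)."
  [datoms]
  (frequencies (map :v datoms)))

(defn apply-tx-data
  "Fold one transaction's :tx-data into the counts map. type-attr-id is
  assumed to be the entity id of the type attribute; datoms for any other
  attribute are ignored."
  [counts type-attr-id tx-data]
  (reduce (fn [m datom]
            (if (= type-attr-id (:a datom))
              ;; An assertion adds one entity to that type and a retraction
              ;; removes one; a change of type yields one of each.
              (update m (:v datom) (fnil (if (:added datom) inc dec) 0))
              m))
          counts
          tx-data))
```

To wire this up (again, only a sketch), one could hold the counts in an atom, seed it from a db value, and then, for each report taken from `(d/tx-report-queue conn)`, `swap!` in `(apply-tx-data counts attr-id (:tx-data report))`, skipping reports whose `:db-after` is not newer than the seed db's `d/basis-t` so no transaction is counted twice.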


All right, thanks for this.


Are :limit and :offset supported by index-pull in all Datomic versions that have it, or were those parameters added in more recent versions? The index-pull docstring doesn't show those options, but this page demonstrates their use.


I'm using on-prem, not client