#datomic
2023-08-31
stuartrexking17:08:49

With Datomic Pro, would a call to (d/datoms db {:index index}) for all four indexes warm the object cache? Assume the object cache configuration can hold the entire dataset

ghadi17:08:22

no, it will do nothing until you consume the result of d/datoms (which is a lazy/iterable)

ghadi17:08:47

btw the syntax for that call in Pro is different than client/Cloud

stuartrexking17:08:49

Thanks @U050ECB92. I’m reducing like this which should consume.

(reduce (fn [_ _]) (d/datoms db index))

stuartrexking17:08:57

Where index is :eavt, etc
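A minimal sketch of the two points above (laziness, and the Pro vs. client call shape), not a recommended warming strategy. It assumes the Datomic peer library is on the classpath; the in-memory URI and the names `uri`, `eavt`, and `datom-count` are hypothetical and exist only so the snippet can run on its own.

```clojure
(require '[datomic.api :as d])

;; Assumption: a throwaway in-memory database, used only for illustration.
(def uri "datomic:mem://datoms-laziness-demo")
(d/create-database uri)
(def db (d/db (d/connect uri)))

;; Peer (Pro) API: the index is passed as a positional keyword.
;; Creating the iterable does not walk the index yet.
(def eavt (d/datoms db :eavt))

;; The client/Cloud API takes an argument map instead, e.g.
;;   (d/datoms db {:index :eavt})   ; with datomic.client.api aliased as d

;; Consuming the iterable is what actually walks the index.
(def datom-count (reduce (fn [n _] (inc n)) 0 eavt))
```

Even a fresh database contains bootstrap schema datoms, so `datom-count` is non-zero here. Whether walking an index this way helps a later query is a separate question, which the rest of the thread addresses.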

ghadi17:08:17

what are you trying to do?

stuartrexking17:08:24

I’m trying to warm the object cache.

stuartrexking17:08:47

The current strategy is to consume the indexes.

stuartrexking17:08:13

I have a slow join query that’s calling out to DynamoDBv2.GetItem many times after a cold start. I’m hoping I can improve it by warming the object cache. Maybe I’m misunderstanding something.

stuartrexking17:08:40

I’m assuming that consuming each of the indexes will warm the object cache.

stuartrexking17:08:24

And will improve the query

stuartrexking17:08:09

@U050ECB92 I’ve read the docs but it’s unclear if it’s possible to warm the caches as I’m assuming it is. Any pointers or direction on how I can better understand what’s happening with the caches when I consume the indexes fully will be greatly appreciated 🙏

ghadi17:08:09

Hey Stuart I’ll try to help async. Post the intent of your query, the query itself, and also check out query stats (on doc site, link not handy atm)

stuartrexking17:08:24

Thanks. I’ll put it together.

ghadi19:08:55

Queries shouldn’t be hitting dynamo, even when a process is new/cold

ghadi19:08:02

I would set aside your previous hack attempt and focus on the query itself. Using io-stats and query stats

stuartrexking19:09:29

I ran the query with io-stats. On the first call it’s loading from valcache, which takes 44 seconds. The second call reads from the object cache and takes 5 seconds. This version of the query returns the most results of any, so it’s the worst case. How does valcache perform relative to the object cache?

stuartrexking19:09:18

Both io-stats and query-stats were very useful, so 🙏

ghadi19:09:18

What did query stats reveal? Want to post the output? (Redact if necessary)

stuartrexking19:09:21

Query seemed fine. I’ll post redacted shortly.

stuartrexking03:09:04

@U050ECB92 Here are the query stats. This is the broadest possible query, which returns nearly all the data. It takes 44 seconds for the first query, reading from valcache, and 5 seconds for subsequent queries, which read from the object cache only.

{:query [:find
         ?product-id
         ?result-id
         :in
         $
         [?product-id ...]
         :where
         [?product :company/id ?product-id]
         [?product :company.product.derived/searchable? true]
         [(get-else $ ?product :company.product.sibling-variant/group ?product) ?result]
         [?result :company/id ?result-id]],
 :phases [{:sched (([(ground $__in__2) [?product-id ...]]
                    [?product :company/id ?product-id]
                    [?product :company.product.derived/searchable? true]
                    [(get-else $ ?product :company.product.sibling-variant/group ?product) ?result]
                    [?result :company/id ?result-id])),
           :clauses [{:clause [(ground $__in__2) [?product-id ...]],
                      :rows-in 0,
                      :rows-out 246102,
                      :binds-in (),
                      :binds-out [?product-id],
                      :expansion 246102}
                     {:clause [?product :company/id ?product-id],
                      :rows-in 246102,
                      :rows-out 246102,
                      :binds-in [?product-id],
                      :binds-out [?product-id ?product]}
                     {:clause [?product :company.product.derived/searchable? true],
                      :rows-in 246102,
                      :rows-out 236634,
                      :binds-in [?product-id ?product],
                      :binds-out [?product-id ?product]}
                     {:clause [(get-else $ ?product :company.product.sibling-variant/group ?product) ?result],
                      :rows-in 236634,
                      :rows-out 236634,
                      :binds-in [?product-id ?product],
                      :binds-out [?product-id ?result]}
                     {:clause [?result :company/id ?result-id],
                      :rows-in 236634,
                      :rows-out 236634,
                      :binds-in [?product-id ?result],
                      :binds-out [?product-id ?result-id]}]}]}
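For readers who want to reproduce output of this shape, here is a hedged sketch of requesting query stats through the peer API's map form of `d/query`. The helper name `run-with-stats` and the `:demo/slow-join` label are hypothetical; the query is copied from the message above, and `db` and `product-ids` are assumed to be supplied by the caller.

```clojure
(require '[datomic.api :as d])

;; The query from the message above, unchanged.
(def slow-join-query
  '[:find ?product-id ?result-id
    :in $ [?product-id ...]
    :where
    [?product :company/id ?product-id]
    [?product :company.product.derived/searchable? true]
    [(get-else $ ?product :company.product.sibling-variant/group ?product) ?result]
    [?result :company/id ?result-id]])

(defn run-with-stats
  "Hypothetical helper: runs the query with stats enabled and returns only
  the stats maps, dropping the (large) result set under :ret."
  [db product-ids]
  (let [result (d/query {:query       slow-join-query
                         :args        [db product-ids]
                         :io-context  :demo/slow-join ; hypothetical label
                         :query-stats true})]
    (select-keys result [:io-stats :query-stats])))
```

Calling `(run-with-stats db product-ids)` twice in a fresh process should show the cold and warm behaviour side by side.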

gvind22:09:29

Curious if you were able to take a look at this one @U050ECB92?

ghadi22:09:37

weird I missed the notification

ghadi22:09:56

I’ll check it out in a few

gvind18:09:27

Sorry to ping @U050ECB92. Figured it might have disappeared again. Any help much appreciated!

stuartrexking21:09:26

Pinging @U050ECB92 again on this

ghadi15:09:20

this query grabs 250K entities... this is a lot. Are they all next to each other in the indexes, or are the entities of interest interleaved in the index with other entities? To understand this, run the query with io-stats (https://docs.datomic.com/pro/api/io-stats.html) to see how many storage reads it's making, and where. Run it multiple times, because caches get populated. Also run a query for simply [?product :company.product.derived/searchable? true], which selects the entities of interest. That clause is strictly more selective than the first clause, so you can swap their order. Try to remove any "cache warming" tricks you have, because those are likely not helping. You may want to use implicit partitions (https://blog.datomic.com/2023/04/implicit-partitions.html) to improve the locality of entities in the index. If your searchable products are interleaved with non-searchable ones (based on the order of their transactions), the density of those products per storage read will be low.
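The two experiments suggested above could be sketched roughly as follows. This is an illustration under assumptions, not the thread's actual code: `io-stats-for` and the `:demo/...` labels are hypothetical names, and the reordered query is simply the original with its first two `:where` clauses swapped.

```clojure
(require '[datomic.api :as d])

;; Only the entities of interest; count keeps the returned result small.
(def searchable-only-query
  '[:find (count ?product)
    :where [?product :company.product.derived/searchable? true]])

;; The original query with the first two :where clauses swapped.
(def reordered-query
  '[:find ?product-id ?result-id
    :in $ [?product-id ...]
    :where
    [?product :company.product.derived/searchable? true]
    [?product :company/id ?product-id]
    [(get-else $ ?product :company.product.sibling-variant/group ?product) ?result]
    [?result :company/id ?result-id]])

(defn io-stats-for
  "Hypothetical helper: runs query against db (plus any extra inputs)
  under the given io-context label and returns just the :io-stats map."
  [db io-context query & inputs]
  (:io-stats (d/query {:query      query
                       :args       (into [db] inputs)
                       :io-context io-context})))

;; Usage, assuming db and product-ids exist in your process:
;;   (io-stats-for db :demo/searchable-only searchable-only-query)
;;   (io-stats-for db :demo/reordered reordered-query product-ids)
```

Running each call several times and comparing the reported reads gives a rough picture of how much of the first-run cost comes from storage and valcache rather than from the query itself.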