This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-11-15
Channels
- # announcements (1)
- # asami (29)
- # babashka (31)
- # beginners (48)
- # calva (39)
- # cljsrn (4)
- # clojure (56)
- # clojure-dev (51)
- # clojure-doc (3)
- # clojure-europe (40)
- # clojure-gamedev (13)
- # clojure-italy (22)
- # clojure-nl (3)
- # clojure-uk (5)
- # cursive (9)
- # datomic (184)
- # events (7)
- # fulcro (8)
- # graalvm (2)
- # jobs (1)
- # malli (6)
- # meander (1)
- # nrepl (5)
- # off-topic (10)
- # pathom (9)
- # polylith (33)
- # portal (2)
- # re-frame (7)
- # reagent (12)
- # releases (3)
- # remote-jobs (3)
- # reveal (27)
- # shadow-cljs (34)
- # sql (1)
- # vim (7)
- # xtdb (62)
Hey! I'm using integrant in my app on Ions. At the moment, as in the ion-event-example, I call d/connect only inside transactions. My integrant init-key returns (partial get-connection config). Based on this, do we open the connection only when it's called? Maybe someone has made a similar integrant component. What is the best way to do this?
Here's my code
(def get-client
  (memoize
   (fn [config]
     (d/client
      (cond
        (:dev? config) {:server-type :dev-local
                        :storage-dir :mem
                        :system "dev"}
        :else {:server-type :ion
               :region (utils/get-param "region")
               :system (utils/get-param "system-name")
               :endpoint (utils/get-param "endpoint")})))))

(defn get-connection
  "Get shared connection."
  [config]
  (utils/with-retry
    #(d/connect (get-client config) {:db-name (utils/get-param "db-name")})))
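For illustration, a minimal sketch of what such an integrant component could look like, assuming the get-client/get-connection helpers above (the ::conn key and the deferred-thunk shape are made up, not from this thread):

```clojure
;; Hypothetical integrant component around the helpers above.
;; ::conn is an invented key; get-connection comes from the snippet above.
(require '[integrant.core :as ig])

;; Returning a thunk defers d/connect until a consumer actually calls it,
;; e.g. once per Ion invocation.
(defmethod ig/init-key ::conn [_ config]
  (fn [] (get-connection config)))

;; The client caches connections, so there is nothing to tear down here.
(defmethod ig/halt-key! ::conn [_ _conn-fn]
  nil)
```

With this shape the system map holds a function rather than a live connection, so initializing the system stays cheap and connection retries happen at call time.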
Not sure if it’s the best solution, but I’ve made a duct / integrant module for this. https://github.com/hden/duct.module.datomic
Would it be a fair expectation that an aggregate query returning a single value would be more performant than pulling the same data & aggregating client-side?
& yeah, that's what I thought. Here are some stats from a test of 50 runs with each approach:
• aggregate: 2772ms avg, SD 718, min 2150, max 6415
• pull: 2317ms avg, SD 782, min 1807, max 5682
This is over a fairly large dataset. For some reason, pulling & sending thousands of maps over the wire is quicker than aggregating on the db.
The query engine itself doesn’t have any aggregation smarts: it realizes a whole result set before it aggregates, and sometimes memory pressure can cause that to be slow. This is why index-pull or d/datoms with an incremental aggregator can sometimes be faster. On a peer, it can very often be much faster.
But on client-api especially, the smaller the pipe between the client and peer, the more you want to push the aggregation into the peer.
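To illustrate the incremental-aggregator idea mentioned above (a sketch only — peer API assumed, and :my/amount is a made-up attribute):

```clojure
;; Sum an attribute by walking the AEVT index lazily with d/datoms,
;; instead of realizing a full query result set before aggregating.
(require '[datomic.api :as d])

(defn sum-attr
  "Incrementally sum every asserted value of attr in db."
  [db attr]
  (transduce (map :v) + 0.0 (d/datoms db :aevt attr)))

;; e.g. (sum-attr db :my/amount)
```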
Caveat: I am attempting to aggregate 13 optional values. Maybe it doesn't like that many aggregates?
Query looks like this, where the find var & where clause are repeated 12 more times, once for each value I aggregate
{:find [?services
        (sum ?v)],
 :where [[(get-else $ ?e :my-attr 0.0) ?v]],
 :with [?r]}
Better comparison (first option has 12 additional clauses in where & 12 more find vars)
{:find [?services (sum ?v)],
 :where [[?r :x ?e]
         [(get-else $ ?e :my-attr 0.0) ?v]],
 :with [?r]}
{:find [(pull ?r [:services
{:x [*]}])]
:where [[?r :x ?e]]}
e.g.,
{:find [?services (sum ?v) (sum ?v2) ... (sum ?vN)],
:where [[?r :x ?e]
[(get-else $ ?e :my-attr 0.0) ?v]
[(get-else $ ?e :my-attr2 0.0) ?v2]
...
         [(get-else $ ?e :my-attrN 0.0) ?vN]]
 :with [?r]}
I would say the result set after :where is going to be much bigger for the aggregation one
Right. Why are there dupes at the end of https://clojurians.slack.com/archives/C03RZMDSH/p1637009275327000?thread_ts=1637006953.319600&channel=C03RZMDSH&message_ts=1637009275.327000
I see what you mean. I was trying to express: for a ?services, sum up all these values.
So, if I understand you correctly, I should compare the result set size of:
{:find [?service ?v ?v2 ... ?vN],
:where [[?r :service ?service]
[?r :x ?e]
[(get-else $ ?e :my-attr 0.0) ?v]
[(get-else $ ?e :my-attr2 0.0) ?v2]
...
[(get-else $ ?e :my-attrN 0.0) ?vN]]
:with [?r]}
to
{:find [(pull ?r [:service
{:x [*]}])]
:where [[?r :service ?service]]}
Are your benchmark numbers including whatever aggregation code you rolled yourself for the pull?
and the thing I was curious about is how this performs comparatively:
(->> attrs
     (mapcat #(d/q '{:find [?service ?attr (sum ?v)]
                     :in [$ ?attr]
                     :where [[?r :service ?service]
                             [?r :x ?e]
                             [(get-else $ ?e ?attr 0.0) ?v]]
                     :with [?r]}
                   db %))
     (reduce (fn [aggs [service attr s]]
               (assoc-in aggs [service attr] s))
             {}))
> you don’t even need ?service, it may be dropping it
What do you mean? I need the sums by service
Ok so stepping back. Do we know why the other was so slow? It seems like an equivalent query.
Well, this is extremely exciting. 2-3s -> 0.5s is a huge win for this query. Thank you for taking the time to help.
(->> attrs
     (d/q '{:find [?service ?attr (sum ?v)]
            :in [$ [?attr ...]]
            :where [[?r :service ?service]
                    [?r :x ?e]
                    [(get-else $ ?e ?attr 0.0) ?v]]
            :with [?r]}
          db)
     (reduce (fn [aggs [service attr s]]
               (assoc-in aggs [service attr] s))
             {}))
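For reference, the trailing reduce just folds [service attr sum] tuples into a nested map; a tiny data-only illustration with made-up values:

```clojure
;; Same fold as above, fed with hand-written tuples instead of d/q output.
(def sample-rows
  [["svc-a" :cost 1.0]
   ["svc-a" :fees 2.0]
   ["svc-b" :cost 3.0]])

(reduce (fn [aggs [service attr s]]
          (assoc-in aggs [service attr] s))
        {}
        sample-rows)
;; => {"svc-a" {:cost 1.0, :fees 2.0}, "svc-b" {:cost 3.0}}
```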
The output is different, btw. The input version returns one map for all ?attr passed in. The non-input version returns one map for all ?attr the db is actually using
Non-input:
{:query '{:find [?services ?attr-k (sum ?value)],
:where [[?r :cs.model.monitored-resource/service ?services]
[?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
[?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
[?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
[?orig-value-breakdown ?attr ?value]
[?attr :db/ident ?attr-k]]
:keys [service
vb-k
sum],
:with [?r]},
:args (list db),
:timeout 60000}
Input
{:find [?services ?attr-k (sum ?value)],
:in [$ [?attr ...]]
:where [[?r :cs.model.monitored-resource/service ?services]
[?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
[?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
[?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
[(get-else $ ?orig-value-breakdown ?attr 0.0) ?value]
[?attr :db/ident ?attr-k]]
:keys [service
vb-k
sum],
:with [?r]}
there is no get-else to turn those missing items into 0; they will just be missing from the output
Geez, yeah I don't think I can use [?orig-value-breakdown ?attr ?value]. Nothing guarantees a non-numeric key won't get added to ?orig-value-breakdown.
So now we're at:
;; non-input
{:find [?services ?attr-k (sum ?value)],
:where [[?r :cs.model.monitored-resource/service ?services]
[?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
[?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
[?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
[?orig-value-breakdown ?attr ?value]
[?attr :db/ident ?attr-k]]
:keys [service
vb-k
sum],
:with [?r]}
;; input
{:find [?services ?attr-k (sum ?value)],
:in [$ [?attr ...]]
:where [[?r :cs.model.monitored-resource/service ?services]
[?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
[?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
[?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
[?orig-value-breakdown ?attr ?value]
[?attr :db/ident ?attr-k]]
:keys [service
vb-k
sum],
:with [?r]}
The part I don’t get is why this
(mapv #(d/q '{:find [?services ?attr-k (sum ?value)],
              :in [$ ?attr]
              :where [[?r :cs.model.monitored-resource/service ?services]
                      [?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
                      [?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
                      [?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
                      [?orig-value-breakdown ?attr ?value]
                      [?attr :db/ident ?attr-k]]
              :keys [service vb-k sum],
              :with [?r]}
            db %)
      attr-vec)
would be 0.5 seconds total, but this:
(d/q '{:find [?services ?attr-k (sum ?value)],
       :in [$ [?attr ...]]
       :where [[?r :cs.model.monitored-resource/service ?services]
               [?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
               [?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
               [?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
               [?orig-value-breakdown ?attr ?value]
               [?attr :db/ident ?attr-k]]
       :keys [service vb-k sum],
       :with [?r]}
     db attr-vec)
would be over 2 seconds. Lol
{:find [?services ?attr-k (sum ?value)],
:in [$ ?my-attr-set]
:where [[?r :cs.model.monitored-resource/service ?services]
[?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
[?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
[?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
[?orig-value-breakdown ?attr ?value]
[?attr :db/ident ?attr-k]
[(contains? ?my-attr-set ?attr-k)]]
:keys [service vb-k sum],
:with [?r]}
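One caveat with this set-based variant: with no get-else, attrs absent from an entity simply produce no row. A sketch of restoring explicit zeros client-side (the attrs vector and the :keys-style result maps are assumed inputs, not code from this thread):

```clojure
;; rows: maps like {:service ... :vb-k ... :sum ...} as produced by the
;; :keys clause above; attrs: the 12 attribute idents (assumed).
(defn with-zero-defaults [attrs rows]
  (let [zeros (zipmap attrs (repeat 0.0))]
    (->> rows
         ;; Fold rows into {service {attr sum}}.
         (reduce (fn [aggs {:keys [service vb-k sum]}]
                   (assoc-in aggs [service vb-k] sum))
                 {})
         ;; Backfill any attr that produced no row with 0.0.
         (reduce-kv (fn [m service sums]
                      (assoc m service (merge zeros sums)))
                    {}))))

;; (with-zero-defaults [:a :b] [{:service "s1" :vb-k :a :sum 5.0}])
;; => {"s1" {:a 5.0, :b 0.0}}
```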
For whatever reason, EAVT on ?orig-value-breakdown is cheaper than 12 AEVT on ?attr ?orig-value-breakdown
(assuming these queries are representative)
I’m just getting confused by the edits and revisions, pseudo and non-pseudo queries, not sure which ones correspond to which timing anymore
Understandable. You should see the size of the comment block I've got now 😅 Here's a summary:
Query 1
• 561ms avg
• Fastest option
• Downside is ?value is not guaranteed to be numeric.
{:find [?services ?attr-k (sum ?value)],
:where [[?r :cs.model.monitored-resource/service ?services]
[?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
[?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
[?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
[?orig-value-breakdown ?attr ?value]
[?attr :db/ident ?attr-k]]
:keys [service vb-k sum],
:with [?r]}
Query 2
• 2112ms avg
• Idiomatic alternative to 1 with severe perf impact.
{:find [?services ?attr-k (sum ?value)],
:in [$ [?attr ...]]
:where [[?r :cs.model.monitored-resource/service ?services]
[?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
[?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
[?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
[?orig-value-breakdown ?attr ?value]
[?attr :db/ident ?attr-k]]
:keys [service vb-k sum],
:with [?r]}
Query 3
• 628ms avg
• Hack around 2 to ensure ?value is sum-able.
{:find [?services ?attr-k (sum ?value)],
:in [$ ?my-attr-set]
:where [[?r :cs.model.monitored-resource/service ?services]
[?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
[?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
[?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
[?orig-value-breakdown ?attr ?value]
[?attr :db/ident ?attr-k]
[(contains? ?my-attr-set ?attr-k)]]
:keys [service vb-k sum],
:with [?r]}
This makes it look like my original supposition was false: intermediate result set size wasn’t the problem. It really looks like index choice is what matters.
What is the total time if you run Query 1 12 times with ?attr as an input parameter (a different attr each run)?
I think your original supposition was right -- I didn't include the original query because the above are so much cleaner and I forgot about it 🙂 . In these cases, I think you're right again on index choice. I'll try that out.
Tried both literal and input.
Input
• 473ms avg
{:find [?services (sum ?value)],
:in [$ ?attr]
:where [[?r :cs.model.monitored-resource/service ?services]
[?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
[?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
[?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
[?orig-value-breakdown ?attr ?value]]
:keys [service sum],
:with [?r]}
Literal
• 458ms avg
{:find [?services (sum ?value)],
:in [$]
:where [[?r :cs.model.monitored-resource/service ?services]
[?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
[?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
[?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
[?orig-value-breakdown :cs.model.value-breakdown/provider-cost ?value]]
:keys [service sum],
:with [?r]}
Concurrent
• Code looks similar to the below; just ran 50 tests & took the avg.
• 2573ms avg
(defn futs-f []
  (mapv
   (fn [vb-k]
     (let [qmap {:query '{:find [?services (sum ?value)],
                          :in [$ ?attr]
                          :where [[?r :cs.model.monitored-resource/service ?services]
                                  [?r :cs.model.monitored.cloud-acct/cloud-account ?cloud-acct]
                                  [?cloud-acct :cs.model.monitored.cloud-acct/mode :cs.model.monitored.cloud-acct/mode-fetch-all]
                                  [?r :cs.model.value-breakdown/orig-value-breakdown ?orig-value-breakdown]
                                  [?orig-value-breakdown ?attr ?value]]
                          :keys [service sum],
                          :with [?r]},
                 :args [db vb-k],
                 :timeout 60000}]
       (manifold.deferred/future (dbu/q qmap))))
   cs.model.value-breakdown/value-breakdown-ks))

(time (mapv deref (futs-f)))
This strongly suggests that the object cache is simply not large enough to hold the AEVT indexes involved, and it has to swap them in and out as it runs
Hmm. To be sure I'm following, we're talking about the internal Datomic call for the AEVT index component {:components [?attr ?orig-value-breakdown]}?
[?orig-value-breakdown ?attr ?value]
This one. If ?attr is bound, query prefers AEVT; if ?orig-value-breakdown is bound but ?attr is not, then EAVT. That’s really the only difference between the fast and slow approaches
Wait, let me back up. The query for 1 attr https://clojurians.slack.com/archives/C03RZMDSH/p1637077448348900?thread_ts=1637006953.319600&cid=C03RZMDSH took under 500ms. Running all 12 concurrently took 2573ms on avg. So you're saying that the concurrent one took longer b/c "the object cache is simply not large enough to hold the AEVT indexes involved, and it has to swap them in and out as it runs" ?
Got it. Very insightful. Our concurrent one is slightly worse than letting Datomic handle that. Ok so the theory on the perf difference is that the query that uses the AEVT index is much slower due to a too small object cache.
yeah, to verify this you should observe the object cache hit rate directly, or storage gets, or even just aggregate network
if this is right, then the EAVT query will have a higher hit rate, lower storage get, and lower network activity as it runs
I see. Ok. So you're supposing that the cache is large enough & warm enough to hold the EAVT index but not the AEVT?
Are there any other query loads on this instance? It could be they are using EAVT of these indexes already (e.g. via pull).
so maybe they are already loaded, and when evicted to make more space they are quickly re-loaded
No, just me testing these scenarios. Certainly one of the most common queries executed in this group would be a pull.
I’m not sure how it works on cloud. On peer, by default half the heap space is reserved for object cache, and this is tuneable
> if this is right, then the EAVT query will have a higher hit rate, lower storage get, and lower network activity as it runs
This instance is running with ssd valcache so perhaps no network changes?
Because you'd expect valcache to make up the difference with object cache not holding the aevt index?
it definitely wouldn’t make up the difference completely. A heap pointer is always going to be much much faster than IO+decompression+deserialization
I thought so 🙂 & I think it should have no problem loading whatever it needs given the ssd is 0.5tb. Total system is a little over 900m datoms, but this db we've been testing on is some fraction of that.
Booted up a brand new instance to make sure we've got a good starting point. Calling d/connect on the db we've been working with results in 14.6m bytes written.
I started with Query 3, the most likely candidate I will use later. The first run of the query took 3583ms. The second run took 1445ms. We now have a second spike in our graph that happened when I ran query 3 for the first time. It wrote 21.6m bytes.
Now I ran the Concurrent query. The first run took 6074ms. The second run took 4343ms. And our 3rd disk write appears. It wrote 20.9mb.
Are you sure this is the valcache drive? None of these seem to have reads. Do you not have direct access to object-cache, valcache, and storage get/hit-rate metrics?
Good point. I am not 100% certain - will double check aws docs. Perhaps it's simply not using valcache for some reason.
But these metrics should be published directly, we shouldn’t have to infer from disk activity
This is the full dashboard Datomic provides for a query group (plus my manual disk chart)
Actually, I think it's highly likely that disk chart is the valcache drive since we are seeing disk writes that directly align with when the queries happen.
From https://docs.datomic.com/cloud/whatis/architecture.html#caching, it seems Cloud's cache order is: 1) object cache 2) valcache 3) efs 4) s3 fallback
I sshed into the node to make sure we aren't going crazy. From df -h:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.5G 0 7.5G 0% /dev
tmpfs 7.5G 0 7.5G 0% /dev/shm
tmpfs 7.5G 496K 7.5G 1% /run
tmpfs 7.5G 0 7.5G 0% /sys/fs/cgroup
/dev/xvda1 8.0G 2.5G 5.6G 31% /
8.0E 173G 8.0E 1% /opt/datomic/efs-mount
/dev/nvme0n1 436G 102M 414G 1% /opt/ssd1
tmpfs 1.5G 0 1.5G 0% /run/user/4242
tmpfs 1.5G 0 1.5G 0% /run/user/1000
Very interesting results, I might say. Datomic's CF template attaches an EBS gp2 drive to the nodes automatically. It surprises me that the drive has 2.5gb used. The nvme drive does have 102m written though.
Ok, it's definitely written to the ssd drive.
tree -da /opt/ssd1/
/opt/ssd1/
├── datomic
│ └── valcache
│ ├── 000
│ ├── 001
│ ├── 002
Just need info on what the Cache Hit Ratios chart means. There are 2 paths:
1. Cache Hit Ratios comprises any Datomic cache type. Then those points we've seen could mean that Datomic is reaching for valcache in our test queries.
2. Cache Hit Ratios really only includes data for the EFS cache. If so, why is it reaching for EFS when it likely has the data available in valcache?
Both of these would lead to more questions and still leave us with a mystery of why one query is slower than the other.
This has been a very enlightening discussion. I believe we are well into the area of opening a support ticket 🙂
One observation from our previous discussion. We had previously agreed on this:
> Ok so the theory on the perf difference is that the query that uses the AEVT index is much slower due to a too small object cache.
However, that doesn't seem to hold. I would think that once you load the AEVT index into the object cache, subsequent queries using that index would be quick.