This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-03-31
Weekly OSS meeting: https://meet.jit.si/MassiveVoicesRuleUnfortunately
So: I noticed that my pull performance isn't all that great (with the file backend on my Mac). I have a pull-many call with 9 entities and 8 attributes (including one join), and it takes 40ms.
I pull multiple times to fulfil a request, so the times stack up, reaching 500ms for a request and >1s on my server without an SSD.
Anyway, I looked into it and noticed that pulling by lookup-ref is noticeably slower than pulling by eid.
pId                nCalls  Min       50% ≤     90% ≤     95% ≤     99% ≤     Max       Mean      MAD   Clock     Total
:pull-with-lookup  1,000   681,47μs  707,46μs  787,53μs  846,66μs  928,76μs  1,90ms    730,65μs  ±5%   730,65ms  50%
:lookup->eid       1,000   431,54μs  447,78μs  495,12μs  519,19μs  592,17μs  1,31ms    460,24μs  ±4%   460,24ms  32%
:pull-with-eid     1,000   246,34μs  256,51μs  283,64μs  306,21μs  342,48μs  456,91μs  264,05μs  ±5%   264,05ms  18%
My thoughts about that:
It would be great if Datahike kept (parts of) the avet index in memory.
;; Assumes `d` aliases datahike.api and that `profile`/`p` come from a
;; profiling library (presumably taoensso.tufte, judging by the output format).
(profile {}
  (let [db @conn
        ;; Nested map keyed by attribute, then value: {attr {value eid}}
        eid-cache (atom {})
        ;; Resolve a lookup ref [attr value] to an entity id via the AVET index
        lookup->eid (fn [db lookup] (:e (first (d/datoms db {:index :avet, :components lookup}))))
        pull-with-eid-cache
        (fn pull-with-eid-cache [db selector eid]
          (d/pull db selector
            (if (integer? eid)
              eid
              (if-let [cached-eid (get-in @eid-cache eid)]
                cached-eid
                (let [e (lookup->eid db eid)]
                  (swap! eid-cache assoc-in eid e)
                  e)))))]
    (doseq [_ (range 1000)]
      (p :lookup->eid
        (lookup->eid db [:decide.models.proposal/id #uuid "6051a9e7-5c78-46b4-90e7-4492c89f4728"]))
      (p :pull-with-lookup
        (d/pull db ['*] [:decide.models.proposal/id #uuid "6051a9e7-5c78-46b4-90e7-4492c89f4728"]))
      (p :pull-with-lookup->eid-cache
        (pull-with-eid-cache db ['*] [:decide.models.proposal/id #uuid "6051a9e7-5c78-46b4-90e7-4492c89f4728"]))
      (p :pull-with-eid
        (d/pull db ['*] 156)))))
pId                           nCalls  Min       50% ≤     90% ≤     95% ≤     99% ≤     Max       Mean      MAD   Clock     Total
:pull-with-lookup             1,000   691,83μs  722,40μs  873,85μs  911,35μs  1,07ms    1,76ms    764,55μs  ±8%   764,55ms  42%
:lookup->eid                  1,000   434,32μs  454,18μs  545,43μs  569,15μs  671,98μs  1,58ms    479,82μs  ±8%   479,82ms  26%
:pull-with-lookup->eid-cache  1,000   257,03μs  272,01μs  329,56μs  354,27μs  455,49μs  1,25ms    290,05μs  ±10%  290,05ms  16%
:pull-with-eid                1,000   250,20μs  260,28μs  302,66μs  327,57μs  407,67μs  1,39ms    274,43μs  ±8%   274,43ms  15%
Accounted                                                                                                         1,81s     100%
Clock                                                                                                             1,82s     100%
With clojure.core.cache
(require '[clojure.core.cache.wrapped :as cw])
(profile {}
  (let [db @conn
        ;; Soft-reference cache, so entries can be reclaimed under memory pressure
        *eid-cache (cw/soft-cache-factory {})
        lookup->eid (fn [db lookup] (:e (first (d/datoms db {:index :avet, :components lookup}))))
        pull-with-eid-cache
        (fn pull-with-eid-cache [db selector eid]
          (d/pull db selector
            (if (integer? eid)
              eid
              ;; On a miss, resolve the lookup ref and store the result
              (cw/lookup-or-miss *eid-cache eid #(lookup->eid db %)))))]
    (doseq [_ (range 1000)]
      (p :lookup->eid
        (lookup->eid db [:decide.models.proposal/id #uuid "6051a9e7-5c78-46b4-90e7-4492c89f4728"]))
      (p :pull-with-lookup
        (d/pull db ['*] [:decide.models.proposal/id #uuid "6051a9e7-5c78-46b4-90e7-4492c89f4728"]))
      (p :pull-with-lookup->eid-cache
        (pull-with-eid-cache db ['*] [:decide.models.proposal/id #uuid "6051a9e7-5c78-46b4-90e7-4492c89f4728"]))
      (p :pull-with-eid
        (d/pull db ['*] 156)))))
Yeah, that would be helpful indeed. Currently we only have some caching strategies for the queries. The cache should be optional, since we want Datahike to also be able to run on smaller systems. One question would be where to add the cache and how it relates to the query cache. Maybe you could open a discussion on GitHub, and we could work out there how to add this to Datahike.
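To make the "optional" part concrete, here is a minimal sketch of a lookup-ref resolver whose cache can be switched off. It is not Datahike code: `make-eid-resolver`, the `:cache?` option, and `fake-lookup` are hypothetical names for illustration, and `fake-lookup` stands in for the AVET lookup so the snippet runs without a database.

```clojure
;; Hypothetical helper, not part of Datahike: wraps any lookup-ref resolver
;; with an optional in-memory cache.
(defn make-eid-resolver
  "resolve-fn takes a lookup ref such as [attr value] and returns an eid or nil.
  With :cache? false, resolve-fn is returned unchanged."
  [resolve-fn {:keys [cache?] :or {cache? true}}]
  (if-not cache?
    resolve-fn
    (let [cache (atom {})]
      (fn [lookup-ref]
        (if-let [eid (get @cache lookup-ref)]
          eid
          ;; Only hits are stored; a ref that resolves to nil is looked up again next time.
          (when-let [eid (resolve-fn lookup-ref)]
            (swap! cache assoc lookup-ref eid)
            eid))))))

;; Stand-in for the index lookup (assumption: a real version would query AVET).
(def calls (atom 0))

(defn fake-lookup [[_attr v]]
  (swap! calls inc)
  (get {"a" 156} v))

(def resolver (make-eid-resolver fake-lookup {:cache? true}))

(resolver [:proposal/id "a"]) ; miss: calls fake-lookup
(resolver [:proposal/id "a"]) ; hit: served from the cache
```

A cached eid is only valid while the lookup ref still points at the same entity, so a real implementation would need some invalidation when the unique value is retracted or reassigned.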