This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-11-02
Hey team, question: how does clustering work in join queries for Datomic? For example, say I want to do this:
[1 :posts ?pid]
[?pid :title ?title]
AFAIK, Datomic would do two index lookups:
1. EAV index [1 :posts], to find the set of ?pid
2. Implicit join with the EAV index, which would look up each of the N ?pid values and find the corresponding ?title
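To make the two steps concrete, here is a toy sketch that models the EAV index as an in-memory list of (entity, attribute, value) tuples sorted in EAV order and uses prefix seeks via `bisect`. The entity ids, attribute names and titles are hypothetical; a real Datomic index is a tree of immutable segments, not a flat Python list.

```python
from bisect import bisect_left

# Hypothetical toy EAV index: datoms sorted by (entity, attribute, value).
eav = sorted([
    (1, ":posts", 101),
    (1, ":posts", 205),
    (101, ":title", "Hello"),
    (205, ":title", "Partitions"),
    (300, ":title", "Unrelated"),
])

def lookup(index, e, a):
    """Return every value v of the datoms [e a v], via a prefix seek."""
    i = bisect_left(index, (e, a))  # (e, a) sorts just before (e, a, any v)
    out = []
    while i < len(index) and index[i][:2] == (e, a):
        out.append(index[i][2])
        i += 1
    return out

# Step 1: one seek on [1 :posts] yields the set of ?pid.
pids = lookup(eav, 1, ":posts")
# Step 2: the implicit join does one seek per ?pid to find its ?title.
titles = {pid: lookup(eav, pid, ":title")[0] for pid in pids}
print(pids, titles)  # [101, 205] {101: 'Hello', 205: 'Partitions'}
```

Step 2 is N separate seeks, and each seek may land in a different part of the index; that is where the segment-count question comes from.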
My question is about step 2: how would the caching work? If there are N ?pid values, we may end up fetching ~N different segments into memory (unless there is some kind of clustering).
Using the A, if it’s known, provides a kind of clustering/locality akin to what a column-oriented DB would give.
But yes, in the worst case you could still have so many ?pid values, spread over such a long time (so that their entity ids are not at all contiguous), that you fetch nearly N segments.
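A rough way to see the best and worst cases is to count segment reads in a toy index sorted by entity id and cut into fixed-size segments. Both constants (1000 datoms per segment, 5 datoms per entity) are hypothetical round numbers, not Datomic's actual values; the point is only how contiguity of entity ids changes the count.

```python
# Toy model of segment reads: an index sorted by entity id, cut into
# fixed-size segments. Both constants are hypothetical, not Datomic's values.
SEGMENT_SIZE = 1000      # datoms per segment
DATOMS_PER_ENTITY = 5    # datoms stored for each entity

def segments_touched(wanted_ids, all_ids):
    """Count distinct segments read to fetch every datom of wanted_ids."""
    # Offset of each entity's first datom in the entity-sorted index.
    offset = {e: i * DATOMS_PER_ENTITY for i, e in enumerate(sorted(all_ids))}
    segments = set()
    for e in wanted_ids:
        start = offset[e]
        for pos in range(start, start + DATOMS_PER_ENTITY):
            segments.add(pos // SEGMENT_SIZE)
    return len(segments)

all_ids = range(100_000)
n = 200
# Ids allocated back to back, e.g. all posts created in one burst.
contiguous = segments_touched(range(50_000, 50_000 + n), all_ids)
# Ids spread evenly over the whole history of the database.
scattered = segments_touched(range(0, 100_000, 100_000 // n), all_ids)
print(contiguous, scattered)  # 1 200
```

With contiguous ids, 200 entities fit in a single segment; with ids spread across history, every entity lands in its own segment, which is the "nearly N segments" worst case.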
When you create an entity via a tempid, you can supply a partition; the partition id becomes the high bits of the entity id. By putting entities that are frequently read together into the same partition, you increase the chance that you will fetch significantly fewer than N segments for N items.
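The effect of putting the partition in the high bits can be sketched as follows. This is a hypothetical id layout, not Datomic's code: it assumes a 42-bit shift (consistent with the ids Datomic On-Prem hands out, where `:db.part/user` ids start near `4 << 42`), assumes one counter shared by all partitions, and uses a made-up custom partition number 5 for posts. Entities created alternately end up interleaved when they share a partition, but grouped together in an id-sorted index when posts get their own partition.

```python
# Hypothetical entity-id layout: partition number in the high bits,
# a counter in the low bits. The 42-bit shift is an assumption.
PARTITION_SHIFT = 42
USER_PART = 4     # assumed id of :db.part/user
POSTS_PART = 5    # hypothetical custom partition for posts

def make_eid(part, counter):
    return (part << PARTITION_SHIFT) | counter

def partition_of(eid):
    return eid >> PARTITION_SHIFT

def allocate(n, posts_part):
    """Create n 'other' entities and n posts, alternating, from one shared counter."""
    others, posts = [], []
    counter = 1000
    for _ in range(n):
        others.append(make_eid(USER_PART, counter))
        counter += 1
        posts.append(make_eid(posts_part, counter))
        counter += 1
    return others, posts

def post_span(others, posts):
    """Width of the id-sorted index range (in entities) that holds all the posts."""
    rank = {e: i for i, e in enumerate(sorted(others + posts))}
    ranks = [rank[p] for p in posts]
    return max(ranks) - min(ranks) + 1

n = 100
same_part = post_span(*allocate(n, USER_PART))   # posts interleaved with others
own_part = post_span(*allocate(n, POSTS_PART))   # posts grouped by their high bits
print(same_part, own_part)  # 199 100
```

In the shared partition the 100 posts are spread over 199 index positions; in their own partition they occupy exactly 100 adjacent positions, so fewer segments cover them.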
Really interesting, thank you @U09R86PA4!
Curious question: Is there any database that solves this problem? Would love to learn how they approach it.
Mature SQL databases often allow you to partition rows according to some criteria. The point of this is to put similar rows into the same physical storage silos. (It isn’t quite the same, but you can use it to solve the same kinds of problems.)
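The core idea of row partitioning (for example PostgreSQL's declarative `PARTITION BY LIST`) can be sketched in a few lines: each row is routed to a silo chosen by a key, and a query filtered on that key only has to scan one silo, which PostgreSQL calls partition pruning. The rows, column names and key function below are hypothetical; this illustrates the routing idea, not how any particular database stores partitions.

```python
from collections import defaultdict

def partition_rows(rows, key):
    """Route each row to a silo chosen by key(row), like list partitioning."""
    silos = defaultdict(list)
    for row in rows:
        silos[key(row)].append(row)
    return silos

# Hypothetical rows; author_id plays the role of the partition key.
rows = [
    {"id": 1, "author_id": 1, "title": "Hello"},
    {"id": 2, "author_id": 2, "title": "Other"},
    {"id": 3, "author_id": 1, "title": "Partitions"},
]
silos = partition_rows(rows, key=lambda r: r["author_id"])

# A query filtered on the partition key scans only the matching silo.
titles = [r["title"] for r in silos[1]]
print(titles)  # ['Hello', 'Partitions']
```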
Gotcha, thank you @U09R86PA4!