#datomic
2022-11-02
stopa17:11:34

Hey team, question: how does clustering work in join queries for datomic? For example, say I want to do this:

[1 :posts ?pid]
[?pid :title ?title]
Afaik, datomic would do two index lookups:
1. EAV index [1 :posts] to find the set of ?pid
2. Implicit join with EAV index, which would look up N ?pid, and find the corresponding ?title

My question is for 2.: how would the caching work? If there are N ?pid, we may end up fetching ~N different segments into memory. (Unless there is some kind of clustering)

favila17:11:40

Query generally prefers AEVT

favila17:11:49

Using the A if it’s known provides a kind of clustering/locality akin to what a column-oriented db would give

favila17:11:29

but yes, worst case, you could still have so many ?pid, spread over such a long time (so that their entity-ids are not at all contiguous), that you fetch nearly N segments

👍 1
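
To make the locality argument above concrete, here is a minimal simulation sketch in Python. It is not Datomic code: the segment size, the attribute names other than :title, and the helper `segments_touched` are all assumptions made up for illustration. It sorts the same datoms in entity-first (EAVT-like) and attribute-first (AEVT-like) order, cuts each sorted list into fixed-size segments, and counts how many segments hold the :title datoms of 200 scattered ?pid.

```python
import random

SEGMENT_SIZE = 1000  # datoms per segment; a made-up number, real segment sizes vary

def segments_touched(index, wanted):
    """Count the distinct segments of a sorted index that hold any wanted datom."""
    return len({pos // SEGMENT_SIZE for pos, datom in enumerate(index) if datom in wanted})

random.seed(0)

# Hypothetical schema: every entity carries the same eight attributes.
ATTRS = [":title", ":body", ":author", ":created", ":likes", ":tags", ":slug", ":status"]
entities = range(10_000)

# A datom is reduced to (entity, attribute); value and tx are omitted for brevity.
datoms = [(e, a) for e in entities for a in ATTRS]

eavt = sorted(datoms)                              # entity first, then attribute
aevt = sorted(datoms, key=lambda d: (d[1], d[0]))  # attribute first, then entity

# 200 ?pid scattered across the whole entity-id range (the worst case described above).
pids = random.sample(entities, 200)
wanted = {(e, ":title") for e in pids}

eavt_segments = segments_touched(eavt, wanted)
aevt_segments = segments_touched(aevt, wanted)
print("EAVT-like segments touched:", eavt_segments)
print("AEVT-like segments touched:", aevt_segments)
```

In the attribute-first order all :title datoms sit in one contiguous run, so the number of segments touched is bounded by the size of that run rather than by how scattered the entity ids are. In the entity-first order, each scattered ?pid tends to land in a different segment.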
favila17:11:47

partitions are a mechanism to control this

favila17:11:54

by enforcing a sort order

favila17:11:24

when you create an entity via a tempid, you can supply a partition; the partition id becomes the high bits of the entity id. By putting frequently-read-together entities into the same partition, you increase the chance that you will fetch significantly less than N segments for N items.

favila17:11:50

but this is not automatic, and you cannot alter an entity’s partition after creation.
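
A small sketch of why putting the partition id in the high bits enforces a sort order. This is plain Python arithmetic, not the Datomic API: the 42-bit shift, the partition ids 100 and 101, and the helper names `make_eid` and `partition_of` are assumptions chosen for illustration, so check the Datomic documentation for the real entity-id layout.

```python
PARTITION_SHIFT = 42  # assumption: the low 42 bits hold a per-entity counter

def make_eid(partition_id, counter):
    """Pack a partition id into the high bits and a counter into the low bits."""
    return (partition_id << PARTITION_SHIFT) | counter

def partition_of(eid):
    """Recover the partition id from the high bits of an entity id."""
    return eid >> PARTITION_SHIFT

POSTS_PART, COMMENTS_PART = 100, 101  # hypothetical partition ids

# Posts and comments are created interleaved over time (odd counters are posts).
eids = []
for counter in range(1, 11):
    part = POSTS_PART if counter % 2 else COMMENTS_PART
    eids.append(make_eid(part, counter))

# Sorting by entity id groups by partition first, because the partition
# occupies the most significant bits.
sorted_eids = sorted(eids)
post_eids = [e for e in sorted_eids if partition_of(e) == POSTS_PART]
print([partition_of(e) for e in sorted_eids])
```

Even though the posts were created at different times, they end up adjacent in any index sorted by entity id, which is what raises the chance of reading them from the same few segments.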

stopa17:11:18

Really interesting, thank you @favila!

stopa16:11:21

Curious question: Is there any database that solves this problem? Would love to learn how they approach it.

favila16:11:09

mature SQL databases often allow you to partition rows according to some criteria. The point of this is to put like rows into the same physical storage silos. (It isn’t quite the same, but you can use it to solve the same kinds of problems)

favila16:11:14

There are also often similar knobs on individual indexes.
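
As a rough illustration of the row-partitioning idea, here is a toy sketch in Python rather than SQL. It does not model any particular database: the rows, the `created` column, and the month-based `partition_key` criterion are all made up for the example.

```python
from collections import defaultdict
from datetime import date

def partition_key(row):
    """Hypothetical criterion: one partition per calendar month of `created`."""
    return (row["created"].year, row["created"].month)

rows = [
    {"id": 1, "created": date(2022, 10, 30)},
    {"id": 2, "created": date(2022, 11, 1)},
    {"id": 3, "created": date(2022, 11, 2)},
    {"id": 4, "created": date(2022, 12, 5)},
]

# Route each row to its storage silo, keyed by the partition criterion.
partitions = defaultdict(list)
for row in rows:
    partitions[partition_key(row)].append(row)

# A query restricted to November only has to read a single partition.
november_ids = [row["id"] for row in partitions[(2022, 11)]]
print(november_ids)
```

The point mirrors the Datomic case: rows that are read together are stored together, so a query over them touches fewer storage units.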

stopa16:11:23

Gotcha, thank you @favila!