#datomic
2023-02-11
niquola09:02:24

Hi, we want to partition Datomic by patient, because most queries will hit a single patient. Is there a way to hint the db to do this?

favila16:02:12

In on-prem only

favila16:02:47

Create a bunch of partition entities, eg 256

favila16:02:42

Assign each patient one randomly and evenly (eg with a hash function of something stable like a patient number)

favila16:02:25

Now any time you make a new entity, use d/tempid with the partition belonging to the patient that the entity “belongs to” for partitioning purposes

favila16:02:28

It’s very much a “roll your own” system entirely in the application code. Datomic doesn’t care about partitioning at the data level; it’s just to get better segment locality at the storage level
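A minimal sketch of that scheme with the on-prem peer API (assumptions: `conn` is an existing connection, and the `:patient.part/*` idents, `patient-number`, and the `:encounter/type` attribute are illustrative names, not from this thread):

```clojure
(require '[datomic.api :as d])

;; 1. Install a fixed set of partition entities, e.g. 256 (one-time tx).
(def patient-partitions
  (mapv #(keyword "patient.part" (str "p" %)) (range 256)))

@(d/transact conn
   (for [ident patient-partitions]
     {:db/id                 (d/tempid :db.part/db)
      :db/ident              ident
      :db.install/_partition :db.part/db}))

;; 2. Assign each patient a partition deterministically, by hashing
;;    something stable like the patient's medical record number.
(defn patient-partition [patient-number]
  (nth patient-partitions
       (mod (hash patient-number) (count patient-partitions))))

;; 3. Whenever you create an entity that "belongs to" a patient, mint its
;;    tempid in that patient's partition so its datoms sort near the
;;    patient's other data in the E-leading indexes.
;;    (Assumes an :encounter/type attribute exists in your schema.)
@(d/transact conn
   [{:db/id          (d/tempid (patient-partition "MRN-12345"))
     :encounter/type :encounter/ambulatory}])
```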

niquola16:02:46

Thx, what is the maximum realistic number of partitions?

favila16:02:45

The partition space is 20 bits, but that uses idents too

favila16:02:58

You really don’t need more than a few hundred

favila16:02:57

Partitions only help when your working set and object cache are significantly smaller than your entire db

niquola16:02:23

It is - let's say I have 1-10M patients, each with 100k medical records ~ 1M datoms

niquola17:02:32

~1M datoms

niquola17:02:29

My dream was: each patient is a partition, with a < 1 sec cold start from S3-archived blocks to get a patient's datoms into memory

favila18:02:47

S3 archived?

favila18:02:18

Partitions are not hard boundaries, they just influence sort order to increase locality
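For illustration (not from the thread), that locality can be seen by scanning the :eavt index starting at a fabricated entity id inside one partition; `d/entid-at` and `d/seek-datoms` are peer-API functions, and `:patient.part/p17` is one of the illustrative partitions from the earlier sketch:

```clojure
;; Sketch: a partition only biases where entity ids land, so a patient's
;; datoms end up adjacent in :eavt and can be walked with a raw index scan.
(let [db    (d/db conn)
      start (d/entid-at db :patient.part/p17 0)] ; lowest eid in that partition
  (->> (d/seek-datoms db :eavt start)
       (take 20)))   ; first datoms of entities sorted into partition p17
```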

favila18:02:27

1mil is not a lot of datoms

favila18:02:40

What is your object cache (oc) hit rate?

favila18:02:55

For some perspective, we have 256*2 partitions on a >10 billion datom db, and we put a load balancer in front of peers that keys by partition
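A hedged sketch of that routing idea (the `peers` vector and `peer-for-eid` are hypothetical; `d/part` is the peer-API function that returns the numeric partition id of an entity id):

```clojure
;; Sticky routing sketch: send work for entities in the same partition to
;; the same peer, so that partition's segments stay warm in one object cache.
(defn peer-for-eid [peers eid]
  (nth peers (mod (d/part eid) (count peers))))

;; e.g. (peer-for-eid ["peer-0:4334" "peer-1:4334"] patient-eid)
```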

favila18:02:38

We have perf problems still, but it’s not from lack of partitions

favila18:02:53

And if you are on cloud this is all academic because it doesn’t let you control partitions (it doesn’t have d/tempid)