#datomic
2020-07-29
arohner12:07:13

Are there any recommendations on the size of a cardinality-many attribute? Is there a problem with storing a million UUIDs under a single entity/attribute pair?

stuarthalloway12:07:26

There is no particular limit, but you should keep in mind the memory implications of future use.

stuarthalloway12:07:41

For example, if you gradually build up 1 million EAs, and then retract the entire entity, the transaction that does the retraction will have 1 million datoms in it.
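
As a minimal sketch of what that looks like (assuming a hypothetical client connection conn and entity id eid), the single :db/retractEntity form below expands into one retraction datom per asserted value, so roughly a million datoms for this entity:

(require '[datomic.client.api :as d])

;; one tx form, but the resulting transaction carries ~1M retraction datoms
(d/transact conn {:tx-data [[:db/retractEntity eid]]})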

stuarthalloway12:07:56

Also consider pull expressions, which might have been written (or displayed in a UI) with the presumption that their results are smallish and don't need to be e.g. paginated.

stuarthalloway12:07:06

Programs consuming a high-cardinality attribute may want to use https://docs.datomic.com/cloud/query/query-index-pull.html#aevt to consume it in chunks.
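
A sketch of the linked technique, assuming a hypothetical attribute :item/uuid: index-pull walks the AEVT index lazily, returning one pull map per entity that has the attribute, and partition-all lets the consumer process the lazy seq in chunks.

(require '[datomic.client.api :as d])

;; lazily walk AEVT for :item/uuid and consume 1000 pull results at a time
(def chunks
  (partition-all 1000
                 (d/index-pull db {:index    :aevt
                                   :selector [:db/id :item/uuid]
                                   :start    [:item/uuid]})))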

souenzzo14:07:44

Reminder: pull by default returns only 1000 elements for a ref-to-many attribute https://docs.datomic.com/on-prem/pull.html#limit-option
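
Per the linked docs, the default can be overridden in the pull pattern itself; with a hypothetical ref-many attribute :order/items:

;; raise the default 1000 limit
(d/pull db '[{(limit :order/items 5000) [:item/sku]}] order-id)

;; or remove the limit entirely by passing nil
(d/pull db '[(limit :order/items nil)] order-id)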

Nassin15:07:09

> For example, if you gradually build up 1 million EAs, and then retract the entire entity, the transaction that does the retraction will have 1 million datoms in it.

Nassin15:07:22

Only if isComponent is true, correct?

favila17:07:59

@U011VD1RDQT no, isComponent will propagate the retraction to the referenced component entities

favila17:07:21

[E A 1millionV] is going to delete one million E datoms regardless of whether A is an isComponent attr.
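
To make the distinction concrete, here is a sketch of a component ref attribute (hypothetical idents). Retracting the parent entity retracts its million [E A v] datoms either way; :db/isComponent true additionally retracts the referenced entities themselves.

{:db/ident       :order/line-items
 :db/valueType   :db.type/ref
 :db/cardinality :db.cardinality/many
 :db/isComponent true} ; retractEntity on the parent also retracts each line item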

Nassin17:07:04

ah true, was thinking it was of type :db.type/ref 👍

kschltz14:07:12

Hi there. My current scenario is that we're using Datomic Cloud in one of our major services. It's around 60M entities / 3.5B datoms, and some particular queries are underperforming. As we plan to grow by some orders of magnitude, I was exploring alternatives to scale both our writes and reads. From my understanding so far, given that I'm able to scale the number of processors serving my dbs, and transactors don't compete for resources among those dbs, I started experimenting with the following:
1. Have my service write in parallel to multiple dbs (say db0, db1, db2, all with the same schema), ensuring that the same entity always ends up in the same db, so I don't end up with partial data split across my databases.
2. When querying, issue the queries in parallel, then merge the results in my application, something like

(pcalls query-for-satellites0 query-for-satellites1 query-for-satellites2)
So far, this parallel read/write setup has proven to be really performant. Now my question to you all is whether I'm missing something. Are there any architectural gotchas that would make this a bad idea?
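
A minimal sketch of the routing idea described above, using hypothetical helpers: shard-for hashes the entity's natural key so the same entity always lands on the same db, and query-all mirrors the pcalls approach with futures.

(require '[datomic.client.api :as d])

(def conns [conn0 conn1 conn2]) ; one connection per shard

(defn shard-for [entity-key]
  ;; deterministic routing: same key -> same shard
  (nth conns (mod (hash entity-key) (count conns))))

(defn write! [entity-key tx-data]
  (d/transact (shard-for entity-key) {:tx-data tx-data}))

(defn query-all [query]
  ;; run the same query against every shard in parallel, merge results
  (->> conns
       (mapv #(future (d/q {:query query :args [(d/db %)]})))
       (mapcat deref)))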

marciol19:07:24

@schultzkaue It’d be nice to know if anyone has already experimented with this kind of topology. You are sharding your data across several dbs and writing/issuing queries in parallel, right?