datomic 2016-02-06 | Slack Archive

@currentoor: there’s also the :with clause in Datalog query

btw, the first datalog pattern in your timestamps query [?eid _ _ ?tx _] is made redundant by the second
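
For readers unfamiliar with :with, here is a minimal sketch adapted from the monster-heads example in the Datomic query docs. It is not the timestamps query discussed above, which is not shown in this log. It assumes the Datomic peer library is on the classpath and aliased as d; the monsters data is illustrative only.

```clojure
(require '[datomic.api :as d])

;; Illustrative input data: [monster-name head-count] tuples.
(def monsters [["Cerberus" 3] ["Medusa" 1] ["Cyclops" 1] ["Chimera" 1]])

;; Query results are sets. Without :with, only the distinct ?heads
;; values (3 and 1) reach the aggregate, so duplicates are lost.
(d/q '[:find (sum ?heads) .
       :in [[_ ?heads]]]
     monsters)
;; => 4

;; With :with, ?monster stays in the tuples that are aggregated,
;; so every monster's head count is included in the sum.
(d/q '[:find (sum ?heads) .
       :with ?monster
       :in [[?monster ?heads]]]
     monsters)
;; => 6
```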

@bkamphaus: what is the maximum size a Datomic database can reach? i vaguely remember Stu either talking about or writing about this somewhere but i can’t find it. i know 1 billion datoms is possible. what’s the total ‘address space’?

tcrayford11:02:01

@robert-stuttaford: ~10 billion datoms is the problem point. Not an address space thing, but problematic

tcrayford11:02:39

@robert-stuttaford: also note that you can have at most ~20k idents in the db, because every ident is in memory in every peer/transactor

robert-stuttaford12:02:43

thanks @tcrayford ! what makes 10b datoms a problem? can you direct me to something to read or watch?

Ben Kamphaus14:02:40

@robert-stuttaford: Stu's answer on this thread elaborates a little more: https://groups.google.com/forum/m/#!topic/datomic/iZHvQfamirI -- it's a practical limit and the value is a rough rule of thumb. the database still functions, but probably not with acceptable performance characteristics especially if the transaction volume would reach that size limit quickly for any given use case.

robert-stuttaford16:02:22

thanks ben!

robert-stuttaford16:02:27

super valuable info

meow16:02:12

What is an ident of which there can be at most 20k? I'd like to understand this limit.

Ben Kamphaus17:02:11

the in-memory aspect of idents is documented here: http://docs.datomic.com/identity.html#idents

meow17:02:14

@bkamphaus: Thank you for that link.

meow17:02:47

So is it fair to say that the ident limitation is primarily felt with more complex schemas?

meow17:02:59

If so, what is the impact of schema evolution?

Ben Kamphaus17:02:43

@meow: I’m not familiar with anyone running up against practical limits with ident count, though I imagine it would have an impact if you had e.g. generated or flexible tagging that users provided (if you anticipated thousands and thousands of that sort of tag, I would say switch to a unique/identity keyword or string attribute of your own).
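
A minimal sketch of that suggestion, assuming the Datomic peer library aliased as d and an existing connection conn (not shown). The attribute name :tag/name is hypothetical. The schema map uses the explicit-install form that was current when this conversation took place.

```clojure
(require '[datomic.api :as d])

;; Hypothetical attribute: user-supplied tags are stored as string
;; values of a unique/identity attribute instead of as :db/ident keywords.
(def tag-schema
  [{:db/id                 (d/tempid :db.part/db)
    :db/ident              :tag/name
    :db/valueType          :db.type/string
    :db/cardinality        :db.cardinality/one
    :db/unique             :db.unique/identity
    :db/doc                "User-supplied tag name, unique per tag entity."
    :db.install/_attribute :db.part/db}])

;; A new tag is then ordinary data rather than an ident,
;; so it is not pre-loaded into memory the way idents are.
(def new-tag
  [{:db/id    (d/tempid :db.part/user)
    :tag/name "clojure"}])

;; Assuming `conn` is an existing connection:
;; @(d/transact conn tag-schema)
;; @(d/transact conn new-tag)
```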

Ben Kamphaus17:02:59

there’s also a limit on schema elements but it’s pretty high, 2^20: http://docs.datomic.com/schema.html#schema-limits

meow17:02:53

Braid has open-ended tagging of conversations.

meow17:02:07

We will hit those limits.

meow17:02:44

Is there a performance penalty to the unique/identity keyword or string attribute of our own?

meow17:02:20

And can you address the impact of schema evolution?

Ben Kamphaus17:02:50

ident is more performant but carries more memory overhead (pre-loaded). With your own unique attr on ref’d entity vs. ident you pay cost for retrieving segments and require warm cache etc. (three rough orders of magnitude to get segment from storage, memcached, object cache).

meow17:02:34

That is unfortunate.

Ben Kamphaus17:02:03

if by schema evolution you mean how to make the change, you can find every one of those enums and give it an identical attr/val keyword name for what the ident was, leave the entity intact.

Ben Kamphaus17:02:24

but obviously pull, query, etc. and automagic around ident/eid translation is lost and requires more verbose lookup ref.
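
A rough sketch of the migration and the trade-off described in the two messages above, assuming the Datomic peer library aliased as d. The names :tag/key (taken to be a keyword attribute with :db.unique/identity), the "tag" namespace, and :tag/clojure are hypothetical and used only for illustration.

```clojure
(require '[datomic.api :as d])

(defn ident->keyword-attr-tx
  "Builds tx data that copies each :db/ident in namespace `ns-name`
  onto the hypothetical attribute :tag/key, leaving the entity intact."
  [db ns-name]
  (for [[e ident] (d/q '[:find ?e ?ident
                         :where [?e :db/ident ?ident]]
                       db)
        :when (= ns-name (namespace ident))]
    [:db/add e :tag/key ident]))

;; With an ident, the keyword itself works wherever an entity id is expected.
(defn pull-by-ident [db]
  (d/pull db '[*] :tag/clojure))

;; With your own unique attribute, the more verbose lookup ref is needed.
(defn pull-by-lookup-ref [db]
  (d/pull db '[*] [:tag/key :tag/clojure]))

;; Example use, assuming `conn` is an existing connection:
;; @(d/transact conn (ident->keyword-attr-tx (d/db conn) "tag"))
```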

meow17:02:34

By schema evolution I mean the addition and/or removal of entity attributes over time as the database design changes in a production environment along with the issues of migration of existing entities and how that works in datomic given that it is immutable.

Ben Kamphaus17:02:08

I want to double check on that 20k limit, not sure if calculated or from a rule of thumb Stu or someone provided i.e. on a video. I do know that we caution people against too many idents but I’m not familiar with that specific boundary, @tcrayford if you don’t mind my quick follow-up question, can you refer me to the source for the 20k ident limit?

Ben Kamphaus17:02:52

@meow: not immutable over time, i.e. you can retract idents, assert them on other attributes, etc. But for testing, staging, etc. a lot of times you’re using the database itself as a test then migrating the portion of the schema/data you prefer to keep.

meow17:02:45

We always migrate the production instance of Braid.

meow17:02:11

We have the full history.

Ben Kamphaus17:02:23

the “present” database t/snapshot is the efficient one I mean, as in: http://docs.datomic.com/filters.html#usage-considerations

Ben Kamphaus17:02:41

“queries about "now" are as efficient as possible–they do not consider history and pay no penalty for history, no matter how much history is stored in the system.”

meow17:02:46

What schema is used when I query for something that happened yesterday. Is it yesterday's schema or today's schema, assuming the schema was changed?

meow18:02:44

Braid is an online group chat application with groups and tags, and no limits on either.

meow18:02:36

And the schema is evolving daily.

meow18:02:55

And we have a production instance running since day 1.

meow18:02:05

I use it every day.

Ben Kamphaus18:02:13

@meow answers to many of your questions are covered here: http://docs.datomic.com/schema.html#Schema-Alteration -- however, an ident is not a schema element intrinsically (i.e. your own enums not in :db.part/db), and an entity having an ident now or in the past doesn’t introduce the kind of complications you get from e.g. relaxing then trying to re-assert a unique constraint.

meow18:02:47

I understand that aspect.

meow18:02:57

"Thus traveling back in time does not take the working schema back in time, as the infrastructure to support it may no longer exist. Many alterations are backwards compatible - any nuances are detailed separately below."

meow18:02:14

That was the answer I was looking for.
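
A minimal sketch of what the quoted passage implies in practice, assuming the Datomic peer library aliased as d and an existing connection conn. The attribute :message/content is hypothetical. An as-of database value filters datoms by time, but per the quoted docs the working schema is not taken back in time, so attribute names in the query are resolved against today's schema.

```clojure
(require '[datomic.api :as d])

(defn messages-as-of
  "Returns the ids of entities that had a :message/content value as of
  `inst` (a java.util.Date). :message/content is a hypothetical attribute."
  [conn inst]
  (let [db-then (d/as-of (d/db conn) inst)]
    (d/q '[:find [?e ...]
           :where [?e :message/content]]
         db-then)))

;; Example call, assuming `conn` exists:
;; (messages-as-of conn #inst "2016-02-05")
```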

meow18:02:08

I wrote Schevo in Python. Schevo was for "schema evolution". It was similar to datomic but OO.

Ben Kamphaus18:02:33

I have to step away for a while, I’ll check in on the 20k limit re: idents Monday AM with the dev team. I’ll let you know how precise that limit is or if there are tradeoffs you can make (i.e. if you can keep running it up if it’s an important enough aspect of the architecture and you can accommodate via schema provisioning, cache settings, etc.).

meow18:02:50

Thank you for all your help.

Ben Kamphaus18:02:13

s/schema provisioning/machine provisioning

meow18:02:35

We could also take a federated approach to scaling.

meow18:02:47

@jamesnvc: @rafd @crocket See above for details on datomic limitations. ^

jamesnvc18:02:40

If I understand correctly, the ident limit is with regards to :db/ident things?

jamesnvc18:02:36

tags in braid are just strings that we do look-up on, so the schema shouldn’t actually be growing

jamesnvc18:02:21

(this would be relevant for another project @rafd and I have worked on though)

Ben Kamphaus18:02:08

@jamesnvc: yes this is only about the count of entities that have :db/ident and the impact on memory, I’m trying to source the practical limit that was quoted here as I’m not familiar with it, but the softer principle of limiting the total number of things with idents because you always pay their memory overhead should be a modeling consideration.

jamesnvc18:02:37

yeah, that makes sense

currentoor18:02:16

@robert-stuttaford: thanks!

Lambda/Sierra19:02:07

I would apply the same guideline for Datomic Idents that I use for Keywords in Clojure applications: do not use Keywords for anything user-generated.

tcrayford23:02:51

@bkamphaus: pretty sure I was wrong and the limit is just 2^20
