#datomic
2016-09-19
magnars07:09:13

I heard some rumors about people having issues using Datomic entity IDs in JavaScript, since JS's highest safe integer is 2^53-1, while Datomic entity IDs are 64-bit longs. But I see that all my entity IDs are around 175e+11, meaning I would run out of my allotted 10B datoms way before I encounter that issue. What am I missing?

danielstockton07:09:17

I think you'll only have problems if you have a large number of partitions

magnars07:09:17

Thanks, that makes sense. 👍
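
A rough sketch of the arithmetic behind the partition remark. It assumes (not stated in the chat) that an entity ID carries its partition's own ID in the bits above bit 42, which is consistent with IDs around 1.75e13 for a partition whose ID is 4. The helper names are illustrative, and JavaScript's `Number.MAX_SAFE_INTEGER` is 2^53 - 1.

```python
# Assumption: entity id = (partition id << 42) + index within the partition.
PARTITION_SHIFT = 42
JS_MAX_SAFE_INTEGER = 2**53 - 1  # Number.MAX_SAFE_INTEGER in JavaScript


def first_eid_in_partition(partition_id: int) -> int:
    """Smallest entity id in the given partition under the assumed layout."""
    return partition_id << PARTITION_SHIFT


def is_js_safe(eid: int) -> bool:
    """True if the id is within JavaScript's safe integer range."""
    return eid <= JS_MAX_SAFE_INTEGER


def first_unsafe_partition() -> int:
    """Lowest partition id whose entity ids exceed the safe integer range."""
    return (JS_MAX_SAFE_INTEGER >> PARTITION_SHIFT) + 1
```

Under that assumption, IDs only leave the safe range once a partition's own ID reaches 2048, which matches the point that this mainly matters with a large number of partitions.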

yonatanel07:09:08

How does Datomic implement consistency on top of DynamoDB's eventual consistency?

danielstockton08:09:01

As far as I understand, it uses a conditional put for the root refs (index-root-id etc.), and then subsequent segments are either there or not there yet

danielstockton08:09:30

Can't remember the exact behaviour if a query needs a segment that isn't there yet, I guess it throws an error and you just wait until it's available

danielstockton08:09:39

But might be completely wrong

yonatanel11:09:27

If I change the system clock will it mess up squuid order?

danielstockton11:09:48

potentially i think, it depends when the last one was created, how much you change it, and in what direction

danielstockton11:09:42

i think you're ok as long as you don't change it backwards past the time that the last one was created

danielstockton11:09:32

that might cause other problems with transaction instants anyway

stuartsierra13:09:21

@yonatanel If you change the system clock to a time earlier than the last recorded transaction, Datomic will reject transactions until the clock "catches up" again

marshall13:09:24

@magnars One note about EIDs in your external client - It’s generally not recommended to use the datomic-generated entity IDs as external identifiers. Whenever possible you should model identity yourself with a domain or generated identifier.

marshall13:09:22

@yonatanel @danielstockton is mostly correct about the way consistency is implemented. The conditional put of the root ONLY occurs once all the segments below that root are written. This means that you can never have an inconsistent database. If the root is present, all nodes below it are as well.
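
A toy sketch of the ordering marshall describes: write every segment first, then swap the root with a conditional put. The store class, `publish_index`, and the use of a tuple of segment keys as the root value are all illustrative assumptions, not Datomic's or DynamoDB's actual API.

```python
class ToyStore:
    """In-memory key/value store with a conditional put (illustrative only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def conditional_put(self, key, expected, new):
        """Write `new` only if the current value equals `expected`."""
        if self._data.get(key) != expected:
            return False
        self._data[key] = new
        return True


def publish_index(store, old_root, new_root, segments):
    """Publish a new root that points at `segments` (key -> payload)."""
    # 1. Write every immutable segment under its own key.
    for seg_key, payload in segments.items():
        store.put(seg_key, payload)
    # 2. Only then swap the root. A reader that sees the new root can
    #    rely on every segment below it already being present.
    return store.conditional_put("index-root-id", old_root, new_root)
```

If the conditional put fails because another writer moved the root first, the segments just written are simply unreachable garbage; the visible root still points at a complete tree.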

magnars13:09:14

@marshall: thanks for the heads up. I'm using them for only short-lived transient IDs in the client tho. They're never stored anywhere.

tengstrand13:09:02

We have a Datomic database where the "core entities" are categorised by country, with a lot of other entities around them (indirectly also categorised by country). We also have users corresponding to one of these roles: ADMIN, COUNTRY_ADMIN and REVIEWER. Unless you are an ADMIN, you can't read information from countries other than the ones you belong to. ADMIN always has "write" rights (to add facts) to all entities. All other roles can only write to entities belonging to countries they are a member of. A REVIEWER can read ADMIN-related information, but is not allowed to write ADMIN information (the same for COUNTRY_ADMIN -> ADMIN). We keep track of the currently logged-in user, and we store the countries he belongs to and which role he has. How should we best implement this? 1. By adding extra parameters to every function that does a Datomic query, plus extra criteria in the query, maybe by using Datomic rules. 2. By having a central function that returns a filtered database, based on the current user's countries and role level, that we use to query the database. 3. Any other ideas for solving these cross-cutting concerns?

stuartsierra14:09:10

Without getting into your specific use case, a filtered database is a convenient general solution, but comes with a performance cost (examining every Datom your query touches). Adding extra selection criteria to every query might be more efficient, but comes at the cost of added complexity on every query. Another possible approach is to do all your normal queries without considering authorization, then trim the results based on what the current user is allowed to see.
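
A minimal sketch of the last approach mentioned (query first, trim afterwards), reduced to the read rule only. The `User` type, the `country` key on each result row, and the function names are assumptions made for illustration; real query results would need their country resolved first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    role: str             # "ADMIN", "COUNTRY_ADMIN" or "REVIEWER"
    countries: frozenset  # country codes the user belongs to


def can_read(user: User, row: dict) -> bool:
    """ADMIN reads everything; other roles only their own countries."""
    return user.role == "ADMIN" or row["country"] in user.countries


def trim_results(user: User, rows: list) -> list:
    """Drop rows the user may not see, after the normal query has run."""
    return [row for row in rows if can_read(user, row)]
```

This keeps the queries themselves free of authorization logic, at the cost of fetching rows that are then thrown away.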

robert-stuttaford14:09:17

root cause is transactor isn't receiving a public ip (doesn't need one), and so setting alt-host is failing. any recommendations for options, or should i just set an ip?

stuartsierra14:09:55

I've encountered that issue as well. Adding a public IP is the easiest thing to do. Some users have reported success removing or editing the alt-host line in the CloudFormation template; I don't know if that works.

robert-stuttaford14:09:59

@stuartsierra: a recent cognicast mentioned your predilection for 'decanting Datomic databases'. is this something you've done a lot?

robert-stuttaford14:09:30

i'm busy preparing to do this for a pretty large database. any .. uh, tips? 🙂

robert-stuttaford14:09:12

by decant, i mean, rebuild in transaction order. and per tx, either discarding, or altering in flight

robert-stuttaford14:09:50

i have to use a streaming approach because it's tens of millions of transactions. i was wondering if there are any gotchas you may be able to warn me about

stuartsierra14:09:40

@robert-stuttaford: The main challenge is translating entity IDs from the "old" DB to the "new." If every entity in your database has a :db.unique/identity attribute, then just use those.

stuartsierra14:09:22

Without that, you have to maintain a mapping from old EIDs to new EIDs. I used a key/value store like LevelDB.

stuartsierra15:09:00

If you're relying on that EID mapping in an external store, then you cannot stream the transactions, because you have to get the resolved tempids from the previous transaction before you can translate the subsequent transaction.

stuartsierra15:09:16

Also make sure the process you're building is resumable: During a long import job, the Transactor will pause occasionally, causing transaction errors. Your Peer process has to be able to continue where it left off without skipping any transactions. Ideally, you want it to persist its state (i.e., last transaction copied) on disk.
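
A language-neutral sketch of the points above, in Python: transactions are copied strictly one at a time, the old-to-new EID mapping is updated from each result before the next transaction is translated, and the state is persisted after every transaction so the job can resume. `transact` is a hypothetical callable standing in for the real transaction call, the `tmp-` tempid scheme is invented for illustration, and only the entity position of each datom is translated here.

```python
import json
import os


def load_checkpoint(path):
    """Return saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"last_tx": None, "eid_map": {}}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def decant(source_txes, transact, checkpoint_path):
    """Copy (tx_id, [(old_eid, attr, value), ...]) pairs in tx order.

    `transact` (hypothetical) takes translated datoms and returns a dict
    of tempid -> new eid. If it raises, the checkpoint is left untouched.
    """
    state = load_checkpoint(checkpoint_path)
    for tx_id, datoms in source_txes:
        if state["last_tx"] is not None and tx_id <= state["last_tx"]:
            continue  # already copied in an earlier run
        translated = [
            (state["eid_map"].get(str(old), f"tmp-{old}"), attr, value)
            for old, attr, value in datoms
        ]
        resolved = transact(translated)
        for tempid, new_eid in resolved.items():
            state["eid_map"][tempid[len("tmp-"):]] = new_eid
        state["last_tx"] = tx_id
        save_checkpoint(checkpoint_path, state)
```

A crash between `transact` returning and the checkpoint being saved would replay that one transaction on restart, so a real job would also need the replay to be harmless or detectable.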

robert-stuttaford15:09:32

thank you, @stuartsierra -- i'm definitely planning a pause capable approach

robert-stuttaford15:09:12

happily, i think i will be able to avoid the external ID mapping, because i can just add unique ids to everything in the source database first

robert-stuttaford15:09:44

and use the source database as the mapping, because i don't care about its cleanliness in the long run

robert-stuttaford15:09:11

i may have a question or two, but what you've shared so far is great. thank you

stuartsierra15:09:41

You're welcome.

robert-stuttaford15:09:42

what's the largest database you've decanted, @stuartsierra?

ckarlsen15:09:51

I've deleted all Datomic DBs, run gc-deleted-dbs, run a full/freeze VACUUM in Postgres, and restarted all processes, and somehow the "datomic_kvs" table still uses ~2.5GB of disk space?

robert-stuttaford15:09:02

wow. that's awesome!

stuartsierra15:09:28

That took days.

robert-stuttaford15:09:40

yeah i was just about to ask

robert-stuttaford15:09:55

i haven't counted datoms yet, but we're looking at 50mil+ txes

robert-stuttaford15:09:06

i'm going to be interleaving two databases into one

robert-stuttaford15:09:47

going to be quite a lot of fun, and it's going to feel really good to expunge all the newbie mistakes we made over the last 4 years

robert-stuttaford15:09:20

some real 🙈 moments in there

stuartsierra15:09:34

That's a common motivation for doing it.

robert-stuttaford15:09:12

any idea how big that database was in storage, @stuartsierra ?

stuartsierra15:09:54

Another trick: consider "decanting" into a dev database and then use backup/restore to move into distributed storage.

robert-stuttaford15:09:07

the longer term motivation for building this out now is that it becomes possible to rebuild far more quickly in future, e.g. to shard

robert-stuttaford15:09:30

oh, yes. definitely

ckarlsen15:09:59

is the diagnostic tool mentioned in the announcement of version 0.9.5302 easily available?

robert-stuttaford15:09:27

huh. think we'll go quite quickly; we're only at 84mil datoms

jaret15:09:01

@ckarlsen So if you deleted the DB as mentioned earlier I am not sure you can run the diagnostic. Is there a reason you cannot just delete the table and assume the 2.5gig is garbage that can no longer be collected?

ckarlsen16:09:49

@jaret no reason, just curious. I've been doing lots of retractions and additions lately on a local dev db during testing, and often the transaction throughput is horribly slow: from ~5ms to 2-3sec for no apparent reason. This is originally a database that's been through a lot of software upgrades