Fork me on GitHub
#datomic
<
2019-10-17
>
timeyyy12:10:11

How do people handle schema migrations with datomic cloud? I was looking at conformity but it doesn't support the client api.

Nems12:10:24

Hi everybody, We are having an issue with the way datomic tx works. Our application allows user to register driver registrations with a start and end date for a certain license plate via xml files. If a new driver registration overlaps with the start and end date of the old driver registration, we cut off the old dates in our timelines. Our timelines are sorted on the highest tx. Just to be clear about the overlaps. Here's an example: driver registration A inserted at 2019-09-09 {:driver-registration/id "1XXX001-2018-01-01" :driver-registration/license-plate "1XXX001" :driver-registration/start #inst"2018-01-01" :driver-registration/end #inst"2019-01-01" :driver-registration/driver {:person/first-name "John" :person/last-name "Doe"}} driver registration B inserted at 2019-09-10 {:driver-registration/id "1XXX001-2018-06-01" :driver-registration/license-plate "1XXX001" :driver-registration/start #inst"2018-06-01" :driver-registration/end #inst"2020-01-01" :driver-registration/driver {:person/first-name "Mary" :person/last-name "Jane"}} timeline: 2018-01 2018-06-01 2020-01-01 |--driver John Doe---|-----driver Mary Jane------| As you can see the end date of driver registration A (John Doe) was cut off by the start date of driver registration b (Mary Jane). We retrieve this data by the following query: (d/q '[:find ?tx (pull ?dr [: {:driver-registration/person [:]}]) :in $ ?license-plate :where [?dr :driver-registration/license-plate ?license-plate] [?dr :driver-registration/id _ ?tx]] db "1XXX001") We sort the list by the value of ?tx and cut off the dates were necessary. This works fine for most cases but now image the user has made a mistake and wants the end date of driver registrations as the cut off date. Like this: 2018-01 2019-01-01 2020-01-01 |----driver John Doe-----|-----driver Mary Jane------| When the user uploads a new xml with the exact same data of driver registration A, we expect that driver registration A would now have the highest tx. But due to datomics redundancy elimination, datomic will filter out the data of the transaction and never update the tx of driver registration A. When the user asks for the driver registration timeline, he will still receive the old one. Is there a way to solve this issue on the query side? One of the solutions would be to add a field :driver-registration/last-upload with a date value to driver registration but that feels as if I'm rebuilding the db/txInstant system.

favila12:10:22

Why are you not sorting by the registration start and end dates? Tx Instant is a time of record and has no connection to your domain’s “business” times. Suppose you for eg uploaded an old registration?

Nems12:10:10

Well the start and end dates do not show at which time the driver registrations was entered in the database. In my example they are nice in order but it's possible that a user would insert a driver registration with an overlap at the start of an old driver registration. If we would just sort on start and end date, the newer one will always be overwritten by the old one. Which shouldn't happen. Great article. So going by that article we should add a field stating when a driver registration was last-uploaded?

favila12:10:26

I guess so? I thought by “newer one” you just mean a start time > next range’s end time. I guess I don’t understand precisely your timeline overlap algorithm

favila12:10:45

If time of record is really vital to you here you can retrieve the tx of start and end specifically

favila12:10:27

You are retrieving the tx of the id which of course is not going to change much

favila12:10:11

Maybe the tx you really mean is (max tx-start tx-end)

favila12:10:37

Sorry that was vague. I’ll be more precise in a second

Nems13:10:08

Sorry about not being clear. Hopefully this flow helps to understand how our application works, With newer I mean the latest transaction date: Insert #1: driver registration A inserted at 2019-09-09 {:driver-registration/id "1XXX001-2018-01-01" :driver-registration/license-plate "1XXX001" :driver-registration/start #inst"2018-01-01" :driver-registration/end #inst"2019-01-01" :driver-registration/driver {:person/first-name "John" :person/last-name "Doe"}} timeline: 2018-01 2019-01-01 |----driver John Doe-----| Insert #2: driver registration B inserted at 2019-09-10 {:driver-registration/id "1XXX001-2018-06-01" :driver-registration/license-plate "1XXX001" :driver-registration/start #inst"2018-06-01" :driver-registration/end #inst"2020-01-01" :driver-registration/driver {:person/first-name "Mary" :person/last-name "Jane"}} timeline: 2018-01 2018-06-01 2020-01-01 |--driver John Doe---|-----driver Mary Jane------| Insert #3: driver registration A inserted at 2019-09-11 {:driver-registration/id "1XXX001-2018-01-01" :driver-registration/license-plate "1XXX001" :driver-registration/start #inst"2018-01-01" :driver-registration/end #inst"2019-01-01" :driver-registration/driver {:person/first-name "John" :person/last-name "Doe"}} Expected: 2018-01 2019-01-01 2020-01-01 |----driver John Doe-----|-----driver Mary Jane------| Reality: 2018-01 2018-06-01 2020-01-01 |--driver John Doe---|-----driver Mary Jane------|

Nems13:10:17

even with taking the tx of start and end date, the last insert will always be ignored as it is completely the same as the first insert and datomic will use redundancy elimination

Nems13:10:37

now that I'm typing this it's starting to make sense to just add an extra field stating upload-date

favila13:10:02

what I meant was something like this:

favila13:10:05

(d/q '[:find ?tx (pull ?dr [* {:driver-registration/person [*]}])
       :in $ ?license-plate
       :where
       [?dr :driver-registration/license-plate ?license-plate]
       [?dr :driver-registration/start _ ?tx-s]
       [?dr :driver-registration/end _ ?tx-e]
       [(max ?tx-s ?tx-e) ?tx]
       ] db "1XXX001")

favila13:10:25

again, this model assumes that your notion of driver registration “age” exactly corresponds to the sum of its tx times, which might be possible but more likely you actually have a separate explicit domain-specific notion of “record effective date” which you just haven’t noticed yet

favila13:10:28

also note the mismatch in granularity: tx time is about individual facts not “records”. Datomic doesn’t know anything about records

favila13:10:42

an entity is not necessarily a record

favila13:10:53

e.g. it may be the union of attributes from multiple records, or it may be a value or “sub-record” (e.g in isComponent case) or it may just be a convenient thing to join on

Nems13:10:02

"but more likely you actually have a separate explicit domain-specific notion of “record effective date” which you just haven’t noticed yet" I think that is the case here. Also, I haven't noticed this statement "also note the mismatch in granularity: tx time is about individual facts not “records”" so your example of the query makes more sense to me now. I think I can figure it out from here. Thanks for taking the time to help me favila!

Shaitan12:10:30

is bigdec safe from floating point errors?

rapskalian13:10:30

Datomic cloud documentation mentions: > [:db/unique] attribute must have a :db/cardinality of :db.cardinality/one. https://docs.datomic.com/cloud/schema/schema-reference.html#db-unique However, in my project, I’ve been using the following schema definition just fine:

{:db/ident       :user/email
    :db/unique      :db.unique/identity
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/many}
It appears to work as expected:
(d/transact conn {:tx-data [{:user/email "calvin"}]})
=> {:tx-data [#datom[13194139533323 50 #inst "2019-10-17T13:44:42.951-00:00" 13194139533323 true] #datom[10740029580116059 78 "calvin" 13194139533323 true]]
(d/transact conn {:tx-data [{:user/email ["jenny"]}]})
=> [#datom[13194139533325 50 #inst "2019-10-17T13:47:13.676-00:00" 13194139533325 true] #datom[13453624277467228 78 "jenny" 13194139533325 true]]
And I can even pull:
(d/pull db [:user/email] [:user/email "calvin"])
#:user{:email ["calvin"]}

marshall13:10:15

@cjsauer can you do the following:

(d/pull (d/db conn) '[*] :user/email)

rapskalian13:10:48

#:db{:id 78,
     :ident :user/email,
     :valueType #:db{:id 23, :ident :db.type/string},
     :cardinality #:db{:id 36, :ident :db.cardinality/many},
     :unique #:db{:id 38, :ident :db.unique/identity}}

marshall13:10:06

fascinating

marshall13:10:54

what version of datomic? (cloud or onprem)?

rapskalian13:10:16

This is cloud. Checking on version (what would be the fastest way to see that? CF?)

rapskalian13:10:37

This is in the output section of my compute stack

DatomicCFTVersion	512
DatomicCloudVersion	8806

marshall13:10:51

i’m looking into it

rapskalian13:10:55

I also did a split stack deploy, if that matters

marshall14:10:47

@cjsauer you’re correct that it can be done. Dont. 🙂 The semantics of unique identity are such that having a card-many attr there is pretty dicey I’ll look into filing this as somethign that should potentially throw or warn

rapskalian14:10:34

@marshall ha okay, thanks for checking. I’m a bit puzzled tho. Email addresses seem to challenge those semantics. Is there a technical reason that unique-many attributes can’t exist? Regardless, throwing an exception there would be great.

marshall14:10:33

i suppose not a technical reason so much as a semantic one unique identity says “this specific attr/value pair is unique in the database”

marshall14:10:45

being able to assert multiple values for that is…. complicated?

rapskalian14:10:05

>“this specific attr/value pair is unique in the database” This could still hold for a card-many attribute theoretically. I opened an issue/PR for datascript before posting here on this subject, and @tonsky mentioned: > Upsert would not work because it’s not clear which value to look at. Or we must look at all provided values and make sure they all resolve to the same entity. This might be the complication. When upserting multiple values one must ensure that they do indeed resolve to the same entity. https://github.com/tonsky/datascript/issues/320

marshall14:10:44

well, you definitely open up to conflicts

marshall14:10:24

i.e. what if you have two separate “entites” in your transaction, each using one of your unique emails, but they both contain a separately conflicting datom for some other attr

marshall14:10:39

since they both resolve to the same entity based on that unique id, you then have a conflict of the other datom

rapskalian14:10:03

It would seem appropriate for that transaction to fail in that example. The two entities are resolved to one, and then can be constrained from there.

rapskalian14:10:49

Maybe the detection of that conflict is the hard part tho. I’m ignorant of the details.

rapskalian14:10:52

Given this limitation, is it possible to model users being uniquely identified by multiple email addresses? This felt like a very natural way to model my domain, where a user can be part of multiple teams/orgs, each with their own email domains. It would be great if those emails did indeed resolve to the same entity.

timcreasy15:10:31

You could model this with the concept of an “organization user” which can be mapped to a “user”. Each “organization user” can have their unique identifier (email), a reference to user and the organization they belong to.

rapskalian16:10:06

Hm yeah that could work. It would function similar to a traditional join table.

rapskalian16:10:21

Can’t shake the feeling that it falls into the category of accidental complexity…

benoit14:10:55

You have the same conflict if you use different identity attributes on the same entity.

favila14:10:07

Correct, and I’ve been burned by this before. datomic will pick one for entity id purposes in some undefined way

favila14:10:18

I actually don’t like identity attributes at all anymore 🙂

favila14:10:34

I’d prefer entity resolution only happened via :db/id, and if you use a db/id value with a lookup ref whose attribute is identity, only then would it upsert for you

favila14:10:01

however upsert predates lookup refs, so 20/20 hindsight and all that…

rapskalian14:10:58

What is the conflict? I’m not disagreeing, just failing to conceptualize. Is there a minimal example of a transaction that shows this?

benoit14:10:36

Assuming :product/id and :product/asin are two identity attributes, the following tx will throw with :db.error/datoms-conflict:

[{:product/id #uuid "59d1da4a-7de0-4625-ad83-b63ac8346368"
  :product/name "A"}
 {:product/asin "B00VEVDPXS"
   :product/name "B"}]

rapskalian14:10:17

That feels broken 🤔 The ambiguity must lie in the expanded form of this transaction perhaps?

benoit14:10:08

Not sure why it seems broken. It makes sense. I was just pointing out the fact that Datomic detect those conflicts already so I'm not sure why it could not do it for card many identity attributes.

rapskalian14:10:54

My mental model for this must be off. That transaction looks like two products, each with different forms of identity.

benoit14:10:07

Sorry, yes. I should have said that these 2 values refer to the same entity in the database.

rapskalian14:10:54

Ahh okay, and so the :product/name datom is the conflict.

rapskalian15:10:22

>I was just pointing out the fact that Datomic detect those conflicts already so I’m not sure why it could not do it for card many identity attributes. This was my thought as well. The conflict check is similar, just over a collection of identity values instead of one.

rapskalian15:10:28

Mostly tho, semantically, card-many-unique attributes seem very natural. Using Game of Thrones as an example, royal figures can have many identifying titles: Robert Baratheon, first son of XYZ, slayer of ABC, builder of QRS, etc etc etc.

hadils16:10:11

Can you build 2 datomic cloud instances in the same region with different names? can you share the key?

hadils16:10:18

Is this wise?

ghadi16:10:01

which key @hadilsabbagh18?

hadils16:10:35

The key named datomic.

ghadi16:10:20

can you be more specific? some cloudformation parameter?

hadils16:10:41

I am going to create a separate ssh keypair. I think this is wise...

ghadi16:10:05

oh that's the AWS EC2 Key Pair -- yes you can share those

ghadi16:10:46

we put one Datomic system in one region, we attach a couple query groups to it, and it hosts hundreds of databases

ghadi16:10:58

but you can put two isolated systems in the same region, too

ghadi16:10:18

the AWS EC2 Key Pair is only used if you need to ssh to the actual datomic box

ghadi16:10:39

the Bastion/Analytics Gateway uses a different key pair

hadils17:10:24

I started deploying to us-west-2 according to instructions and I repeatedly get this error:

Embedded stack arn:aws:cloudformation:us-west-2:962825722207:stack/stackz-StorageF7F305E7-13QLTET3W9OAQ/ad7f7950-f0ff-11e9-b33c-02a77ed54d64 was not successfully created: The following resource(s) failed to create: [DatomicCmk, CatalogTable, FileSystem, LogGroup, LogTable].
Does anyone know why it fails?

ghadi17:10:38

you'll need to look at your Event Log in the CF Stack

hadils17:10:10

Thanks @ghadi!