xtdb 2020-05-26 | Slack Archive

Eric Ihli 00:05:53

Is it possible to do transactional "unique on" inserts in Crux, or is this trying to fit a square peg into a round hole? I basically want a unique constraint on certain field(s). The "match" transaction operation only works with an entity ID, right? So if I want a doc to be unique on both :email/ and :company/, I have a problem. I could put the field(s) in the entity ID and then use a match/nil?->put transaction, but then I'm duplicating data: it lives both in the eid and in the doc, and I'm opening myself up to bugs where the attribute values I use for uniqueness don't match the crux.db/id.

jarohen 11:05:05

do you mean unique on the pair of email and company (as in, people from different companies can have the same email), or that no two entities can have the same email, nor the same company?

Eric Ihli 12:05:29

No two entities can have the same (email & company). Email was a bad example attribute in this case. Makes more sense with "username".

Eric Ihli 12:05:39

Looks like I'm thinking of "transaction functions"? First time encountering that term.

> Datomic’s transactions and transaction functions are processed via a centralised transactor which can be configured for High-Availability using standby transactors. Centralised execution of transaction functions is effectively an optimisation that is useful for managing contention whilst minimising external complexity, and the trade-off is that the use of transaction functions will ultimately impact the serialised transaction throughput of the entire system. Crux does not currently provide a standard means of creating transaction functions but it is an area we are keen to see explored. If transaction functions and other kinds of validations of constraints are needed then it is recommended to use a gatekeeper pattern which involves electing a primary Crux node (e.g. using ZooKeeper) to execute transactions against, thereby creating a similar effect to Datomic’s transactor component.
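The gatekeeper pattern quoted above routes every constraint-checked write through a single node, so the "is this key taken?" check and the write cannot interleave with another writer's. A minimal in-process sketch of that idea, assuming a plain atom stands in for the database and a JVM lock stands in for the elected primary node (a real setup would query Crux and submit a transaction from that node, with something like ZooKeeper handling the election); `make-gatekeeper` is a hypothetical name:

```clojure
(defn make-gatekeeper
  "Hypothetical sketch: returns a fn that claims `unique-key` for
  `user-id` only if nobody holds it yet. The check and the write run
  under one lock, standing in for a single elected gatekeeper node."
  []
  (let [store (atom {})   ; stand-in for the database: unique-key -> user-id
        lock  (Object.)]  ; stand-in for "only the primary node executes this"
    (fn [unique-key user-id]
      (locking lock
        (if (contains? @store unique-key)
          {:ok? false, :reason :duplicate}
          (do (swap! store assoc unique-key user-id)
              {:ok? true}))))))
```

If ten threads try to claim the same `{:username "eric", :company "acme"}` pair at once, exactly one gets `{:ok? true}`. The trade-off from the quote applies here too: every write funnels through one place, which caps throughput.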

jarohen 15:05:50

The way we currently model that in Crux is to rely on entity IDs being unique: if you're able to split your document, you can have one document whose entity ID is the unique part, and then link it to the user document. Let's say you had a new user {:crux.db/id #uuid "c52af275-a26b-4337-943a-f3795cc7b696", :name "foo", :email "", :company "bar"} - you could index this as follows:

[[:crux.tx/match {:email "", :company "bar"} nil]
 [:crux.tx/put {:crux.db/id {:email "", :company "bar"}, :email "", :company "bar", :user-id #uuid "c52af275-a26b-4337-943a-f3795cc7b696"}]
 [:crux.tx/put {:crux.db/id #uuid "c52af275-a26b-4337-943a-f3795cc7b696", :name "foo"}]]

The match operation ensures that no other user has this email/company combination, and you can then join these entities to reconstruct the original document. Separately, we're currently working on making the transaction function API stable and hope to land it in the near future.
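The split-document transaction can be wrapped in a small helper so the unique-key document is always derived from the same user doc, which reduces the room for the eid/doc mismatch raised in the original question. A minimal sketch; `unique-user-tx` is a hypothetical name, and it only builds the vector of transaction operations shown above, leaving submission to whichever Crux API you use:

```clojure
(defn unique-user-tx
  "Hypothetical helper: builds tx ops that store `user` only if no
  entity yet exists whose ID is the map of `unique-keys` from `user`."
  [user unique-keys]
  (let [user-id   (:crux.db/id user)
        unique-id (select-keys user unique-keys)] ; e.g. {:email ..., :company ...}
    [;; fails the whole transaction if the unique-key entity already exists
     [:crux.tx/match unique-id nil]
     ;; the unique-key doc: its ID is the unique part, linked back to the user
     [:crux.tx/put (assoc unique-id :crux.db/id unique-id :user-id user-id)]
     ;; the remaining user attributes, keyed by the user's own ID
     [:crux.tx/put (apply dissoc user unique-keys)]]))
```

Calling `(unique-user-tx user [:email :company])` on the example user reproduces the three operations above.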

mmer 11:05:17

@rschmukler isn't there a fundamental design difference between a reddit feed and Crux? Crux allows data to change while retaining the old version. The idea of order no longer works in this case: the latest version would have an old ID but would be the newest, and therefore notionally the first one you'd want in the list. The ID is just that, an identity; if you try to overload some other meaning onto it, I feel you're going to find issues down the line. Isn't it better to maintain that 'order' using a data value on the element rather than overloading the semantics of the ID?

jarohen 11:05:53

@mmer @rschmukler the entities are all stored sorted within Crux by the hash of the ID, so if you just want to paginate without caring what it's sorted by, this is a consistent order suitable for pagination (including an :after parameter, should you need it). That said, most of the time you probably do want to specify a sort order to paginate by; in that case I'd recommend a key/value on the doc to use as the sort key, as @mmer suggests. In the past I've used a tuple of the entity's 'created-at' and the entity ID as the :after parameter (to tie-break if two entities happen to be created at the same time), and can recommend it.
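Paginating by a `[created-at entity-id]` tuple, as suggested above, is a form of keyset (cursor) pagination: sort by the tuple, then return only the entries strictly after the last tuple the client saw. A minimal in-memory sketch, assuming docs carry a hypothetical `:created-at` attribute with comparable values (e.g. longs or `java.util.Date`) and mutually comparable `:crux.db/id` values; in practice the ordering and cursor would go into the query rather than into a Clojure seq:

```clojure
(def sort-key
  "Hypothetical sort key: created-at first, entity ID as tie-breaker."
  (juxt :created-at :crux.db/id))

(defn page-after
  "Returns up to `limit` docs in sort-key order, strictly after the
  `after` cursor (a [created-at id] tuple, or nil for the first page)."
  [docs after limit]
  (->> docs
       (sort-by sort-key)
       (filter #(or (nil? after) (pos? (compare (sort-key %) after))))
       (take limit)))
```

The cursor for the next page is `(sort-key (last page))`. Clojure's `compare` orders vectors by length first and only then element by element, which is fine here because every cursor is a two-element vector.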

rschmukler 14:05:45

Even in the case of sorting by the ID, I think explicit sorts are better (i.e. a programmer should not rely on the order in which Crux sorts IDs, as you may change that order some day (probably not, but you could)), even if the sort is on the :crux.db/id field anyway. @mmer I'm not sure I agree with the resistance to overloading IDs. I think it depends on the domain. For example, timestamps (or really any monotonic unit) used as primary IDs confer identity and ensure uniqueness for a discrete moment in time. These IDs are inherently ordinal, and forcing a document to add a duplicate field for the sake of order seems redundant. I do understand that Crux has better facilities for traversing through time and handling updates to such a document, but I think it's worth considering whether you want to intentionally restrict other properties that a given domain's IDs may have.

rschmukler 14:05:09

@jarohen to clarify, I think the default deterministic sort makes sense; I'm just saying that if code relies on Crux's default sorting, it'd be preferable for it to say so explicitly when it declares the query.

jarohen 14:05:45

I agree that explicit sorts are better, yes 🙂

> These IDs are inherently ordinal

This isn't something we can rely on for all of our ID types, unfortunately.

jarohen 14:05:28

(thinking out loud here, no promises!) we could consider further indexing an entity's key if it were keyed by a map - e.g. {:crux.db/id {:twitter/id 12345}, :content "hello world"} could also be considered to have a :twitter/id attribute

adamfeldman 16:05:33

This made me think of FoundationDB’s keys: https://apple.github.io/foundationdb/data-modeling.html#composite-types

jarohen 17:05:35

ah 🙂 cheers for the pointer!

Jacob O'Bryant 03:05:27

I would love this feature. Currently in Biff I automatically take keys from map IDs and duplicate them in the doc.
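Duplicating keys from a map ID into the doc, as described above, can be done as a small step before each `put`. A minimal sketch with `with-id-attrs` as a hypothetical name (Biff's actual implementation may differ); it lets the ID's values win on conflict, so the duplicated attributes cannot disagree with `:crux.db/id`:

```clojure
(defn with-id-attrs
  "Hypothetical helper: if the doc's :crux.db/id is a map, copies its
  keys into the doc so they can be queried as ordinary attributes."
  [doc]
  (let [id (:crux.db/id doc)]
    (if (map? id)
      (merge doc id) ; values from the ID overwrite any conflicting doc values
      doc)))
```

For example, `{:crux.db/id {:twitter/id 12345}, :content "hello world"}` becomes a doc that also has `:twitter/id 12345`, while docs with non-map IDs pass through unchanged.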

jarohen 14:05:25

but that no doubt has impacts that I haven't yet thought about

rschmukler 15:05:35

That's an interesting idea, indeed!