xtdb 2020-07-22 | Slack Archive

dominicm10:07:20

How do transaction functions play with bitemporality? If I change the transaction function at a point in the past, are documents beyond that point re-calculated?

jarohen10:07:40

Hey @dominicm 🙂 We run the transaction function once, when the transaction's indexed, and it uses the current version of the transaction function (in both valid- and transaction-time)

refset10:07:17

^ yep, the short answer is they don't. It's probably a good idea to never change an existing transaction function; just use a new name. Functions are looked up with the valid-time set to the current tx-time: https://github.com/juxt/crux/blob/master/crux-core/src/crux/tx.clj#L184-L185

dominicm10:07:38

Also, if I have a merge tx fn, and I go back and insert a merge in the past, will entities have a cascading update?

jarohen10:07:03

Not with a 'simple merge' I'm afraid, but your transaction function does have the ability to both query the database at any point in time and insert put operations at any point in the timeline - you could apply the merge forward in time from that point, if that makes sense?

jarohen10:07:54

The history API is probably your friend there 🙂

dominicm11:07:32

It does make sense, yes.

dominicm10:07:19

I'm trying to achieve the ability to assert a subset of an entity, in the past, where new keys I assert in the past are preserved into the current version of the entity, assuming they've not been overwritten since.

refset10:07:12

this is an interesting problem. I call it a "retroactive merge", and the only solution I have envisaged up until now requires fairly extensive book-keeping. I never wrote up the full solution
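
Editor's sketch: the "retroactive merge" discussed above can be illustrated in plain Clojure on an in-memory timeline, with no Crux API involved. The `retro-merge` function and the `[valid-time doc]` timeline shape are hypothetical names chosen for this example. A real transaction function would presumably read the entity's history and emit put operations at each affected valid time. This naive version also skips the book-keeping mentioned above: it cannot tell a value that was deliberately changed later from one merely carried over from an earlier version, and a key removed in a later version would be resurrected.

```clojure
;; Hypothetical in-memory model: a timeline is a vector of [valid-time doc]
;; pairs sorted by valid-time. This is NOT the Crux API.
(defn retro-merge
  "Asserts `patch` at valid-time `t` and carries its keys forward into every
  later version, letting values already present in later versions win."
  [timeline t patch]
  (let [[before after] (split-with (fn [[vt _]] (<= vt t)) timeline)
        base           (some-> (last before) second) ; version visible at t, if any
        patched        [t (merge base patch)]
        ;; drop an existing version at exactly t; `patched` replaces it
        before         (remove (fn [[vt _]] (= vt t)) before)
        ;; later versions keep their own values; the patch only fills gaps
        after          (map (fn [[vt doc]] [vt (merge patch doc)]) after)]
    (vec (concat before [patched] after))))

(def timeline
  [[1 {:name "Ivan"}]
   [3 {:name "Ivan" :email "old@example.com"}]
   [5 {:name "Ivan" :email "new@example.com"}]])

;; Assert an email and a phone number at valid-time 2.
(retro-merge timeline 2 {:email "first@example.com" :phone "555"})
```

In this example the new :phone key reaches the versions at 3 and 5, while their own :email values are left alone.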

stopa20:07:32

Hey team, noob question, is there an essay that talks about the current differences between Crux and Datomic? Researching into both atm. Wonder if crux has the same index structures (4 indexes), and does caching etc. would also love to learn more about how it goes about performing complex queries

refset20:07:21

Hey! There's a comparison write-up in the docs' FAQ section, but also I recommend checking out this talk on the internals: https://www.youtube.com/watch?v=YjAVsvYGbuU

refset20:07:07

The index structure is pretty different, optimised for temporal queries (where Datomic relies on seeking), and the query engine relies on fast access to local index storage. Complex queries are pretty fast all things considered, certainly in the same ballpark. We benchmark nightly against Neo4j too 🙂

stopa20:07:15

Awesome, thanks for the context @refset! Excited to dive deeper :]

jonpither20:07:08

Query results can be returned lazily too, which means we don't need to hold all the results in memory at any stage. That also applies to complex queries.

stopa20:07:40

One more noob question, as I watch the talk: does Crux support scrubbing data for security / GDPR reasons? (I.e. say I really need to remove the fact that this entity ever existed.) Would this be possible?

jonpither21:07:27

Yes @stopachka, Crux was architected with this in mind. The evict operation is what you want: https://nextjournal.com/crux-tutorial/evict. We keep the tx log separate from the doc store for this reason, so that sensitive content in the documents can be readily evicted.

stopa21:07:42

Beautiful!

stopa21:07:05

Okay final noob q for the day: can Crux handle many-to-many relationships efficiently? I.e. find all users who belong to a group, find all groups a user belongs to. Does that data have to be de-normalized (i.e. keep a copy of group ids in ‘user’ and a copy of user-ids in ‘group’)?

jonpither21:07:08

Should do, yep!

stopa21:07:50

Amaazing!

stopa21:07:34

How would that work? Am guessing, I would have a document for each group, with a users key in it, that looks like a set:

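(require '[crux.api :as crux])

(crux/submit-tx
 node ; an already-started Crux node
 [[:crux.tx/put
   {:crux.db/id (str "group-" (java.util.UUID/randomUUID)) ; random suffix keeps group ids unique
    :name "our awesome group"
    :users #{"user-uuid-a" "user-uuid-b"}}]])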

Now, say I wanted to write a query: give me all the groups that "user-uuid-a" belongs to

'{:find [e]
   :where [[e :users "user-uuid-a"]]}

(Apologies pre-emptively, still need to learn datalog, not sure if above is the correct way to write it) How would crux be able to perform this query, without fetching all groups? (what would the index look like to achieve this)

ordnungswidrig22:07:50

@stopachka This might help https://opencrux.com/docs#_query_join_with_two_attributes_including_a_multi_valued_attribute

stopa22:07:28

Thanks @ordnungswidrig -- am reading through it but am not quite sure I understand it. I see:

'{:find [e2]
   :where [[e :last-name l]
           [e2 :follows l]
           [e :name "Ivan"]]}

Which seems similar to what I want. Here it says, if I understand correctly:

• find all entities whose follows set includes the last name of the entity with the name "Ivan"

But I am not sure how Crux could actually run that query efficiently. Would it have to run through every entity in the database that has a follows set and do a check, or does some index help it find the relevant entities?

refset22:07:37

Crux tracks attribute cardinalities and uses that to inform the join order. You can see the final calculated join order if you turn on debugging, but I expect the engine will:

1. detect that [e :name "Ivan"] only has a limited number of possibilities, so bind e first
2. look up all possible associated l values
3. from there, return all possible e2 values

Crucially, as mentioned before, these chains of lookups happen lazily, tuple by tuple
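
Editor's sketch: the three steps above can be mimicked in plain Clojure with two small in-memory maps standing in for indexes. The `ave` and `eav` maps and the `followers-of-ivan` function are hypothetical and only illustrate the lookup chain; they do not reflect Crux's real storage layout or query planner.

```clojure
;; Hypothetical in-memory indexes, NOT Crux's real storage layout.
;; ave: attribute -> value -> set of entity ids
;; eav: entity id -> attribute -> set of values
(def ave {:name      {"Ivan" #{:ivan} "Petr" #{:petr}}
          :last-name {"Ivanov" #{:ivan} "Petrov" #{:petr}}
          :follows   {"Ivanov" #{:petr}}})

(def eav {:ivan {:name #{"Ivan"} :last-name #{"Ivanov"}}
          :petr {:name #{"Petr"} :last-name #{"Petrov"} :follows #{"Ivanov"}}})

(defn followers-of-ivan
  "Returns a lazy seq of e2 for the clauses
  [e :name \"Ivan\"] [e :last-name l] [e2 :follows l]."
  []
  (for [e  (get-in ave [:name "Ivan"])   ; 1. bind e via the selective clause
        l  (get-in eav [e :last-name])   ; 2. look up the associated l values
        e2 (get-in ave [:follows l])]    ; 3. yield every matching e2
    e2))

(followers-of-ivan)
```

Because `for` builds a lazy sequence, each result is produced by following the chain of lookups on demand instead of materialising every intermediate binding first.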

stopa22:07:45

Thanks for the help @refset! Am a bit confused by step 2. Am trying to understand how looking up all possible l values works. What does the index "table" look like? From the talk I would guess:

a.

Attribute  | Value       | Entity
:follows   | #{"Ivanov"} | :ivan

^ is it something like that? Or does Crux detect when an attribute is a collection, and create an index of its elements?

b.

Attribute  | Value       | Entity
:follows   | "Ivanov"    | :ivan
;; the `follows` set is represented in the index by its values

If it's stored like a., I don't see how it could perform the query efficiently: it would need to do a linear scan over all entities with the key :follows.

Thoughts much appreciated : } Apologies if the thinking above is way off the mark.

stopa22:07:20

(Or maybe another option is that we would model this differently, maybe a following object that keeps a :follower and a :followee)

refset23:07:57

Ah right, yeah so vectors and sets are detected and expanded into the index, so b) 🙂
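
Editor's sketch: to show why option b) makes the lookup cheap, here is a toy attribute-value-entity index in plain Clojure. `index-doc` and `ave-index` are hypothetical names for this example, and the nested-map layout is an assumption made for readability, not Crux's actual implementation. The only point illustrated is that a set or vector value contributes one index entry per element.

```clojure
;; Hypothetical sketch of an attribute-value-entity (AVE) index, NOT Crux's
;; actual implementation. Sets and vectors are expanded element by element.
(defn index-doc
  "Adds every [attribute value] pair of `doc` to `index`, expanding
  set and vector values into one entry per element."
  [index doc]
  (let [eid (:crux.db/id doc)]
    (reduce (fn [idx [a v]]
              (reduce (fn [idx' x]
                        (update-in idx' [a x] (fnil conj #{}) eid))
                      idx
                      (if (or (set? v) (vector? v)) v [v])))
            index
            (dissoc doc :crux.db/id))))

(def docs
  [{:crux.db/id "group-1" :name "our awesome group"
    :users #{"user-uuid-a" "user-uuid-b"}}
   {:crux.db/id "group-2" :name "another group"
    :users #{"user-uuid-b"}}])

(def ave-index (reduce index-doc {} docs))

;; Equivalent of [e :users "user-uuid-a"]: a direct lookup, no scan of groups.
(get-in ave-index [:users "user-uuid-a"])
```

With the elements indexed individually, finding the groups for a user is a keyed lookup, so there is no need to de-normalize user ids and group ids into both documents just to answer that query.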

stopa23:07:59

ooof amazing!
