#xtdb
2022-05-20
Anthony Bui15:05:29

Hi, this is probably more of a Clojure question but worth a shot. I have a transaction function (1) for adding new key-value pairs, and one for appending a userID to a set in a document (2):
(1) [[::xt/put (assoc entity :buildings/name buildingName :buildings/area buildingArea) validStartTime]]
(2) [[::xt/put (assoc entity :buildings/persons (conj (:buildings/persons entity) personId)) entryTime]]
Because my data arrives unordered, I may have to create a barebones building doc that only contains the ID as well as an empty set that tx-fn (2) can later append to. The tx-fn (1) is there to add the rest of the building information whenever it arrives in the system. The problem is, experimenting with this on a new node strangely enough shows that the order in which the tx-fns are run matters. If I create a doc with an ID and an empty set #{} and run (2), the userID is added successfully, however running (1) next - nothing happens. If I run (1) after creating the empty doc and then (2), it works as expected. Am I missing something? The validStartTime sent to (1) is always earlier than entryTime in (2)

refset15:05:46

Hey again 🙂 it looks like you have a missing buildingPersons value in (1) - could that be related? Or is that just a typo here?

Anthony Bui16:05:04

Hi again, yeah sorry it's a typo 😛 (edited message to fix)

😅 1
refset17:05:32

What does the entity-history show?

Anthony Bui18:05:17

Here's the history where tx 2 is creation of empty doc, tx 3 is running tx-fn (2) adding personId to set, tx 4 is running tx-fn (1) attempting to add fields. From what I can see, even when running in this order, the history (correctly) shows the valid-time timeline
{:tx-time #inst "2022-05-20T18:39:43.462-00:00", :tx-id 3, :valid-time #inst "2016-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "c9354e0f69a5c43fd1cfd91c78773a88c3f83165"}
{:tx-time #inst "2022-05-20T18:39:46.239-00:00", :tx-id 4, :valid-time #inst "2013-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "ad832526ad525ea08ba11c89133f9c6103419935"}
{:tx-time #inst "2022-05-20T18:39:37.503-00:00", :tx-id 2, :valid-time #inst "2000-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "af1f8340456f01cbec7cdf9aad3a7f7f910b7a79"}

Anthony Bui18:05:01

If I flip the order and run tx-fn (2) before (1), the valid-time timeline is the same yet when fetching my entity, it has been updated as expected (as opposed to the ordering above)
{:tx-time #inst "2022-05-20T18:46:55.516-00:00", :tx-id 4, :valid-time #inst "2016-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "ad832526ad525ea08ba11c89133f9c6103419935"}
{:tx-time #inst "2022-05-20T18:46:53.900-00:00", :tx-id 3, :valid-time #inst "2013-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "c047d161bca3b5acce49557a9510d664612526b7"}
{:tx-time #inst "2022-05-20T18:46:53.843-00:00", :tx-id 2, :valid-time #inst "2000-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "af1f8340456f01cbec7cdf9aad3a7f7f910b7a79"}
edit: might be worth mentioning this is testing in repl

refset19:05:20

ah, sorry to be a pain but please can you add :with-docs? true to the opts for that

Anthony Bui20:05:18

Nono, I'm the pain here 😅, here it is, running the transactions in the order that gives unexpected behavior:
{:tx-time #inst "2022-05-20T19:57:11.143-00:00", :tx-id 3, :valid-time #inst "2016-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "c9354e0f69a5c43fd1cfd91c78773a88c3f83165", :doc {:buildings/persons #{"abc123id"}, :xt/id 1234}}
{:tx-time #inst "2022-05-20T19:57:11.260-00:00", :tx-id 4, :valid-time #inst "2013-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "ad832526ad525ea08ba11c89133f9c6103419935", :doc {:buildings/persons #{"abc123id"}, :buildings/name "Victoria Stadion", :buildings/ownerId "51", :xt/id 1234}}
{:tx-time #inst "2022-05-20T19:57:11.054-00:00", :tx-id 2, :valid-time #inst "2000-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "af1f8340456f01cbec7cdf9aad3a7f7f910b7a79", :doc {:buildings/persons #{}, :xt/id 1234}}
And in reverse order:
{:tx-time #inst "2022-05-20T20:09:23.583-00:00", :tx-id 4, :valid-time #inst "2016-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "ad832526ad525ea08ba11c89133f9c6103419935", :doc {:buildings/persons #{"abc123id"}, :buildings/name "Victoria Stadion", :buildings/ownerId "51", :xt/id 1234}}
{:tx-time #inst "2022-05-20T20:09:22.567-00:00", :tx-id 3, :valid-time #inst "2013-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "c047d161bca3b5acce49557a9510d664612526b7", :doc {:buildings/persons #{}, :buildings/name "Victoria Stadion", :buildings/ownerId "51", :xt/id 1234}}
{:tx-time #inst "2022-05-20T20:09:22.466-00:00", :tx-id 2, :valid-time #inst "2000-01-01T00:00:00.000-00:00", :content-hash #xtdb/id "af1f8340456f01cbec7cdf9aad3a7f7f910b7a79", :doc {:buildings/persons #{}, :xt/id 1234}}]
(data slightly masked)

πŸ™ 1
refset20:05:26

thanks for that, I think I understand now - I reckon you want to be specifying an explicit valid-time-end on the put that is greater than the valid-time-start of :tx-id 3 ...which will essentially replace (/'correct') the :tx-id 3 version
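
To illustrate the suggestion above, here is a minimal sketch of a put operation that carries both a start and an end valid time, built as plain data so no node is needed. It assumes the XTDB 1.x operation shape `[op doc valid-time end-valid-time]`; the document contents, the `put-between` helper and the far-future end date are illustrative assumptions, not taken from the real system.

```clojure
;; Sketch only: the transaction operation is plain data.
;; Assumption: XTDB 1.x put ops have the shape
;;   [:xtdb.api/put doc valid-time end-valid-time]
(def building-info
  {:xt/id 1234
   :buildings/name "Victoria Stadion"
   :buildings/persons #{"abc123id"}})

(defn put-between
  "Hypothetical helper: a put that is valid from `start` until `end`."
  [doc start end]
  [:xtdb.api/put doc start end])

;; The end time lies after the 2016 version, so this put also covers
;; (and effectively replaces) that later version in the valid-time timeline.
(def tx-ops
  [(put-between building-info #inst "2013-01-01" #inst "2090-01-01")])
```

Such a vector would then be passed to `xt/submit-tx`, or returned from a transaction function in place of the two-element put.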

Anthony Bui20:05:03

Thanks, that did the trick! I left the valid time-end out since we don't know how long the building will be in use, but I guess I could set a date like year 2090 and await the delete-building event?

refset21:05:15

😄 awesome!

Anthony Bui13:05:11

Hey again, kinda the same/similar problem, I'm testing out the order of sending in tx-fn [remove person from set at 2016-01-04], [add person to set at 2016-01-01], [add building info at 2013-01-01]. If I put a valid end-time far in the future on the tx-fn that appends fields to a building, as we discussed yesterday, the changes of adding/removing a person (through conj/disj tx-fn) aren't saved. If I were to remove the explicit valid end-time on the building init, then it works as expected in this specific scenario (but not in another scenario and so on...). A bit confusing, but I'm at fault for seeing this as 'deletions' when in fact it's all about puts and valid-times on the same doc

Anthony Bui19:05:11

I'm very sorry for the bother, but it seems I was too hasty with the solution proposed yesterday (when adding building information, set explicit end time that is greater than addingPersons valid time-start). Here is the doc-history for whenever you have time to check it out! (ascending order)
1. Init empty building doc in year 2000, OK
{:tx-time #inst "2022-05-21T19:06:48.841-00:00", :tx-id 3, :valid-time #inst "2000-01-01T00:00:00.000-00:00", :doc {:building/persons #{}, :xt/id 1234}}]
2. Add building information in year 2013, OK but it contains a person that doesn't arrive until 2016
{:tx-time #inst "2022-05-21T19:06:49.861-00:00", :tx-id 5, :valid-time #inst "2013-01-01T00:00:00.000-00:00", :doc {:building/persons #{"42314-XXXXXX-45346"}, :building/name "Victoria Stadion", :building/ownerId "51", :xt/id 1234}}
3. Actual addition of person to building set in 2016 (same content as above)
{:tx-time #inst "2022-05-21T19:06:49.861-00:00", :tx-id 5, :valid-time #inst "2016-01-01T00:00:00.000-00:00", :doc {:building/persons #{"42314-XXXXXX-45346"}, :building/name "Victoria Stadion", :building/ownerId "51", :xt/id 1234}}

refset21:05:44

Hey again, it's no problem at all - it seems like a tricky problem 🙂 although I think an executable example would really help me grok things. Would you mind elaborating with something like this:

(with-open [n (xt/start-node {})]
    (xt/submit-tx n [[::xt/put {:xt/id :put-building-fn
                                :xt/fn '(fn [ctx i]
                                          [[:xtdb.api/put {:xt/id (str "building" i)}]])}]
                     [::xt/put {:xt/id :add-kvs-fn

                                :xt/fn '(fn [ctx eid & kvs]
                                          [[:xtdb.api/put (apply assoc (xtdb.api/entity (xtdb.api/db ctx) eid) kvs)]])}]
                     [::xt/put {:xt/id :conj-persons-fn
                                :xt/fn '(fn [ctx eid & persons]
                                          [[:xtdb.api/put (update (xtdb.api/entity (xtdb.api/db ctx) eid) :persons #(conj % persons))]])}]
                     [::xt/fn :put-building-fn 1]])

    (xt/sync n)
    (xt/submit-tx n [[::xt/fn :add-kvs-fn "building1" :a 1 :persons #{}]])
    (xt/sync n)
    (xt/submit-tx n [[::xt/fn :conj-persons-fn "building1" "alice" "bob"]])
    (xt/sync n)
    (clojure.pprint/pprint (map #(select-keys % [::xt/tx-id ::xt/valid-time ::xt/doc])
                                (xt/entity-history (xt/db n) "building1" :asc {:with-docs? true}))))
;=>
(#:xtdb.api{:tx-id 0,
            :valid-time #inst "2022-05-21T21:20:30.566-00:00",
            :doc #:xt{:id "building1"}}
 #:xtdb.api{:tx-id 1,
            :valid-time #inst "2022-05-21T21:20:30.571-00:00",
            :doc {:a 1, :persons #{}, :xt/id "building1"}}
 #:xtdb.api{:tx-id 2,
            :valid-time #inst "2022-05-21T21:20:30.574-00:00",
            :doc
            {:a 1, :persons #{("alice" "bob")}, :xt/id "building1"}})

refset21:05:30

And then be very explicit about what you want the output to be instead with a handcrafted ideal output

Anthony Bui22:05:20

Thank you once again for your help! I took your code and added a valid time-start parameter for all three functions:

'(fn [ctx i vts]
     [[:xtdb.api/put {:xt/id (str "building" i)} vts]])

'(fn [ctx eid vts & kvs]
     [[:xtdb.api/put (apply assoc (xtdb.api/entity (xtdb.api/db ctx) eid) kvs) vts]])

'(fn [ctx eid vts & persons]
     [[:xtdb.api/put (update (xtdb.api/entity (xtdb.api/db ctx) eid) :persons #(conj % persons)) vts]])
I also rearranged the order of the tx-fns executions to match my common scenario of having data arrive to the system in the wrong order:
[::xt/fn :put-building-fn 1 #inst "2000-01-01"]])
[::xt/fn :conj-persons-fn "building1" #inst "2016-01-01" "alice" "bob"]
[::xt/fn :add-kvs-fn "building1" #inst "2013-01-01" :a 1 :persons #{}]
This gives:
(#:xtdb.api{:tx-id 0,
            :valid-time #inst "2000-01-01T00:00:00.000-00:00",
            :doc #:xt{:id "building1"}}
 #:xtdb.api{:tx-id 2,
            :valid-time #inst "2013-01-01T00:00:00.000-00:00",
            :doc {:persons #{}, :a 1, :xt/id "building1"}}
 #:xtdb.api{:tx-id 1,
            :valid-time #inst "2016-01-01T00:00:00.000-00:00",
            :doc {:persons (("alice" "bob")), :xt/id "building1"}})
The ideal output is exactly the one that is produced when your code is left unchanged (besides me wanting the set to contain persons in singles like #{"alice", "bob"}, but that is unrelated). In this case, the building again loses its previously attained key :a 😞. When loading my data, it is common for a "personEnteredBuilding" event to have arrived before "buildingCreated". When "personEnteredBuilding" is received, and there is no building of that id in the db, the plan would be to put a building with an empty set and then add the person to this set. This set will rapidly change as people keep entering and exiting the building, whose events may unfortunately come in the wrong order, something we try to fix by sending in valid-times as per above. Somewhere along the line, the "buildingCreated" event finally arrives and should add kvs without modifying the set or its history. This event has an earlier valid time than any of the "personEntered" or "personExited" events
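
On the side issue of persons ending up nested (`#{("alice" "bob")}`, or a plain list when `:persons` is missing): `(conj % persons)` adds the whole rest-args sequence as a single element. A minimal sketch of one possible fix, assuming a hypothetical `add-persons` function standing in for the update inside `:conj-persons-fn`:

```clojure
;; Hypothetical stand-in for the update performed inside :conj-persons-fn.
;; `into` adds each person individually; `fnil` supplies an empty set
;; when the entity has no :persons key yet.
(defn add-persons
  [entity persons]
  (update entity :persons (fnil into #{}) persons))

;; No :persons key yet: a fresh set is created.
(def fresh (add-persons {:xt/id "building1"} ["alice" "bob"]))

;; Existing set: new persons are merged in, duplicates collapse.
(def merged (add-persons fresh ["bob" "carol"]))
```

Inside the transaction function the same expression would replace `#(conj % persons)`, since `persons` there is already a sequence of the rest arguments.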

Hukka06:05:41

Kinda sounds like you cannot really do just temporal inserts of the data once. Either, every time you get new information, you need to change all of the valid time ranges, not just insert one new document; or you really do fault tolerant (or rather missing information tolerant) event sourcing, instead of trying to normalize the state at every point of time from incomplete data
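
The first option can be sketched with a toy model in plain Clojure (not XTDB itself): a timeline is a sorted map from valid time to document, and late-arriving fields are merged into the version at their own valid time and into every later version. The `assoc-from` helper and the use of years as valid times are assumptions made for illustration only.

```clojure
;; Toy model, not XTDB: valid time (here just a year) -> document version.
(def timeline
  (sorted-map 2000 {:xt/id "building1" :persons #{}}
              2016 {:xt/id "building1" :persons #{"alice"}}))

(defn assoc-from
  "Hypothetical helper: merge `kvs` into a version at `vt` (based on the
  latest version at or before `vt`) and into every later version."
  [tl vt kvs]
  (let [base     (or (some-> (rsubseq tl <= vt) first val) {})
        with-new (assoc tl vt (merge base kvs))]
    (reduce (fn [acc [t doc]] (assoc acc t (merge doc kvs)))
            with-new
            (subseq with-new > vt))))

;; "buildingCreated" arrives late, with a valid time of 2013.
(def corrected (assoc-from timeline 2013 {:a 1}))
```

Here the 2016 version keeps its persons and gains `:a`, while the 2000 version is untouched. In XTDB terms this would presumably mean emitting one put per affected version rather than a single put.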

Hukka06:05:12

If even the events like "person entered" and "person exited" can come in any order, event sourcing seems like the better choice. Then you can just ignore exit events that didn't have a corresponding enter event, or even show extra information that some people have been present even though you don't know when exactly
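
A minimal sketch of that event-sourcing idea in plain Clojure: events are stored as they arrive, and the set of persons at a given time is derived by replaying them in valid-time order. The event shape (`:type`, `:person`, `:at`) and the `persons-at` helper are hypothetical, and integers stand in for valid times.

```clojure
;; Events in arrival order, which is not valid-time order.
(def events
  [{:type :exited  :person "alice" :at 4}
   {:type :entered :person "alice" :at 1}
   {:type :exited  :person "bob"   :at 2}   ; no matching :entered event
   {:type :entered :person "carol" :at 3}])

(defn persons-at
  "Hypothetical helper: replay all events with valid time <= `t`, in
  valid-time order, and return the set of persons inside the building."
  [events t]
  (->> events
       (filter #(<= (:at %) t))
       (sort-by :at)
       (reduce (fn [inside {:keys [type person]}]
                 (case type
                   :entered (conj inside person)
                   :exited  (disj inside person) ; no-op if never entered
                   inside))
               #{})))
```

Because `disj` on a missing element leaves the set unchanged, an exit without a corresponding enter is simply ignored, and arrival order no longer affects the result.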

Anthony Bui11:05:42

Sounds reasonable, we'll look into it more - thanks!!

genekim22:05:04

Purely FYI: Xodus vs. RocksDB performance: https://blog.aawadia.dev/2021/04/03/xodus-vs-rocks/ TL;DR: RocksDB: 10x faster writes, 2x faster reads, 8x less storage (I'm still using Xodus for xtdb; will eventually switch to rocksdb…)

❤️ 2
refset23:05:19

Interesting write up - thanks for sharing! As it happens we've been working on adding column family support to the RocksDB module this week so we can pass on some more of those performance benefits 🙂