Fork me on GitHub
#xtdb
<
2021-08-04
>
Tuomas06:08:26

I'm trying to persist information about the transactions that are making new versions of documents or deleting documents. My goals are 1. to be able to display entity-history but to also display information about the transactions where the operations have came in (which user has sent the transaction and why) and 2. show all changes a user has made. My current idea is to submit an additional document for each transaction and either give it a db/id or an attribute value of tx-id in addition to all the relevant transaction info. Am I shooting myself in the leg by multiplying the total amount documents in crux (from millions to x * millions) and should I instead persist transaction information in another system? Also, the only way to query why/who has made the latest transaction would be to add a link from the actual document to the transaction document or the other way around? There isn't a wait to crux/q from a documents tx to another document (that would have a db/id of tx-id)? I would be getting the tx-id from entity-history or entity-tx and then using it to find the correct tx document?

jarohen08:08:46

Hey @UH9091BLY 👋 This seems quite reasonable - the meta information for the transaction can be stored in a single document, linking to others if required. Crux doesn't mind either way, but aesthetically I'd probably wouldn't put meta info in the 'real' documents 🙂 In order to put both the documents and the meta document in a single transaction, you can set up a small transaction function to put the tx-id on your meta docs:

[:crux.tx/put {:crux.db/id :put-with-tx-id
               :crux.db/fn '(fn [ctx doc]
                              [[:crux.tx/put (merge doc (crux.api/indexing-tx ctx))]])}]
then, to submit a meta doc:
[:crux.tx/fn :put-with-tx-id {:crux.db/id (UUID/randomUUID)
                              :originator "james"}]

🙏 7
jarohen08:08:26

finally, to find all the transactions from a given originator:

(crux/q db
        '{:find [(pull ?meta-doc [*])]
          :where [[?meta-doc :originator "james"]]})

refset09:08:02

Another option, instead of using a transaction function, is to lookup the tx-id outside of the query (or inside the query via a custom fn predicate), using entity-tx or entity-history on the meta doc

Tuomas10:08:17

So for a given valid time entity you get tx with entity-tx . For all version entity-history . And if you have a tx list and you need to find the ops you need open-tx-log with after-tx-id of tx-id minus 1?

(defn tx-ops [node tx-id]
  (with-open [tx-log (crux/open-tx-log node (- tx-id 1) {:with-docs? true})]
    (let [iterator (iterator-seq tx-log)]
      (:crux.api/tx-ops (first iterator)))))

(let [{:crux.tx/keys [tx-id]} (crux/entity-tx (crux/db node) :eid)]
  (tx-ops node tx-id))

refset11:08:17

that could work yep, although be aware that open-tx-log isn't really designed to be on the "hot path" for ad-hoc lookups (e.g. Kafka wouldn't be happy), so it really depends on your appetite for duplication vs read performance. You could also manually store references to each of the entities in the transaction in your meta doc and just know "this document was changed" (but you still wouldn't know if it was a put delete match etc. that cause the change without looking at open-tx-log), unless you go further and encode the ops into the meta doc also...but the transaction function suggestion is probably easiest at that point 🙂

👍 2
Tuomas06:08:11

Another question while Im at it. If there is a concept of a user, then there could be a document like

{:crux.db/id :user1
 :login/email ""
 :login/password "---"}
And if there is also a concept of a person that isn't a user it is natural to have a document like
{:crux.db/id :person1
 :contact/email ""}
Now suppose that :person1 registers as a user and we can make the connection between the register event and :person1 , should you enrich :person1 or create :user2
{:crux.db/id :person1
 :contact/email ""
 :login/email ""
 :login/password "---"}
;; vs
{:crux.db/id :person1
 :contact/email ""}
{:crux.db/id :user2
 :login/email ""
 :login/password "---"
 :login/person :person1}
I initially went for the creation of :user2 for more granular :crux.tx/evict (each "entity" has multiple documents based on which sets of attributes you would evict together) but then I realised you could use crux/entity-history and valid time transactions to have the effect of evicting only the necessary attributes from valid time, so I could merge all documents and still have granular evictions

jarohen09:08:36

It's a bit of a vague answer, but 'it depends' 🙂 First, I'm assuming you're evicting these users because of a legal request (GDPR/CCPA/etc) - if not, it's usually more straightforward to delete rather than evict. If it is a legal request, I suspect you'll have to evict any entities containing that personally identifiable information (PII), so it may make more sense (depending on your use case) to extract the PII into a separate, single document - maybe :person is suitable for this? at which point, you could have a reference to the :person from the :user (assuming every user has an associated person). Personal preference, I'd say! (obv I'm not a lawyer, this isn't legal advice, etc 🙂)

jarohen09:08:34

Just to clarify on the 'valid time transactions' - I'm admittedly not quite sure what you mean here, I'm assuming that you mean something along the lines of finding every update in a given entity's history and putting through a transaction updating each version? If so, this isn't quite 'evicting', because you can still retrieve this data by going back in transaction time. e.g. let's say you have a entity that you create at T3, update at T5 and T8 - if you were to put through a transaction at T10 saying 'set the T3 version of this entity to be X, T5 version to be Y, T8 to be Z', and then queried at any valid time, you wouldn't see the PII - but you would if you queried at transaction time T9, say. It means that whatever a user transacts ('eviction' aside, see below), you cannot hide what happened in the past, which is great for audit. Trouble is, this doesn't satisfy GDPR (for example) which says that such data cannot be retrievable. Crux's eviction, OTOH, does remove data from the transaction timeline. If you evict an entity in a transaction at T10, it's not just that that entity now doesn't exist (like a delete), it never existed - if I again query at T9, I now won't see that entity, even though it was there before. (This is why bitemporality is both necessary and awesome, IMO 🤓)

Tuomas10:08:12

What I meant was I think I could get document history with-docs, select non-PII from every version, evict the document and submit a transaction for every version with correct valid time. Essentially this should work as a way to remove specific keys from valid time (because others are submitted again) and everything from tx time

Tuomas10:08:00

This only as a work around of not being able to evict only specific parts of a document

jarohen10:08:14

Ah, I see - yes, in Crux ingest everything works at the granularity of documents. We do have a known issue regarding evicting and re-putting the exact same document in quick succession, but given this will have different content you should be fine - let us know if you have any problems 🙂

👍 2
richiardiandrea20:08:23

A question, I want to confirm my assumption is right: if I have the following ops in order

[[:match ...]
 [:put ...]
 [:match ...]
 [:put ...]]
If the second match fails, it is going to fail the whole transaction and I won't see the put , correct?

refset21:08:30

yep, that's right 🙂

❤️ 2