xtdb 2022-05-24 | Slack Archive

Is it ok to block and do some processing in a tx function?

from an XT point-of-view, yes - but bear in mind that XT has a single writer thread, so other transactions going through will be waiting behind it in the queue

jarohen14:05:15

depending on your required throughput, this may not be an issue 🙂
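
The queueing effect described above can be illustrated with a plain single-threaded executor. This is only an analogy and not XTDB's actual ingestion code: the `completion-order` function and the task labels are hypothetical names made up for the sketch. It shows that a slow task submitted first holds up a fast task submitted after it.

```clojure
(import '(java.util.concurrent Executors ExecutorService TimeUnit))

(defn completion-order
  "Hypothetical helper: runs [label sleep-ms] tasks on ONE worker thread
  (standing in for a single writer thread) and returns the labels in the
  order in which the tasks finished."
  [tasks]
  (let [^ExecutorService writer (Executors/newSingleThreadExecutor)
        done (atom [])]
    (doseq [[label sleep-ms] tasks]
      ;; .execute takes a Runnable; Clojure fns implement Runnable
      (.execute writer (fn []
                         (Thread/sleep (long sleep-ms)) ; simulated blocking work
                         (swap! done conj label))))
    (.shutdown writer)
    (.awaitTermination writer 5 TimeUnit/SECONDS)
    @done))

;; The fast task cannot overtake the slow one queued ahead of it.
(completion-order [[:slow-tx-fn 200] [:fast-tx 0]])
;; => [:slow-tx-fn :fast-tx]
```

Whether a 200 ms stall matters depends entirely on how many transactions are queued behind it, which is the throughput point made above.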

Martynas Maciulevičius14:05:00

I'm thinking about that previous thread of mine: https://clojurians.slack.com/archives/CG3AM2F7V/p1653313348030259 If I had a second in-memory-only instance of XTDB I could write to it from a tx-fn. And that probably shouldn't be too hard on the processing. I.e. the configuration would be: a main instance that is driven by Kafka MQ and then one more instance that is driven from outside and by a tx-fn of the first XTDB instance. I'm not yet sure how to sync both DBs so that I'd get consistency. I hope I won't need to use Raft for this and one of them could simply be slave-like xD

Martynas Maciulevičius15:05:14

I thought about it more and now I have a different question. Would it be possible to have a reentrant lock that would stop XTDB from committing new states and prevent evaluation of transaction functions? I'm thinking about synchronizing two XTDB in-memory instances and I'm not sure how else I could do it.

jarohen15:05:37

whew. it might be worth you taking a look at our (incredibly alpha) secondary indices API instead

👀 2

jarohen15:05:44

about the only documentation we have for that is in the 1.18.0 release notes (https://github.com/xtdb/xtdb/releases/tag/1.18.0) and a usage in the Lucene module source (https://github.com/xtdb/xtdb/blob/master/modules/lucene/src/xtdb/lucene.clj#L386-L399)

Martynas Maciulevičius15:05:06

What I already implemented is an on-demand creation of ephemeral entities based on queries of clients. So I wanted to save them into a non-MQ-managed place (our main data will come in via kafka) and I think it could be a second in-mem XTDB instance. So what I wanted to do is to be able to skip the MQ and insert the docs immediately. And then let the transaction function clean up if it needs to. I'll try to take a look.

jarohen15:05:10

intention behind that is to be able to react to transactions going through XT and update (in this case) a Lucene index - XT ensures that secondary indices are kept up-to-date with the main transaction ingestion.

jarohen15:05:12

again, XT waits for the secondary indices to have all updated themselves before it continues indexing the next transaction, so worth not putting too much work in there if that's a concern

Martynas Maciulevičius17:05:28

Hm. This seems interesting and it may work. I want to understand what it does. Does this line get all transactions that were performed since the last index update (configured with :refresh-frequency)? https://github.com/xtdb/xtdb/blob/master/modules/lucene/src/xtdb/lucene.clj#L391 What about current document versions? Would it be possible to obtain the current DB state or the current versions of the updated docs?

Hukka05:05:23

The transaction functions might be run many times, on different nodes, so bear that in mind when thinking what kind of processing you do, especially if there are side-effects

Martynas Maciulevičius05:05:52

I only do some queries on the current XTDB instance. And jarohen proposed something completely different that may suit me better. In particular I would want to basically run "in-between of tx functions". And I think that his proposal could work.

tatut07:05:44

a side note, I would be very interested in "secondary indices API" as I've been thinking about that as well, you could have geo indices, and all sorts of auxiliary things that were first class citizens

Martynas Maciulevičius08:05:19

@U050V1N74 Is it safe to query the current xtdb instance from secondary index fn? I want to find the not-yet-changed versions of the docs and compare them to the txs that come in.

jarohen08:05:38

just fwiw: context of the secondary indices 'api' is that it was introduced as an implementation detail of the Lucene module (to decouple it from the core), I announced it in the release notes as an aside - so it's been quite a surprise that folks might find it useful 😅

Martynas Maciulevičius08:05:08

Well it's not a context that I would want but I only want to run queries. So if there were no context I would run that code after every tx (which I will probably do anyway) and then have a reference to the DB itself. So what I expect is that the db would behave the same as in a tx function (I won't write but I want to read), and if there is no db in the context I'll get it from the main reference. And the processed tx functions are also what I expect to get there. So I see this as a more extended and customized version of a transaction function 🤔 And its main feature would be to allow mutation of the outside world, i.e. a tx-fn that allows mutation of the real world (which is actually what you do with the Lucene index file)

jarohen09:05:38

@U028ART884X neither are currently accessible to the secondary indices API I'm afraid. you could, at a push, get hold of the before state of the DB (the transaction hasn't been durably committed at that point) - I could provide an example of that if it'd be useful

jarohen09:05:59

but as I say, not something we intended to be publicly useful, so both the usability and stability of the API will reflect that 🙂

Martynas Maciulevičius09:05:03

Thanks. I think it should be quite straightforward. I think I'd need to pass node to the registering function at the index registration time and then it would simply take the newest version or the one that would be supplied in the :put valid-time field. Would it be different from this?

jarohen09:05:45

trouble is that it'd be a circular dependency: node -> tx ingester -> secondary indices -> node

Martynas Maciulevičius09:05:46

I'll use the node only for querying. Not updating. For updating I'd use regular tx functions. And then I'll have a second xtdb node that will work similarly to Lucene's index file. Only this time I'll save docs into it and won't react to them. Ephemeral instance. And if I understand correctly the listener function is mutable, i.e. it mutates an atom.

jarohen09:05:07

oh, of course, sorry, you have two nodes in this scenario 🙂

jarohen09:05:50

you might not need tx-fns in that case, if you're the only thread updating the second in-memory node

✅ 1

mhuebert20:05:30

Is there any prior art for getting something like datomic's tx-report-queue out of xtdb - like a stream of e-a-v changes? I know xt is document-oriented but given all the indexing I’d think these comparisons (which attributes of each doc have changed) are being performed already somewhere

Martynas Maciulevičius04:05:45

Start with this one: xtdb.api/listen Then depending on your needs try fiddling with transaction functions. And depending on your needs you may want to use indices which were suggested to me yesterday: https://clojurians.slack.com/archives/CG3AM2F7V/p1653407197541959?thread_ts=1653401882.743239&cid=CG3AM2F7V What I did was to compute these changes myself in a tx-fn.

jarohen08:05:03

listen is a good shout, although it'll only give you new transactions (which might be all you need) - if you need transactions that have already been ingested, you can use open-tx-log instead, take a db out at each transaction-id, and look at the entity-history of each of the documents in the transaction?

mhuebert21:05:56

@U028ART884X @U050V1N74 thanks for the pointers!

mhuebert21:05:38

hmm so I don’t see anything at an e-a-v level available - seems like best case is using entity history, & comparing the last two versions of each changed entity to see which top-level attributes changed in the last tx
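
The comparison described above (diffing two versions of a document to see which top-level attributes changed) can be sketched as a pure function. The name `changed-attrs` is hypothetical, and the sketch assumes both versions have already been fetched as plain maps, for example from the entity history; it does not call XTDB itself.

```clojure
(require '[clojure.set :as set])

(defn changed-attrs
  "Hypothetical helper: compares two versions of a document (plain maps,
  either of which may be nil) and reports which top-level attributes were
  added, removed, or had their value changed."
  [old-doc new-doc]
  (let [old-keys (set (keys old-doc))
        new-keys (set (keys new-doc))]
    {:added   (set/difference new-keys old-keys)
     :removed (set/difference old-keys new-keys)
     ;; only attributes present in both versions can count as 'changed'
     :changed (set (filter #(not= (get old-doc %) (get new-doc %))
                           (set/intersection old-keys new-keys)))}))

(changed-attrs {:xt/id 1 :name "a" :age 3}
               {:xt/id 1 :name "b" :email "x"})
;; => {:added #{:email}, :removed #{:age}, :changed #{:name}}
```

Nested values are compared as a whole, so a change deep inside a map-valued attribute shows up only as a change to that top-level attribute.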

Martynas Maciulevičius22:05:54

Nope, XTDB doesn't work on EAV level. Only doc level. I wanted to do some more complicated time-travel things but it doesn't work with EAVs and what I wanted to do is not doable without storing meta-attrs for each attr (and if I would somehow store it then it would also take actual storage). But you can do this feature that you want at least.

mhuebert09:05:16

ok. I was hoping that because XTDB does maintain indexes for all the toplevel attributes, that there might be some output from the indexing step that could indicate which toplevel attributes changed.

Martynas Maciulevičius09:05:27

You can find out which attributes changed by going into the doc's history -- this is easy. But if you want to update a document you must update it as a whole, not as a single EAV. I.e. you can't re-merge docs in the future. I wanted that, but it has some corner cases that I couldn't overcome: docs respawn, attributes respawn, and every update is a whole doc, not an EAV, so you can't know whether you created an attribute or carried it over. So the DB is fine, it's just that I can't merge docs in the future/past, and that's because I know too little about the attrs. Every write into the DB is either a put or an evict. And a delete is a put which puts in a nil. There is no create which would fail if the doc exists. You could implement a create tx function, but the DB would erase the fact that it was a create, as you'd need to return a put from it. That's because the DB only accepts put calls, and in doc history you consume only document states (or nils). You could store the operation with the doc but I didn't want to do that. It could probably still work. But other than that I can't complain.
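
Since the history exposes only document states (or nils), the most a reader can do is infer the operation from adjacent states. A minimal sketch, assuming the states are already available oldest-first as a sequence of maps with nil standing for a deleted state, as described in the message above; `classify-transitions` is a hypothetical name, not part of XTDB.

```clojure
(defn classify-transitions
  "Hypothetical helper: given document states ordered oldest-first
  (nil = deleted), infers what each step looks like from the outside."
  [states]
  (map (fn [[before after]]
         (cond
           (and (nil? before) (nil? after)) :noop
           (nil? before)                    :create   ; nothing -> doc
           (nil? after)                     :delete   ; doc -> nothing
           (= before after)                 :unchanged
           :else                            :update))
       ;; prepend nil so the first state is compared against 'no document'
       (partition 2 1 (cons nil states))))

(classify-transitions [{:xt/id 1 :n 1} {:xt/id 1 :n 2} nil {:xt/id 1 :n 3}])
;; => (:create :update :delete :create)
```

This recovers document-level respawns only. It cannot tell whether an attribute in a new state was deliberately re-created or simply carried over, which is the limitation described above.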
