Any ideas on how to improve eviction performance? I have a situation where I may end up deleting > 10k documents while in development, and evicting a single document takes roughly 0.4 seconds, which is about 6.7 minutes for 1000 documents. (PostgreSQL, XTDB v1)
Hmm, this seems to be directly proportional to the history size of the document in question.
Hey @jussi.mononen, this is possibly related to this known issue: https://github.com/xtdb/xtdb/issues/1509 (although you're using RocksDB, right?). Are you using transaction functions in these transactions?
Yes to both: RocksDB and tx.
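```clojure
(require '[xtdb.api :as xt])

;; Submit `ops`, wait until the node has indexed the tx, then report whether it committed.
(defn tx-ops-succeeded?! [node ops]
  (let [tx (xt/submit-tx node ops)]
    (xt/await-tx node tx)
    (xt/tx-committed? node tx)))

;; Same, but throw instead of returning false.
(defn throw-if-ops-fail! [node ops]
  (when-not (tx-ops-succeeded?! node ops)
    (throw (ex-info "Transaction failed" {:type ::transaction-failed}))))
```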
And then I might have several thousand `[::xt/evict id]` operations executed with `throw-if-ops-fail!`.
Is it making the dev flow unviable as-is? I could potentially look into this soon if it's preventing you from moving forward.
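A minimal sketch of assembling those `[::xt/evict id]` operations in fixed-size chunks and submitting each chunk through `throw-if-ops-fail!`. The helper name `evict-op-chunks` and the chunk size of 500 are hypothetical, and nothing in this thread establishes that chunking reduces the total eviction time, which appears to scale with each document's history size.

```clojure
;; Hypothetical helper: builds XTDB v1 evict ops for a seq of entity ids,
;; grouped into vectors of at most `chunk-size` ops (one vector per tx).
;; :xtdb.api/evict is the fully qualified form of ::xt/evict, so no require is needed.
(defn evict-op-chunks [ids chunk-size]
  (->> ids
       (map (fn [id] [:xtdb.api/evict id]))
       (partition-all chunk-size)
       (map vec)))

;; Usage sketch (needs a running node and the helpers above):
;; (doseq [ops (evict-op-chunks ids 500)]
;;   (throw-if-ops-fail! node ops))
```

Chunking keeps each transaction bounded in size and means a failure is reported per chunk rather than for the whole run.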
It's kind of annoying; I usually defer these operations to the evening and just leave my processes running overnight 🙂
What can we do if `(xt/sync node)` takes ages? This is a fresh container on its first start, with a very large existing database. We are using checkpoints.
hey @jussi.mononen 👋 depending on how long it's taking, one quick fix might be to increase the checkpoint frequency
otherwise, how's your indexing speed normally?
1 hour
if this is the same env as the previous thread, then it might be that the significant volume of evictions is slowing your indexing down over time
It's the same env, yes.
One of our containers is having a hard time starting: the sync takes so long that its startup probes fail.
eviction is really only intended for cases where you're legally obliged to erase data, the assumption being it's a small fraction of your overall data - it's not optimised for anything beyond that
Yeah, I'm aware of that. We have a case where the original data, and the documents derived from it, have to be replaced completely.
i.e. we noticed that one integration provided incorrect data, and since our API is idempotent, the base data needs to be evicted before data with the same IDs can be inserted 🙈
ah, ok, I see 👍 🤔
would a cross-time delete be sufficient, rather than an evict? those are likely to be much quicker
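For comparison, a sketch of what cross-time delete ops could look like in XTDB v1, where `::xt/delete` accepts an optional valid-time start and end. The helper name and both bounds are assumptions for illustration: pick bounds that enclose every valid time your documents use. Unlike evict, a delete keeps the earlier versions in history, so whether it satisfies the idempotency check described above depends on how that check reads the database.

```clojure
;; Placeholder valid-time bounds (assumptions, not XTDB-defined constants);
;; they should enclose every valid time the affected documents use.
(def assumed-valid-time-start #inst "1970-01-01T00:00:00.000Z")
(def assumed-valid-time-end   #inst "9999-01-01T00:00:00.000Z")

;; Hypothetical helper: one XTDB v1 delete op per entity id, covering the
;; valid time between the two bounds instead of evicting the entity.
;; :xtdb.api/delete is the fully qualified form of ::xt/delete.
(defn cross-time-delete-ops [ids]
  (mapv (fn [id] [:xtdb.api/delete id assumed-valid-time-start assumed-valid-time-end])
        ids))

;; Usage sketch (needs a running node and throw-if-ops-fail! from above):
;; (throw-if-ops-fail! node (cross-time-delete-ops ids))
```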
Maybe.
Regarding the slow startup sync, are there ways to expedite it? Does it read all checkpoints or only the latest? Our latest checkpoint contains 81 files, of which roughly 70 are .sst files of 68 MB each.
it'll only read the latest, yep
It took 280 seconds for a completely clean instance to reconstruct its local indexes from the prod DB.
Aaaaand that's the issue with our container: Google limits the startup probe's maximum time to 240 seconds.
Does the time it takes to sync grow linearly with the amount of data?
shouldn't do - beyond the initial checkpoint download, it should be proportional to the data entered since the last checkpoint
👍🏻 good 😅
So I guess our hassle can be traced back to those numerous evictions 🙈
and to checkpointing not picking up those updates fast enough, due to the one-hour checkpoint cycle
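Following the earlier suggestion to increase the checkpoint frequency, here is a hypothetical XTDB v1 config sketch that shortens the checkpointer's `:approx-frequency` from the one-hour cycle mentioned above. It assumes the filesystem checkpoint store and placeholder paths, and it omits the PostgreSQL-backed tx-log and document store; a cloud deployment may need a different checkpoint store module, so check the XTDB 1.x checkpointing docs. The 15-minute value is arbitrary.

```clojure
(require '[clojure.java.io :as io])
(import '(java.time Duration))

;; Sketch of the index-store part of an XTDB v1 node config with RocksDB
;; checkpointing. Paths are placeholders; tx-log and document store omitted.
(def node-config
  {:xtdb/index-store
   {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
               :db-dir (io/file "/var/lib/xtdb/indexes")
               :checkpointer {:xtdb/module 'xtdb.checkpoint/->checkpointer
                              :store {:xtdb/module 'xtdb.checkpoint/->filesystem-checkpoint-store
                                      :path "/var/lib/xtdb/checkpoints"}
                              ;; previously one hour; 15 minutes is an arbitrary example
                              :approx-frequency (Duration/ofMinutes 15)}}}})
```

Since startup time should be proportional to the data entered since the last checkpoint, this shortens the replay window. The trade-off is that a checkpoint of this size (roughly 70 × 68 MB ≈ 4.8 GB of .sst files, per the numbers above) gets produced and uploaded more often.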
ah okay, so those evictions aren't just a development-time problem for you (they're in the prod data, if I'm understanding correctly?). If you're simply not able to work around the startup limit (i.e. you're totally blocked), I can take a look at the eviction code and have a proper think soon
well, it escalated quickly from a dev-time issue to a prod issue since our container decided to dump itself, and it took a bit of time to figure out that the startup probe timeout was the culprit, not the time spent syncing 😅
totally unrelated, but the container that went bust couldn't be shut down since the new instance version couldn't satisfy the startup probe 🌪️
(that's managed cloud services for you 😂 )