Hi, I'm pondering how to approach an XTDB v1 situation where I might have to evict at least 300 000 documents. 😅 Currently we do the eviction like this:
• Fetch the IDs
• Partition the IDs into buckets of 1000 documents
• Create tx operations for each bucket
• Run the tx operations one bucket at a time
What we have noticed is that this is quite a heavy operation, and in production we have seen it block writes, resulting in errors. Are there any other approaches to this particular problem? This is not frequent; usually the reason is a customer requirement, as they want us to update their data significantly. I can tackle this by timing it well (e.g. a weekend night when activity is low), but as our customer base grows I expect this to become more frequent. In particular, I would like to know whether the ID fetching could be streamed or "windowed" somehow to avoid a huge spike in CPU/memory usage.
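A minimal sketch of the batched approach described above, assuming XTDB 1.x's xtdb.api (xt/q, xt/submit-tx, xt/await-tx). To keep it runnable without a node, the query and the submit step are passed in as functions; evict-in-batches! and the :owner-id attribute are hypothetical, not part of XTDB. The eager fetch at the top of the body is where the whole ID set is realised in memory at once, which is the likely source of the CPU/memory spike.

```clojure
(defn evict-in-batches!
  "Hypothetical helper: fetches every [eid] tuple up front, then evicts them in
  fixed-size batches, submitting one transaction per batch.
  fetch-ids     - zero-arg fn returning a collection of [eid] tuples
  submit-batch! - fn taking a vector of tx-ops (e.g. submit, then await the tx)
  Returns the number of entities submitted for eviction."
  [fetch-ids submit-batch! batch-size]
  (let [tuples (vec (fetch-ids))]          ; the whole ID set is realised in memory here
    (doseq [batch (partition-all batch-size tuples)]
      ;; [::xt/evict eid] with xt aliased to xtdb.api is [:xtdb.api/evict eid]
      (submit-batch! (mapv (fn [[eid]] [:xtdb.api/evict eid]) batch)))
    (count tuples)))

;; Assumed wiring with XTDB 1.x (xt = xtdb.api); :owner-id is a hypothetical attribute:
;; (evict-in-batches! #(xt/q (xt/db node) '{:find [e] :where [[e :owner-id "acme"]]})
;;                    #(xt/await-tx node (xt/submit-tx node %))
;;                    1000)
```

For example, 2 500 tuples with a batch size of 1000 are submitted as three transactions of 1000, 1000 and 500 evict operations.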
One more observation: restoring the checkpoint takes "forever" after evicting those documents (forever from Cloud Run's perspective, at least), which causes errors because the startup probe fails. I've had to increase the startup probe tolerance quite a lot.
"Forever" means easily over 20 minutes. This is a small instance (2 CPU, 8 GB), so if the startup probe tolerance cannot be raised high enough, I may have to double the container specs.
Note for others 😎 If you are running XTDB v1 on Google Cloud Run and run into a similar issue to ours (evicting 566 000 documents), resulting in a HUGE checkpoint update where Cloud Run's startup probe thresholds are not sufficient for the checkpoint to be consumed before the startup probe fails and forces an instance restart:
1. https://v1-docs.xtdb.com/administration/checkpointing/#_note_about_using_filesystem_checkpoint_store
2. change the HTTP startup probe to TCP
3. tolerate an unresponsive database for a while
4. profit
Side note: the latest checkpoint contains 113 files, ~100 of which are 67 MB each (roughly 6.7 GB in total) 👀
@taylor.jeremydavid if I'm not interested in storing all those 566k [::xt/evict (first id)] operations in checkpoints, so that a new node/revision starts up faster, how should I proceed? Can I just delete stuff 🙈
We have a single node using XTDB, and checkpoints are basically only needed when we deploy a new revision with new features to pilot/prod. Due to changes in our infra, we have seen a drastic change in query frequencies, and query performance is not that critical to us anymore. I just started to wonder whether we actually need checkpointing when our query patterns are random and infrequent (XTDB is the golden storage with versioning, and for most use cases we just export the data to other data processors).
> Can I just delete stuff 🙈
You mean from the Postgres table(s) directly, via some one-off SQL statements? That can work, yep. Just be careful 🙂 The checkpoint stores the latest offset, so if you only delete entries that are greater than that offset, the indexer will skip to the next greatest entry without realising anything is missing.
Oh no, that sounds way too scary 😅
Anyway, the production APIs are up and running and new data is constantly flowing in
So the checkpoints have new content after the evictions
What would be the effect if I disabled checkpointing completely and deployed a new node at this point in time? Would the next deployed instance need to index the whole history of all documents before being usable?
> Would the next deployed instance need to index the whole history of all documents before being usable?
Essentially yes, checkpoints are just a kind of cache.
Could I expedite the checkpoint "vanishing" by introducing retention policies?
I mean, I just want to somehow get rid of the large checkpoints so that I can stop worrying about the startup probes failing 😂
Pruning older checkpoints manually is safe, but yes, retention policies are the automatic approach, configurable since https://github.com/xtdb/xtdb/pull/2591
So I could just remove files from a checkpoint?
I mean from the latest checkpoint or does that cause havoc in indexing?
Ah, you mean files within RocksDB. I wouldn't recommend deleting any files directly; Rocks spreads data across them and does integrity checking. But you can evict (delete) KV entries and then run compaction.
You're kind of fighting XT's state replication at this point, but if you're careful and only have one active node, and backups, you can be quite confident about the impacts
Yeah, maybe I'll just implement an aggressive retention policy, wait for it to do its job, and then restore a more lenient one. This is not a production blocker; I can use a TCP startup probe to get around GCP's maximum startup probe timeouts.
FWIW, streaming allows us to maintain normal operations during the eviction, but evicting 200 documents takes 40 seconds on average, i.e. 5 documents/sec. This means evicting 283 000 documents takes approx. 15.7 h 😅
(disclaimer: the prod instances are not very large, this could be faster with larger instances)
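The throughput estimate above, worked through from the reported averages (not a benchmark):

```latex
\frac{200\ \text{docs}}{40\ \text{s}} = 5\ \text{docs/s}, \qquad
\frac{283\,000\ \text{docs}}{5\ \text{docs/s}} = 56\,600\ \text{s} \approx 15.7\ \text{h}
```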
Interesting. Thanks for following up. Is it still causing a pain then?
No, I truly hope this is a unique situation. The integrator pushed data to a single owning entity when they should have pushed it to at least 6 different owners, so we needed to evict the docs to re-push the data to the correct owning entities. We considered moving those documents programmatically, but chose to wipe the slate clean just to be sure the data is where it belongs. The integrator couldn't provide identification reliable enough for us to be 100% sure of a programmatic change of ownership.
Now deploying a version with a 1-minute retention time to our pilot system, since I can't see any signs of checkpoints being cleared. Checkpoints are uploaded normally.
Just for the sake of completeness, this is our kv-storage config now.
;; assumes (:require [clojure.java.io :as io]) in the ns and an app-level config map in scope
(defn kv-store [dir]
  {:kv-store (merge {:xtdb/module 'xtdb.rocksdb/->kv-store
                     :db-dir (io/file dir)
                     :sync? false
                     :enable-filters? true
                     :block-cache {:xtdb/module 'xtdb.rocksdb/->lru-block-cache
                                   :cache-size (* (:xtdb-cache-size-mb config) 1024 1024)}}
                    ;; checkpointing is only enabled when a checkpoint bucket is configured
                    (when (:xtdb-checkpoint-bucket config)
                      {:checkpointer {:xtdb/module 'xtdb.checkpoint/->checkpointer
                                      :store {:xtdb/module 'xtdb.google.cloud-storage/->checkpoint-store
                                              :path "/mnt/xtdb-checkpoints"}
                                      :approx-frequency (java.time.Duration/ofHours 1)
                                      :retention-policy {:retain-newer-than (java.time.Duration/ofMinutes 1)
                                                         :retain-at-least 5}}}))})
It does not seem to run the clearing at all. Below is some recent logging from our GCP instance:
2026-01-14 10:25:20.171
time="14/01/2026 08:25:20.171529" severity=INFO message="File system has been successfully mounted." mount-id=pilot-xtdb-checkpoints-5595e368
2026-01-14 10:25:29.413
2026-01-14 08:25:29,411 [main] INFO xtdb.checkpoint - restoring from {:xtdb.checkpoint/cp-format {:index-version 22, :xtdb.rocksdb/version "7"}, :tx {:xtdb.api/tx-time #inst "2025-08-06T10:49:26.697-00:00", :xtdb.api/tx-id 2219256}, :xtdb.checkpoint/cp-path #object[sun.nio.fs.UnixPath 0x1b6b605 "/mnt/xtdb-checkpoints/checkpoint-2219256-2026-01-14T08:10:23.274-00:00"], :xtdb.checkpoint/checkpoint-at #inst "2026-01-14T08:10:23.274-00:00"} to rocksdb/index-store-for-postgres-pg-67368c0-carbonlink-pilot.f.aivencloud.com-carbonlink-test
2026-01-14 10:25:35.980
2026-01-14 08:25:35,978 [main] INFO xtdb.tx - Started tx-ingester
2026-01-14 10:35:20.173
time="14/01/2026 08:35:20.170624" severity=INFO message="Starting a garbage collection run." mount-id=pilot-xtdb-checkpoints-5595e368
2026-01-14 10:35:20.207
time="14/01/2026 08:35:20.205627" severity=INFO message="Garbage collection succeeded after deleted 0 objects in 34.887143ms." mount-id=pilot-xtdb-checkpoints-5595e368
When looking at https://github.com/xtdb/xtdb/blob/1.x/core/src/xtdb/checkpoint.clj#L80
(defn checkpoint [{:keys [dir bus src store ::cp-format approx-frequency] :as checkpoint-opts}]
and then https://github.com/xtdb/xtdb/blob/1.x/core/src/xtdb/checkpoint.clj#L69C1-L69C76
(defn apply-retention-policy [{:keys [store ::cp-format retention-policy]}]
Am I correct in saying that the retention arguments are not passed, i.e. that the :keys destructuring does not pick up the :retention-policy key?
AFAICT the config should pass through okay
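A small, self-contained illustration of why the key can still pass through: in Clojure, {:keys [...]} only decides which locals get bound, while :as binds the whole, unmodified map. Whether checkpoint actually forwards checkpoint-opts to apply-retention-policy is an assumption here, not something verified against the source; the two functions below are hypothetical stand-ins.

```clojure
(defn apply-retention-policy-sketch
  "Stand-in for apply-retention-policy: only cares about :retention-policy."
  [{:keys [retention-policy]}]
  (when retention-policy
    {:would-clear? true :policy retention-policy}))

(defn checkpoint-sketch
  "Stand-in for checkpoint: destructures one key but keeps the whole map via :as."
  [{:keys [approx-frequency] :as checkpoint-opts}]
  ;; approx-frequency is bound as a local, but checkpoint-opts still holds
  ;; every key, including :retention-policy, so it can be forwarded as-is
  (apply-retention-policy-sketch checkpoint-opts))

(checkpoint-sketch {:approx-frequency :hourly
                    :retention-policy {:retain-at-least 5}})
;; => {:would-clear? true, :policy {:retain-at-least 5}}
```

So a missing key in the :keys vector alone would not drop :retention-policy, as long as the full options map is what gets passed along.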
just a thought: does the service account have delete permissions on the bucket?
It should.
I'm looking for this log line in my service logs; it should appear regardless of the bucket permissions. https://github.com/xtdb/xtdb/blob/1.x/core/src/xtdb/checkpoint.clj#L77
(log/infof "Clearing up old checkpoint, %s, based on `retention-policy`" checkpoint-opts)
But it never shows up, and above it is a check
(when retention-policy
which is why I got confused about whether those config opts were being passed 🙈
If you have REPL access, you could see what available-checkpoints and calculate-deleteable-checkpoints return
and I guess with REPL access you could also monkey-patch in some more logging
I can probably carve out some time tomorrow to try to repro it with a vanilla setup in GCP
Nope, we don't have REPL access to prod.
Hmm, I deployed this config yesterday and still can't see any logging about clearing checkpoints. How could I verify that it is effective? Should I just wait a bit longer for the logging to appear?
:retention-policy {:retain-newer-than (java.time.Duration/ofDays 1)
                   :retain-at-least 1}
Hmm, yes, it's worth waiting 2 days at most, but if it hasn't done anything after 1.5 days it's very likely not working. It would be good to verify the config separately. If you have a dev environment, you can turn all the numbers down to experiment at the scale of minutes.
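When experimenting at the scale of minutes, it can help to write down beforehand which checkpoints you expect to disappear. The function below is a hypothetical model, not XTDB's calculate-deleteable-checkpoints: it assumes a checkpoint is kept if it is among the newest :retain-at-least checkpoints or is newer than :retain-newer-than, and is deletable otherwise. Check that assumption against the XTDB 1.x checkpointing docs/source before relying on it.

```clojure
(defn deletable-checkpoints
  "HYPOTHETICAL model of a retention policy. checkpoints is a seq of maps with a
  :checkpoint-at java.time.Instant; both policy keys are required here.
  A checkpoint is deletable only if it is outside the newest retain-at-least
  AND older than (now - retain-newer-than)."
  [checkpoints {:keys [retain-at-least retain-newer-than]} ^java.time.Instant now]
  (let [cutoff (.minus now ^java.time.Duration retain-newer-than)]
    (->> checkpoints
         (sort-by :checkpoint-at #(compare %2 %1))    ; newest first
         (drop retain-at-least)                       ; always keep the newest N
         (filter #(.isBefore ^java.time.Instant (:checkpoint-at %) cutoff)))))

;; Eight hourly checkpoints, aged 0..7 hours:
(let [now (java.time.Instant/parse "2026-01-14T12:00:00Z")
      cps (for [h (range 8)]
            {:id h :checkpoint-at (.minus now (java.time.Duration/ofHours h))})]
  [(map :id (deletable-checkpoints cps {:retain-at-least 5
                                        :retain-newer-than (java.time.Duration/ofMinutes 1)} now))
   (map :id (deletable-checkpoints cps {:retain-at-least 1
                                        :retain-newer-than (java.time.Duration/ofDays 1)} now))])
;; => [(5 6 7) ()]
```

Under this model, with hourly checkpoints and :retain-newer-than of 1 day, nothing becomes deletable until some checkpoint is more than a day old, which would fit with waiting up to a couple of days before concluding the policy isn't working.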
If needed, yes I have and can 🙂
Hey @jussi.mononen firstly, is this eviction being done for GDPR-type reasons? Are you using RocksDB? If not you might well be running into this outstanding issue: https://github.com/xtdb/xtdb/issues/1509
RocksDB for the index store, PostgreSQL for everything else.
The main reason is that the customer wants to move from "one gargantuan chunk of data" to "lots of data per company in our group structure", i.e. they want to split the data across their respective subsidiaries instead of treating it as one big chunk.
Hmm, rather than evicting you might be better to 'decant' into a fresh database. Bulk eviction is not something we've optimised particularly, so there may be some low hanging performance gains, but it's hard to say if that will help you in any case. You can definitely at least stream the id fetching, see open-q
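A minimal sketch of streaming the ID fetch, assuming XTDB 1.x, where xt/open-q returns a closeable cursor that can be walked with iterator-seq inside with-open. evict-streamed! and the :owner-id attribute are hypothetical; the cursor and submit step are injected so the sketch runs without a node. Awaiting each transaction before submitting the next keeps only one eviction batch in flight, which is presumably what leaves room for regular writes, though that is an assumption rather than something measured.

```clojure
(defn evict-streamed!
  "Hypothetical helper: streams [eid] tuples from a closeable cursor and evicts
  them batch by batch, so only one batch of IDs is realised at a time.
  open-cursor   - zero-arg fn returning a closeable java.util.Iterator of tuples
  submit-batch! - fn taking a vector of tx-ops (e.g. submit, then await the tx)
  Returns the number of entities submitted for eviction."
  [open-cursor submit-batch! batch-size]
  (with-open [^java.lang.AutoCloseable cursor (open-cursor)]
    (->> (iterator-seq cursor)            ; lazy: pulls tuples from the cursor on demand
         (partition-all batch-size)       ; lazy batching
         (reduce (fn [n batch]            ; eager, so it finishes before the cursor closes
                   (submit-batch! (mapv (fn [[eid]] [:xtdb.api/evict eid]) batch))
                   (+ n (count batch)))
                 0))))

;; Assumed wiring with XTDB 1.x (xt = xtdb.api); :owner-id is a hypothetical attribute:
;; (evict-streamed! #(xt/open-q (xt/db node) '{:find [e] :where [[e :owner-id "acme"]]})
;;                  #(xt/await-tx node (xt/submit-tx node %))
;;                  1000)
```

The reduce has to stay inside with-open: returning the lazy sequence out of it would close the cursor before the batches are consumed.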
One XTDB instance is shared by multiple customers, so we can't easily decant them 😕 Streaming could be sufficient if, after the change, it allows concurrent writes, since eviction is rare and runs as a background process that can take as long as it needs. Thanks!
Okay, well let me know if your background batching/partitioning approach is still a struggle and I'll give this more thought next week