xtdb

bobcalco 2025-11-29T19:07:23.045939Z

What is your general/best advice for adding full text search to an XTDB2 db?

refset 2025-12-01T20:29:37.208959Z

For integration, we have the start of a CDC (change data capture) feature via a Kafka topic: essentially a totally decoupled transaction log you can read from (and index from) without polling. See https://github.com/xtdb/xtdb/pull/4986 for details. More to follow on that in the coming weeks as we build out this story. This is probably the best route for the time being. That said, the real answer is: let's have a chat, and maybe we can figure out a plan for something more native 🙂
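
To make the "index from the log" idea concrete, here is a minimal sketch of keeping a toy in-memory full-text index in sync with a stream of change events. The event shape (`op`, `id`, `text`) is purely hypothetical and is not the message format of the CDC topic in the PR above; in practice you would read events from Kafka and feed a real search engine rather than a Python dict.

```python
import re
from collections import defaultdict
from typing import Dict, Set

TOKEN = re.compile(r"\w+")


class TextIndex:
    """Toy inverted index driven by change events (hypothetical event shape)."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)  # term -> doc ids
        self._terms: Dict[int, Set[str]] = {}                   # doc id -> its terms

    def apply(self, event: dict) -> None:
        """Apply one change event: {'op': 'put'|'delete', 'id': ..., 'text': ...}."""
        doc_id = event["id"]
        # Drop any postings from the previous version of this document.
        for term in self._terms.pop(doc_id, set()):
            self._postings[term].discard(doc_id)
        if event["op"] == "put":
            terms = {t.lower() for t in TOKEN.findall(event["text"])}
            self._terms[doc_id] = terms
            for term in terms:
                self._postings[term].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Return ids of documents containing every term in the query."""
        terms = [t.lower() for t in TOKEN.findall(query)]
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

Because each event fully replaces the previous version of a document, replaying the log from the start rebuilds the same index, which is the main appeal of indexing from a decoupled log rather than polling.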

bobcalco 2025-11-29T20:22:45.514979Z

With XTDB v1 I remember there was a kind of sweet spot of about 1000 insertions per transaction. Are there any metrics of that kind for batch-updating data in XTDB v2? Any known limits or sweet spots for tuning large batch transactions, both when pre-loading data (like in a migration) and when under heavy load?

refset 2025-12-01T20:32:03.515359Z

It also depends on the width and depth of the rows, but in general 1000 is still our go-to batch size. We have been doing pretty regular analysis on the ingestion pipeline over the past few months, mainly with loading TPC-H data, so the guidance still stands.
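
A minimal sketch of the batching advice above, assuming you have some function that submits one transaction per call. `submit_batch` is a hypothetical callback standing in for whatever client you use; it is not an XTDB API. The 1000 default comes from the guidance above and is worth tuning for wide or deeply nested rows.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

BATCH_SIZE = 1000  # go-to batch size from the discussion; tune for row width/depth


def batched(rows: Iterable[dict], size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` rows without loading everything."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load(
    rows: Iterable[dict],
    submit_batch: Callable[[List[dict]], None],  # hypothetical: one transaction per call
    size: int = BATCH_SIZE,
) -> int:
    """Submit rows in batches; return the number of transactions submitted."""
    count = 0
    for batch in batched(rows, size):
        submit_batch(batch)
        count += 1
    return count
```

For example, 2500 rows would go out as three transactions of 1000, 1000, and 500 rows.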

refset 2025-12-01T20:36:46.840189Z

For ~peak batch insertion throughput you might want to look at Arrow-based COPY, which just landed: https://github.com/xtdb/xtdb/commit/b11f24383d37142b57c75d7cbbbedaf0bd8d2e01#diff-817cf8170a6c123b8973ffc87115d13ba84fa3aa773893b413f403f9451c0997R102-R113 Because it avoids nearly all serde overheads, we observed about 200k rows/sec with TPC-H, over the Postgres wire protocol (!). The default submit-tx process will be round-tripping via transit, so you end up paying for a lot of allocations at both ends. Getting your data into Arrow may not be free, but you can perhaps parallelize that. How much data are you looking at for a migration?