#xtdb
2022-07-18
Martin 08:07:56

Hi 👋, this is maybe something I missed in the docs, but I would like to understand the limits of XTDB. The tx-log and doc-store are pretty much bound to the limits of Kafka/S3, but it is not clear to me what the limits of the index-store in RocksDB are. Say my index becomes multi-terabyte after a couple of years of ingesting 100s of tx/sec: is there a way to evict old history from the index to make space? Is this a legitimate concern? Maybe I've just missed something in the docs; basically I'd like to know when and how XTDB breaks at scale and how to handle that.

refset 09:07:24

Hey @UBDQU4NRL 👋 A single RocksDB instance can definitely scale to multiple TBs, but given the overheads of scaling vertically like that indefinitely, it's usually sensible to think about how you might shard your data at the application level to scale across multiple independent XT instances. Performance of any kind of single-node design will eventually begin to degrade as it fills up, but RocksDB itself should be the bottleneck rather than XT. What kind of data are you handling? I would be happy to help you with some benchmarking and extrapolation.

> is there a way to evict old history from the index to make space?
Not currently, although it is theoretically possible to prune/truncate the history safely, as per https://github.com/xtdb/xtdb/pull/727 https://github.com/xtdb/xtdb/pull/734
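A rough sketch of the application-level sharding idea mentioned above, assuming a purely hypothetical setup of several independent XTDB deployments behind HTTP endpoints; the endpoint names and the `shard_for` routing helper are illustrative, not part of XTDB:

```python
# Hypothetical application-level sharding across independent XTDB deployments.
# XTDB itself provides no built-in sharding; the routing lives in the application.
import hashlib

# Each entry is a separate, fully independent XTDB cluster (its own
# tx-log, doc-store and index-store). Endpoint names are made up.
SHARD_ENDPOINTS = [
    "https://xtdb-shard-0.internal:3000",
    "https://xtdb-shard-1.internal:3000",
    "https://xtdb-shard-2.internal:3000",
]

def shard_for(entity_id: str) -> str:
    """Deterministically route an entity (e.g. a ledger account id) to one shard."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARD_ENDPOINTS)
    return SHARD_ENDPOINTS[index]

# All transactions and queries for a given account then go to the same shard,
# keeping each shard's RocksDB index well below the multi-TB range.
print(shard_for("account-42"))
```

Any deterministic key works here; for a ledger, routing by account id keeps all history for one account on a single shard.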

Martin 09:07:39

So this came up in the context of building some sort of General Ledger system. I'm comfortable running a multi-node system using Kafka for the tx-log and document-store, but my concern is what happens when the index-store becomes massive. It was not obvious whether the index-store has to keep a full index of all historical transactions in it, or whether there is a way to compact the index to control the size of SSD required for each node in the cluster.

refset 09:07:27

I see, thanks for the added context.

> It was not obvious whether the index-store has to keep a full index of all historical transactions in it, or whether there is a way to compact the index to control the size of SSD required for each node in the cluster.
I think it's fair to say that the out-of-the-box experience is ~100% immutable 🙂 Our roadmap plans will relax/eliminate the need to have large local SSDs for each node though, see https://xtdb.com/blog/dev-diary-may-22/#:~:text=pillars%20of%20XTDB.-,Pillar%20%231%3A%20SoSaC,-If%20we%20hope - I'd be very happy to speak on a call if you'd be keen to chat about all this.

Martin 09:07:36

I guess what I'm interested in is how the different storage layers scale relative to each other. If my transaction log is 1TB of data in Kafka, what order of magnitude will the index-store be: 10GB, 100GB, 1TB?

Martin 09:07:24

> Pillar #1: SoSaC
> If we hope to build the world's most-loved immutable database, it has to do immutability very well, and we firmly believe that "doing immutability well" implies the Separation of Storage and Compute (we recently recorded a https://www.youtube.com/watch?v=8tePLbzJffI all about this concept — see below). However currently, as of 1.21.0, XTDB's operational storage layer (a.k.a. the Index Store) requires the underlying Key-Value store (e.g. RocksDB) to exist on every node with a full replica of almost all data. In the future, XTDB will reorient around cloud object storage and handling data in large columnar 'chunks', to allow nodes to retrieve and process on-demand only the subset of chunks needed for a given set of queries. This will enable much simpler and more cost-effective horizontal scaling.

refset 10:07:34

The tx-log should really always be the smallest of the three components, since it only contains document hashes and tx operations, so a 1TB tx-log would imply an absolutely massive system 🙂

The ratio between the doc-store and the index-store is very workload-dependent, based on the average size/width/depth of individual documents, and on how often the same key-value combinations appear across the data set. The existing doc-store modules don't attempt any structural sharing, whereas the index-store deduplicates as much as possible (to optimise for SSD usage). A ballpark "typical CRUD application" ratio might be 1:5:10 (tx-log : doc-store : index-store).
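A back-of-the-envelope calculation using the ballpark 1:5:10 ratio above; the ratio itself is workload-dependent, so treat the numbers as order-of-magnitude estimates only:

```python
# Rough sizing estimate from the ballpark 1:5:10 (tx-log : doc-store : index-store)
# ratio discussed above. These are estimates, not guarantees.
TX_LOG_RATIO, DOC_STORE_RATIO, INDEX_STORE_RATIO = 1, 5, 10

def estimate_storage(tx_log_gb: float) -> dict:
    """Estimate doc-store and index-store sizes from a known tx-log size."""
    return {
        "tx-log (GB)": tx_log_gb,
        "doc-store (GB)": tx_log_gb * DOC_STORE_RATIO / TX_LOG_RATIO,
        "index-store (GB)": tx_log_gb * INDEX_STORE_RATIO / TX_LOG_RATIO,
    }

# e.g. a 100 GB tx-log would suggest roughly a 500 GB doc-store and a ~1 TB
# index-store per node, which is why a 1 TB tx-log implies a massive system.
print(estimate_storage(100))
```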

Martin 10:07:45

Okay, interesting, so each XTDB node has to hold a full copy of the database. I always thought of XTDB nodes as compute nodes, with the majority of storage pushed off to the doc-store/tx-log, but an XTDB node is really both compute and storage, with the tx-log/doc-store there to support HA/replication.

refset 11:07:47

> so each XTDB node has to hold a full copy of the database
That's approximately correct. The only real exception is that for certain ~large values (e.g. nested structures or values >224 bytes), XT will only index a hash of the value, so it may require ad-hoc doc-store retrievals to resolve these values while querying (there's also a configurable in-memory cache for these lookups).

> an XTDB node is really both compute and storage, with the tx-log/doc-store there to support HA/replication
Exactly this, yep. The tx-log also provides low-latency commits with an authoritative tx-time.

🙏 1
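An illustrative sketch (not XTDB internals) of the indexing decision described above: the 224-byte threshold and the "nested structures indexed by hash" behaviour come from the message, while the serialization, hash choice, and function name are assumptions for the sake of the example.

```python
# Illustrative only: small scalar values go into the local index directly,
# while nested structures or values over 224 bytes are represented by a hash
# and resolved via the doc-store (or an in-memory cache) at query time.
import hashlib
import json

VALUE_SIZE_LIMIT = 224  # bytes, per the discussion above

def index_entry(value) -> tuple:
    """Return ('literal', bytes) or ('hash', digest) for a document value."""
    is_nested = isinstance(value, (dict, list, set, tuple))
    encoded = json.dumps(value, default=str).encode("utf-8")
    if is_nested or len(encoded) > VALUE_SIZE_LIMIT:
        # Only the hash lands in the local index; the full value stays in the
        # doc-store and may need an ad-hoc retrieval while querying.
        return ("hash", hashlib.sha1(encoded).hexdigest())
    return ("literal", encoded)

print(index_entry("GL-2022-07-18"))              # small scalar -> indexed directly
print(index_entry({"lines": list(range(99))}))   # nested structure -> indexed by hash
```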