#xtdb
2022-07-24
Prashant07:07:48

Hi, I have just started looking into XTDB as an option for durable storage for a microservice in a payments system. This microservice will see roughly 334 writes/second and 1002 reads/second during peak hours (8 AM-11 AM, 1 PM-3 PM, 5 PM-11 PM). I have already looked into Datomic; however, its closed-source nature, its dependency on the AWS ecosystem (DynamoDB/S3 for high write performance), and its "throw more cores at the transactor and it will just work" philosophy (vertical scaling) made me look into other options. Questions regarding XTDB:
• How would a highly available XTDB cluster look from an infrastructure POV?
◦ Can RocksDB instances be deployed on separate VMs/containers/pods? Can XTDB use these separate remote RocksDB instances for storing indices?
▪︎ Ah! Looks like this may not be possible just yet, per this Slack thread: https://clojurians.slack.com/archives/CG3AM2F7V/p1658136984079849?thread_ts=1658131316.519609&cid=CG3AM2F7V. I guess a flash-based networked storage volume with high IOPS would then be the other option for the index storage volume?
◦ Would it be OK to introduce a load balancer cum proxy (e.g. Envoy) to health-check and delegate requests to an XTDB instance?
▪︎ The intention is also to introduce app-level sharding using Envoy, e.g.:
• Envoy chooses which app deployment should get the request using a certain hashing algorithm
• Every app deployment (or a group of deployments) has its own XTDB instance to talk to
◦ To make these individual XTDB systems highly available, what would the infrastructure layout be?
• In this case, I assume having a dedicated RocksDB volume per XTDB instance would be the best practice?
▪︎ How involved would the process of re-sharding the data be (in case the data needs to be re-sharded later on)?
◦ It was previously intended to use Kafka for both document and log storage. However, after reading https://docs.xtdb.com/storage/1.21.0/kafka/, I am curious which would be better suited for high-write-throughput document storage: Kafka or RocksDB?
▪︎ For the topics storing the transaction log or documents, I assume the retention settings would be log.retention.ms=-1 and log.retention.bytes=-1? (Rough topic-creation sketch at the end of this message.)
▪︎ Does XTDB set a key while publishing records to Kafka, so that topic compaction can be turned on?
▪︎ While creating Kafka topics, is it advisable to partition them per the throughput calculation?
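For concreteness, here is roughly how I would pre-create those topics with Kafka's AdminClient (Clojure sketch; the broker address, topic names, partition counts and replication factors are all placeholder assumptions, and note that at the topic level the settings are retention.ms/retention.bytes rather than the broker-level log.retention.*):
```
(ns example.kafka-topics
  (:import (java.util Properties)
           (org.apache.kafka.clients.admin AdminClient NewTopic)))

(defn create-xtdb-topics []
  (let [props (doto (Properties.)
                (.put "bootstrap.servers" "kafka-0:9092"))
        admin (AdminClient/create props)
        ;; the tx-log topic needs exactly one partition so that
        ;; transactions keep a single total order
        tx-topic  (doto (NewTopic. "xtdb-tx-log" 1 (short 3))
                    (.configs {"retention.ms"    "-1"
                               "retention.bytes" "-1"}))
        ;; assuming doc records are published with a key (the question
        ;; above), the doc topic could also be compacted
        doc-topic (doto (NewTopic. "xtdb-docs" 12 (short 3))
                    (.configs {"retention.ms"    "-1"
                               "retention.bytes" "-1"
                               "cleanup.policy"  "compact"}))]
    (-> (.createTopics admin [tx-topic doc-topic])
        (.all)
        (.get))))
```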

nivekuil08:07:21

RocksDB is colocated in the same process as the XTDB node. You can also run the XTDB node in the same process as your apps. One challenge with load balancing is that consistency is per-node, so one client needs to hit the same node or you may have inconsistent reads. The XTDB indexes can only be fully replicated per node, so disk space could be a problem. There's no sharding, only full replicas. RocksDB isn't a document store, since it's only a single-process embedded KV store.
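e.g. a minimal embedded node with a RocksDB index store looks something like this (sketch only; paths are placeholders):
```
(ns example.node
  (:require [clojure.java.io :as io]
            [xtdb.api :as xt]))

;; an embedded node: the RocksDB indexes live on this process's own disk
(def node
  (xt/start-node
   {:xtdb/index-store
    {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                :db-dir (io/file "/var/lib/xtdb/indexes")}}}))

;; with no tx-log/doc-store configured these default to in-memory, so
;; this exact node is demo-only; a real deployment points them at Kafka
;; or similar
(xt/submit-tx node [[::xt/put {:xt/id :payment-1 :amount 100}]])
(xt/q (xt/db node) '{:find [amount]
                     :where [[e :xt/id :payment-1]
                             [e :amount amount]]})
```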

Prashant08:07:10

@U797MAJ8M Thanks for the valuable info.
> You can also run the XTDB node in the same process as your apps.
I do not intend to run the service with this design.
> One challenge with load balancing is that consistency is per-node, so one client needs to hit the same node or you may have inconsistent reads.
This is exactly where Envoy and hashing come into the picture: they would ensure that requests for the same key go to the same app deployment, and thus the same underlying XTDB instance. This would be app-level sharding and wouldn't really have much to do with XTDB directly (sketch below).
> RocksDB isn't a document store, since it's only a single-process embedded KV store.
The docs (https://docs.xtdb.com/administration/configuring/#_storage) do say that RocksDB can be used as a document store, albeit for a single-instance node. I was thinking of using a remote volume backed by a fast flash disk with high IOPS as the durable volume for RocksDB, at least for the index store.
> There's no sharding, only full replicas
I guess having at least one replica per XTDB instance would ensure HA. I assume the active instance and its replica can use the same networked volume for RocksDB.
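A sketch of the idea, with purely illustrative shard URLs; in practice Envoy's ring-hash load balancing (a hash policy on e.g. an account-id header) would do this at the proxy layer rather than in app code:
```
(def shards
  ["http://app-shard-0.internal"   ; backed by XTDB node 0
   "http://app-shard-1.internal"   ; backed by XTDB node 1
   "http://app-shard-2.internal"]) ; backed by XTDB node 2

(defn shard-for
  "Maps a routing key (e.g. an account id) to a shard, so requests for
  the same key always land on the same app deployment / XTDB node."
  [routing-key]
  (nth shards (mod (hash routing-key) (count shards))))

(shard-for "account-42") ;=> same shard every time for this key

;; caveat: plain modulo hashing reshuffles most keys when the shard
;; count changes; a consistent-hash ring (Envoy's RING_HASH or MAGLEV
;; lb_policy) keeps that movement small, which matters for the
;; re-sharding question above
```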

xlfe09:07:54

Hi Prashant - happy to talk you through how we've deployed XTDB on GCP (GKE, actually). Our HA strategy primarily revolves around using GCP Datastore for the tx/doc store as a second-level cache (and using Rocks/LMDB for the local indexes)

Prashant09:07:40

@U01ENMKTW0J Hi Felix, that would be super helpful, since our current infra is also on GCP. There's already a Kafka cluster deployed, and the intention is to use it for document and tx-log storage; even so, I am pretty sure there are a lot of concepts I can reuse from your design.

nivekuil12:07:59

I don't think you can share RocksDB over the network like that, since it gets locked by a node, though I'm not sure if that applies to the doc store. I would think Kafka would perform and scale better in any case, since it caches documents locally

refset10:07:51

Hey @UKVBQ7SQG apologies for the delayed response. Firstly, if you would like to talk any of this through on a call sometime soon, I'd be very happy to assist (just send a calendar invitation to <mailto:[email protected]|[email protected]> and we can take it from there!).
> a flash-based networked storage volume with high IOPS would then be the other option for the index storage volume
That is probably the right choice. A physically local disk may give marginally higher performance, but common networks can be pretty fast compared to flash IO these days. Your app-level sharding description sounds reasonable enough to me, although I've not personally attempted to create something that sophisticated.
> To make these individual XTDB systems highly available, what would the infrastructure layout be?
At a minimum, HA would require two XT nodes in isolated zones. The tx-log and doc-store would also need to be similarly HA from the perspective of those two zones.
> I assume having a dedicated RocksDB volume per XTDB instance would be the best practice?
Yep, a fully embedded KV store per node is the only supported way to run XTDB today. Whilst a remote KV store is technically possible (e.g. Redis), the code doesn't really account for that possibility and we've not attempted to build such an integration (mainly because we know the query performance would be far from ideal).
> How involved would the process of re-sharding the data be (in case the data needs to be re-sharded later on)?
It sounds like you would want to cut over to fresh tx-log and doc-store instances in this case. XT doesn't offer any particular facilities to help with this, but it shouldn't get in the way either.
> I am curious which would be better suited for high-write-throughput document storage: Kafka or RocksDB?
As outlined already on this thread, you can't really use RocksDB from multiple nodes, even if some of those nodes are being treated as replicas. However, the Kafka document store is actually designed around a RocksDB "local-document-store" anyway, since we need ad-hoc KV access to the contents of the log. In this sense you can think of the local-document-store as just an extension of the index-store (i.e. one full replica per node, no sharing). Note that by default the local-document-store will be in-memory, but you almost always want to configure it to be RocksDB (see the config sketch below). Again, happy to help however I can, it sounds like an interesting project 🙂
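For reference, a config sketch along those lines, per the xtdb.kafka docs (broker addresses, topic names and paths here are illustrative):
```
(ns example.kafka-node
  (:require [clojure.java.io :as io]
            [xtdb.api :as xt]))

(defn rocks-kv [dir]
  {:xtdb/module 'xtdb.rocksdb/->kv-store
   :db-dir (io/file dir)})

(def node
  (xt/start-node
   {:xtdb/tx-log
    {:xtdb/module 'xtdb.kafka/->tx-log
     :kafka-config {:bootstrap-servers "kafka-0:9092,kafka-1:9092"}
     :tx-topic-opts {:topic-name "xtdb-tx-log"}}

    :xtdb/document-store
    {:xtdb/module 'xtdb.kafka/->document-store
     :kafka-config {:bootstrap-servers "kafka-0:9092,kafka-1:9092"}
     :doc-topic-opts {:topic-name "xtdb-docs"}
     ;; in-memory by default - make it RocksDB so a restarted node
     ;; doesn't have to replay the whole doc topic into RAM
     :local-document-store {:kv-store (rocks-kv "/var/lib/xtdb/docs")}}

    :xtdb/index-store
    {:kv-store (rocks-kv "/var/lib/xtdb/indexes")}}))
```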

refset10:07:42

> you can't really use RocksDB from multiple nodes
Technically there is some scope for this with https://github.com/facebook/rocksdb/wiki/Read-only-and-Secondary-instances but again, it's not something we can support currently - I don't mind chatting about it though if it looks appealing(!)
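For the curious, the raw RocksJava API for a secondary instance looks roughly like this (again, not something XTDB wires up for you today, and the paths are made up):
```
(ns example.rocks-secondary
  (:import (org.rocksdb Options RocksDB)))

(RocksDB/loadLibrary)

(with-open [opts (Options.)
            ;; a secondary instance tails the primary's WAL/MANIFEST
            ;; from a shared directory, keeping its own scratch dir
            ;; for secondary state
            db (RocksDB/openAsSecondary
                opts
                "/mnt/shared/xtdb-indexes"   ; primary's db dir
                "/var/lib/rocks-secondary")] ; local scratch dir
  ;; pull in whatever the primary has written since we last looked
  (.tryCatchUpWithPrimary db)
  (some-> (.get db (.getBytes "some-key")) (String.)))
```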

Prashant13:07:35

Thanks a ton Jeremy, I really appreciate you taking the time to answer these questions 🙏 I am down with the flu right now, but expect an email early next week. Hopefully I will have recovered by then.

👍 1
refset14:07:17

Awh shucks, sorry to hear that...I hope you feel better soon! ☺️

🙏 1