xtdb 2020-11-13 | Slack Archive

markaddleman17:11:55

I'm curious about a crux architecture that would support multitenancy. Fundamentally, we could choose to have a single graph with all tenants or separate graphs. At peaks, we'll ingest between 10 and 100 docs per second for each tenant, so I imagine that we'll need multiple graphs for good query performance - unless multi-attribute or compound indices are in the plan. For the multiple graph approach, looks like Apache Pulsar might be a good solution for streaming ingestion. I imagine the doc store would be pretty straightforward. It's the index store that seems mildly complex. I imagine we'd set up an index node for each tenant - each with its own rocksdb instance. Is that the basic idea? Are there more pieces to consider?

refset18:11:03

> At peaks, we'll ingest between 10 and 100 docs per second for each tenant

Yes, it sounds like you'll want a separate graph per tenant, else you might risk exceeding the write throughput of a single tx-log/instance. Pulsar would be a nice fit. We've not kicked the tyres on it yet but I'd be happy to assist!

> I imagine we'd set up an index node for each tenant - each with its own rocksdb instance. Is that the basic idea?

That's pretty much the way to do it. The only caveat is that we wouldn't be able to support querying across multiple tenant instances in a single Datalog query (we're considering it for the roadmap though), so you would need to handle that in the application layer - hopefully that's not too big an issue. I do know of a team using Crux in a heavily multi-tenanted setup (~300 nodes per JVM) and might be able to broker a conversation if you want to swap thoughts/tips. They may even respond here if they see this 🙂
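[Editor's sketch] The node-per-tenant layout and the "handle it in the application layer" caveat could look roughly like the following. This is a minimal, illustrative model only: `TenantNode` and `TenantRegistry` are hypothetical names, the node is an in-memory stand-in rather than a real Crux node, and the predicate-based `query` is an assumption made to keep the example self-contained (it is not Datalog).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class TenantNode:
    """Hypothetical stand-in for one tenant's node (its own tx-log, doc store and index store)."""
    tenant_id: str
    docs: List[dict] = field(default_factory=list)

    def submit(self, doc: dict) -> None:
        # In a real deployment this would go through the tenant's own tx-log.
        self.docs.append(doc)

    def query(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # Placeholder for a per-tenant query; only sees this tenant's documents.
        return [d for d in self.docs if predicate(d)]


class TenantRegistry:
    """Maps tenant ids to nodes, so many nodes can live in one process."""

    def __init__(self) -> None:
        self._nodes: Dict[str, TenantNode] = {}

    def node_for(self, tenant_id: str) -> TenantNode:
        # Lazily create the node the first time a tenant is seen.
        if tenant_id not in self._nodes:
            self._nodes[tenant_id] = TenantNode(tenant_id)
        return self._nodes[tenant_id]

    def query_across(
        self, tenant_ids: Iterable[str], predicate: Callable[[dict], bool]
    ) -> List[dict]:
        # Cross-tenant "join" done in the application: query each node
        # separately, then merge and tag the results with their tenant.
        results: List[dict] = []
        for tid in tenant_ids:
            for doc in self.node_for(tid).query(predicate):
                results.append({"tenant": tid, **doc})
        return results


registry = TenantRegistry()
registry.node_for("acme").submit({"id": 1, "status": "open"})
registry.node_for("acme").submit({"id": 2, "status": "closed"})
registry.node_for("globex").submit({"id": 1, "status": "open"})

open_docs = registry.query_across(
    ["acme", "globex"], lambda d: d["status"] == "open"
)
```

Each tenant's data stays isolated in its own node, and anything spanning tenants is an explicit fan-out and merge in application code.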

markaddleman18:11:47

This is good news. Thanks!

markaddleman18:11:09

In our situation, every tenant is a different customer, so joining across graphs is not likely to be a requirement.

👍 3

markaddleman18:11:04

We're exploring options for a data store at this point. Perhaps not surprisingly, it has come down to Crux and Datomic - each has different pros and cons.

markaddleman18:11:11

One more thing: crux-sql will be an important piece of our solution. What are the multitenancy considerations for crux-sql?

refset19:11:25

Cool. Crux-sql should be fine for multi-tenancy, as far as I'm aware. It was a little tricky but we did figure out a way to support multiple instances (via this global registry concept: https://github.com/juxt/crux/blob/master/crux-sql/src/crux/calcite.clj#L40). In theory Calcite - which the module uses under the hood - could support multiple Crux databases with a single SQL query but we've not built it for that use-case. Let me know if you'd like to chat about it sometime next week, I'd be keen to hear more 🙂

👍 3

adamfeldman15:11:27

I’ve been idly thinking about building B2B multi-tenant apps with Crux – I’d never considered that a single JVM could run multiple Crux nodes! I’m so curious to hear more details of that setup. Do you know if each tenant has their own tx-log topic? What sort of storage is used for the index-stores – a single EBS- or PD-like disk, local NVMe, etc?

adamfeldman15:11:47

On Pulsar: what’s been interesting to me is it was always designed for infinite retention, with built-in capability to offload cold data to S3/GCS-like object stores. Confluent Platform (non-free IIUC) has since added similar capabilities. I haven’t tried them yet, but there’s at least one company now offering Pulsar as a managed/supported service: https://kesque.com/

refset14:11:22

> I’d never considered that a single JVM could run multiple Crux nodes!

Yeah it's quite a neat trick. Crux isn't too memory hungry so it works okay, although there's a lot we could do to improve such a setup, e.g. shared caches / memory pools.

The thing that got me excited about Pulsar initially (before I got involved with Crux!) was the native support for large numbers of topics - like a topic-per-user/object/actor 🙂 though I believe Kafka now handles that pretty well too. I can't imagine it will be long before AWS/GCP/Azure offer fully managed Kafka & Pulsar API-compatible services.

mitchelkuijpers10:11:52

@U2845S9KL we are also starting to run Crux with a node per tenant, there is some interesting tuning you can do with RocksDB to share some caches

markaddleman15:11:41

Thanks! Do you have a pointer to the relevant RocksDocs?
