#xtdb
2019-08-30
mpenet 07:08:11

If I recall correctly, every node has to be able to hold the entire kv-store contents; are there any plans for future sharding? (I imagine it's quite tricky given the query results' "laziness"/snapshotting needs)

jonpither 14:08:42

hi @mpenet, there are sharding discussions in the works, particularly for the underlying kv-store holding documents.

mpenet 14:08:15

oh, on GitHub or internal for now?

mpenet 14:08:16

I am starting to actually use Crux on a personal project, really having fun so far

🙂 8
refset 15:08:46

Sharding discussions have been internal for the most part because we aren't ready to commit to anything and don't want to get anyone too excited. The short answer is that we have mapped out the spectrum of what could be done to shard with the current design, but we are also looking at alternative approaches like segment-based indexes, k2-trees, and natively distributed KV stores such as https://github.com/rockset/rocksdb-cloud

purrgrammer 11:09:11

Have you considered something similar to Datomic, where the indexing happens in the Transactor and indexes are written to shared storage that can be (lazily) accessed by nodes/peers?

refset 12:09:11

Hi, yep, we have definitely pondered how to gain the benefits of the Datomic transactor/indexing approach; you can read more about the trade-offs here: https://juxt.pro/crux/docs/faq.html#_comparisons -- the current Crux architecture, where every node validates transactions and performs indexing, was the easiest thing to build initially, but we have much grander ambitions for the future

♥️ 4
refset 15:08:45

It would be great to talk more about your use-cases and what kinds of sharding constraints you would be able to tolerate. Certainly if you have a project in mind we are very open to collaboration and joined-up planning!

mpenet 16:08:33

I am just toying with that project at the moment. The sharding issue is just something I was wondering about; so far it seems like a weakness Crux has for scaling horizontally with very large datasets, but I guess it's not a big concern for now (alpha stage)

refset 16:08:05

Ah okay, good to know. We haven't tested at vast scale yet, but Rocks is definitely good for a few TBs: https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks