#xtdb
2023-11-01
John18:11:27

Hey. I'm not grasping what exactly ends up in the index on the 'peer' nodes and how queries would work efficiently. Let's say I have 100GB of bank statements among a mere 100 customers, with an even distribution of tx amount among them. That means a lot of index data, right? I'm guessing the answer is yes, the response is 'keep the cache somewhere shared, like memcached', and the conclusion is that the total is just like a 'traditional' DB, only split up, with each part on a separate box and the indexes spread across N boxes.

refset19:11:32

Hey @John, so the short answer is that each node contains a complete replica of the index. XTDB doesn't currently support a shared storage/index approach for serving queries (...beyond ad-hoc lookups against the document store, which is always shared). To add more colour to your use case, would I be right in assuming that most of that data is okay to be 'cold' most of the time? And are you suggesting a preference for low-cost (but still scalable, ultimately) infra over low-latency queries?

John21:11:03

yes the working set would be relatively small - most data would just be history. that's part of what I don't grasp yet, how that would impact how much data is 'fetched' to be indexed. and I see, I didn't realize the index is not shared but it's something I wondered, how it was (if it were) coordinated 🙏

John21:11:05

in other words I'm wondering what is the impact for each client in a database that size, like, what would be in memory (if the index were kept in local mem)

John21:11:48

I'm not stating more constraints 'cause it's a hypothetical scenario. I just watched Rich's DB deconstructed talk today and played with xtdb, but this part is still a mystery

👍 1
refset10:11:39

> how much data is 'fetched' to be indexed

To confirm: ~all of it, once per node.

> what would be in memory (if the index were kept in local mem)

The index really can't be purely in-memory for anything beyond testing. But if you use RocksDB then you can often avoid requiring a lot of memory for each node, as XT's current join algorithm is incremental (i.e. not needing to do large hash joins in-memory) and it can spill to disk (though it is still possible to OOM when querying something sufficiently complex)

🙇 1
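
A rough way to picture the "complete replica per node" answer above is as back-of-envelope arithmetic. The sketch below is only illustrative: the function name `index_footprint` is hypothetical, and the 100 GB index size is an assumption borrowed from the raw data size in the question (the real on-disk index size will differ from the raw document size).

```python
def index_footprint(index_gb_per_node: float, nodes: int) -> dict:
    """Estimate index storage when every node holds a complete index replica.

    Assumption: the index is NOT sharded or shared, so each node stores
    (and has to build) the whole thing independently.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    return {
        # Each node indexes ~all of the data, once per node.
        "per_node_gb": index_gb_per_node,
        # Total disk across the cluster grows linearly with node count.
        "cluster_gb": index_gb_per_node * nodes,
    }


# Hypothetical figures: a 100 GB index replicated on 3 query nodes.
footprint = index_footprint(100.0, 3)
print(footprint)
```

Adding nodes under this model increases query capacity, not the amount of data that can be stored; the per-node figure stays the same however small the working set is.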