2023-11-01
Channels
- # announcements (2)
- # babashka (93)
- # beginners (57)
- # biff (3)
- # cider (7)
- # clerk (5)
- # clj-kondo (9)
- # clojure (26)
- # clojure-austin (1)
- # clojure-bay-area (5)
- # clojure-europe (13)
- # clojure-norway (88)
- # clojure-uk (7)
- # clojurescript (3)
- # cursive (4)
- # datahike (2)
- # datalevin (10)
- # datomic (1)
- # events (4)
- # hyperfiddle (5)
- # jobs (3)
- # lsp (1)
- # malli (4)
- # missionary (3)
- # nrepl (1)
- # off-topic (45)
- # overtone (4)
- # pedestal (4)
- # polylith (13)
- # reitit (15)
- # releases (2)
- # shadow-cljs (30)
- # squint (1)
- # vim (1)
- # xtdb (6)
hey. I'm not grasping what exactly ends up in the index on the 'peer' nodes, or how queries would work efficiently. Let's say I have 100GB of bank statements spread across just 100 customers, with transactions evenly distributed among them. That means a lot of index data, right? I'm guessing the answer is yes, that the response is 'keep the cache somewhere shared, like memcached', and that the conclusion is the total is just like a 'traditional' db, only split so that each part lives in a separate box and the indexes sit across N boxes
Hey @U04PF6VSNG0, so the short answer is that each node contains a complete replica of the index. XTDB doesn't currently support a shared storage/index approach for serving queries (...beyond ad-hoc lookups against the document store, which is always shared). To add more colour to your use case: would I be right in assuming that most of that data is okay to be 'cold' most of the time? And are you suggesting a preference for low-cost (but still scalable, ultimately) infra over low-latency queries?
yes, the working set would be relatively small; most of the data would just be history. That's part of what I don't grasp yet: how that would affect how much data is 'fetched' to be indexed. And I see, I didn't realize the index isn't shared, but that's something I wondered about, i.e. how it was coordinated (if it were shared) 🙏
in other words, I'm wondering what the impact is for each client in a database that size, like, what would be in memory (if the index were kept in local memory)
I'm not stating more constraints because it's a hypothetical scenario. I just watched Rich's 'Deconstructing the Database' talk today and played with XTDB, but this part is still a mystery
> how much data is 'fetched' to be indexed

to confirm: ~all of it, once per node

> what would be in memory (if the index were kept in local mem)

the index really can't be purely in-memory for anything beyond testing. But if you use RocksDB then you can often avoid requiring a lot of memory for each node, as XT's current join algorithm is incremental (i.e. not needing to do large hash joins in-memory) and it can spill to disk (though it is still possible to OOM when querying something sufficiently complex)
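For anyone reading along, here's a minimal sketch of what that looks like with XTDB 1.x node config: the index store points at a node-local RocksDB directory instead of the default in-memory KV store. The path is hypothetical, you'd need the xtdb-rocksdb module on the classpath, and a real deployment would also configure a shared tx-log/document-store alongside it.

```clojure
(require '[clojure.java.io :as io]
         '[xtdb.api :as xt])

;; Node-local index backed by RocksDB (hypothetical path). The tx-log and
;; document-store would point at whatever shared backend the deployment uses;
;; the index store itself stays local to each node.
(def node
  (xt/start-node
   {:xtdb/index-store {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                  :db-dir (io/file "/var/lib/xtdb/rocksdb-index")}}}))

;; Queries then run against this node's own replica of the index:
(xt/q (xt/db node)
      '{:find [?e]
        :where [[?e :xt/id]]})
```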