xtdb

Jeremy 2024-11-21T12:45:07.720399Z

Hi everyone, newbie here, so pardon me if I have some misconceptions. I've been holding off on trying xtdb for a project of mine because I can't quite get a handle on whether it's suitable. Among other things, I'd like to use xtdb for "market recording". Each recording consists of 10-40 entities, and each entity is a sorted map that changes over time (the market also has its own changing info, but this can be treated as another entity). Each market recording ends up with about 10-20k samples per entity. Only about 1k recordings are retained at any point in time (to save disk space).

My current approach uses a map for the market and vectors of maps as the entity timelines. I persist each recording with nippy, using the market id as the file name. Each file ends up being 2-13MB. However, I can't quickly sample data at random from different markets, because each file takes a few seconds to load, and all of its data is loaded at once even if I just want one snapshot. This has restricted my workflow to handling a few markets at a time. Storing these in xtdb would make things a lot easier, but since there's no structural sharing, I'm unsure if it's suitable. Is there a guideline for a similar problem? I'm currently thinking of having a separate document for each entity to reduce the copying done, but I'm still unsure if this is the right approach.
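To make the "separate document per entity" idea concrete, here is a minimal Python sketch (Python stands in for Clojure purely for illustration). It assumes each entity timeline is a list of samples, each carrying a `t` timestamp, and turns one recording into versioned per-entity documents, emitting a new version only when the entity's fields actually change. The helper `recording_to_docs` and the `id`/`valid_from` field names are hypothetical; in XTDB the timestamp would presumably become the document's valid time rather than an ordinary field.

```python
from typing import Any


def recording_to_docs(
    market_id: str, timelines: dict[str, list[dict[str, Any]]]
) -> list[dict[str, Any]]:
    """Hypothetical helper: one versioned document per entity per market.

    Each sample is assumed to be a dict with a "t" timestamp plus the entity's
    fields. A new version is emitted only when those fields differ from the
    previous sample, so unchanged samples add nothing.
    """
    docs = []
    for entity_id, samples in timelines.items():
        previous = None
        for sample in samples:
            fields = {k: v for k, v in sample.items() if k != "t"}
            if fields == previous:
                continue  # nothing changed since the last version
            docs.append({"id": f"{market_id}/{entity_id}", "valid_from": sample["t"], **fields})
            previous = fields
    return docs
```

Fetching a single snapshot then becomes, in principle, a point-in-time lookup on one entity id rather than deserialising a whole 2-13MB file.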

refset 2024-11-21T13:15:36.547019Z

Hey (fellow) @olajeremy123! It sounds like a good match for XTDB in theory, regardless of concerns about structural sharing - you may be interested in my answer https://discuss.xtdb.com/t/v2-best-way-to-handle-frequent-updates-that-might-not-contain-any-changes/404/13?u=refset. How much data in total is being handled in the system assuming you didn't drop >1k samples, e.g. over a year?

refset 2024-11-21T13:16:37.966859Z

how large is each entity? is it well-modelled? or do you have to capture lots of nested details that change often?

Jeremy 2024-11-21T14:02:15.481019Z

In a year, there'll be about 6000 market recordings. However, I'm working with limited disk space, so I'd prefer to prune history at some point. I'm looking to remodel my entities as I make the switch to xtdb (it's my first major clj project, so lots of bad decisions). Currently, an entity snapshot is a map of sorted maps, i.e. a map of 3-5 keys, where each key maps to a sorted <double, double> map of at most 350 entries (in practice, only about 10-20 entries are used at any given moment). The values in these sorted maps are what change.
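A rough back-of-envelope estimate, sketched in Python, using only the numbers given in this thread: 2-13MB nippy files, about 1k retained recordings, about 6000 recordings per year, and 3-5 keys with 10-20 live entries per snapshot. These are raw figures for the existing files and for an uncompressed (double, double) payload; they say nothing definitive about how much space XTDB itself would use.

```python
DOUBLE_BYTES = 8  # one IEEE 754 double


def snapshot_payload_bytes(keys: int, entries_per_key: int) -> int:
    """Raw payload of one snapshot: keys x entries x one (double, double) pair.

    Ignores map/key overhead and any compression.
    """
    return keys * entries_per_key * 2 * DOUBLE_BYTES


def archive_gb(recordings: int, mb_per_recording: float) -> float:
    """Total size of the nippy files in decimal GB (1 GB = 1000 MB)."""
    return recordings * mb_per_recording / 1000


print(snapshot_payload_bytes(5, 20))   # upper end of typical use: 5 keys, 20 live entries
print(snapshot_payload_bytes(5, 350))  # worst case: every entry populated
for n in (1_000, 6_000):
    print(f"{n} recordings: {archive_gb(n, 2):.0f}-{archive_gb(n, 13):.0f} GB")
```

With these inputs it prints 1600 and 28000 bytes per snapshot, then 2-13 GB for 1k recordings and 12-78 GB for a year's 6000, which shows why storing only the populated entries, rather than all 350 slots, matters.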

Jeremy 2024-11-21T14:03:23.259339Z

Reading the linked thread, it looks like the consensus is to just do it: split docs into fast and slow parts, and hope xtdb compression does the rest.
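The "split docs into fast and slow parts" idea can be sketched in Python as follows, assuming the caller already knows which keys change on (almost) every sample. The function name, the `/slow` and `/fast` id suffixes, and the example keys are all hypothetical, chosen only for illustration.

```python
from typing import Any


def split_fast_slow(
    entity_id: str, snapshot: dict[str, Any], fast_keys: set[str]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split one entity snapshot into a slow-changing and a fast-changing document.

    Both documents get ids derived from `entity_id` so they can be joined back up.
    Keys listed in `fast_keys` go to the fast document; everything else is slow.
    """
    slow: dict[str, Any] = {"id": f"{entity_id}/slow"}
    fast: dict[str, Any] = {"id": f"{entity_id}/fast"}
    for key, value in snapshot.items():
        (fast if key in fast_keys else slow)[key] = value
    return slow, fast


snapshot = {"name": "entity-7", "status": "active", "levels_a": {1.5: 10.0}, "levels_b": {1.6: 5.0}}
slow, fast = split_fast_slow("entity-7", snapshot, {"levels_a", "levels_b"})
```

Combined with writing only when something changed, an update to the fast-moving sorted maps then leaves the slow document's history untouched.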

refset 2024-11-21T14:08:17.695859Z

Thanks for the added context. How come the disk space is limited? Is this running on some fixed compute infra, or on end-user machines?

Jeremy 2024-11-21T14:17:37.956319Z

Yes, it's a hobby project, so I'm running everything on my local machine with a 512GB internal SSD. I basically index market data, do some analysis, feature engineering, etc.

refset 2024-11-21T15:01:21.241199Z

Ah okay, I guess that changes the equation a bit - it may be cheaper (in time/effort) to invest in an external ssd 🙂

refset 2024-11-21T15:05:50.323119Z

Be aware that, so far, advanced retention/downsampling ideas haven't been prioritised for v2, and ERASE is quite a blunt instrument. Therefore, you can't currently use v2 and expect to reclaim space or keep storage bounded. You would need to lean on the idea of 'time-sharding', i.e. decanting into fresh database instances regularly (e.g. once per month) and copying over only the data you want to retain.
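A minimal Python sketch of planning one monthly decant, assuming one database instance per calendar month and that the start date of each recording is tracked outside the database. `shard_name` and `plan_decant` are hypothetical helpers that only decide what to copy and which old instances could afterwards be deleted; the copying itself would go through whatever XTDB client you use, and nothing here is an XTDB API.

```python
from datetime import date
from typing import Any


def shard_name(d: date) -> str:
    """Hypothetical naming scheme: one database instance per calendar month."""
    return f"markets-{d.year:04d}-{d.month:02d}"


def plan_decant(recordings: dict[str, date], keep: set[str], today: date) -> dict[str, Any]:
    """Plan a monthly decant into this month's fresh instance.

    `recordings` maps recording id -> start date. Recordings from earlier months
    that appear in `keep` are copied forward; once that copy is done, every
    earlier shard can be deleted wholesale, which is how space is reclaimed.
    Recordings that started this month already live in the target shard.
    """
    target = shard_name(today)
    to_copy = sorted(
        rid for rid, started in recordings.items()
        if rid in keep and shard_name(started) != target
    )
    old_shards = sorted({shard_name(s) for s in recordings.values()} - {target})
    return {"target": target, "copy": to_copy, "drop_shards": old_shards}


plan = plan_decant(
    {"m1": date(2024, 10, 3), "m2": date(2024, 10, 20), "m3": date(2024, 11, 2)},
    keep={"m2"},
    today=date(2024, 11, 21),
)
```

The point of the design is that space is reclaimed by deleting whole instances rather than by erasing individual documents, which sidesteps the bluntness of ERASE.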

Jeremy 2024-11-21T15:42:37.758319Z

Yeah, I don't mind doing that once or twice, and I'd likely move to the cloud in 2-3 months. Thanks @taylor.jeremydavid

Panel 2024-11-21T13:36:17.613639Z

Do columns get cleaned up? If I insert a bunch of records that later get erased, the columns still exist and return nil on SELECT *. I'm just asking out of curiosity; this mostly shows up at the REPL when trying things out, so it's unlikely to be an issue.

jarohen 2024-11-21T13:41:25.775779Z

no, unfortunately we can't feasibly clear columns up without implementing some form of drop-column DDL; without this I think we'd need to check whether to remove a column by checking every other doc every time a document gets erased

jarohen 2024-11-21T13:42:05.470269Z

the drop-column DDL is also problematic for a database that prides itself on not being able to destructively edit the past 🙂 maybe we'd need to implement that with erase semantics too
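A toy Python model (not XTDB's actual storage) of the behaviour Panel describes and of the cost jarohen points to: the column set only grows on insert, erase removes the row but leaves the columns, `SELECT *` pads missing columns with nil (`None` here), and finding columns that could be dropped means scanning every remaining row. The class and method names are made up for illustration.

```python
from typing import Any


class ToyTable:
    """Toy model: the set of known columns only ever grows."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, Any]] = {}
        self.columns: set[str] = set()

    def insert(self, row_id: str, doc: dict[str, Any]) -> None:
        self.rows[row_id] = doc
        self.columns.update(doc.keys())

    def erase(self, row_id: str) -> None:
        self.rows.pop(row_id, None)  # the row goes; its columns stay known

    def select_star(self) -> list[dict[str, Any]]:
        cols = sorted(self.columns)
        return [{c: row.get(c) for c in cols} for row in self.rows.values()]

    def unused_columns(self) -> set[str]:
        """What cleanup would need: a scan over every remaining row."""
        used: set[str] = set()
        for row in self.rows.values():
            used.update(row.keys())
        return self.columns - used


t = ToyTable()
t.insert("a", {"id": 1, "x": 1})
t.insert("b", {"id": 2, "y": 2})
t.erase("b")
```

After the erase, `select_star()` still reports a `y` column filled with `None`, and only the full scan in `unused_columns` reveals that `y` is no longer referenced. Doing that scan on every erase is the cost jarohen describes.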

Panel 2024-11-21T13:44:28.039439Z

Makes sense, thanks.