Fork me on GitHub
#xtdb
<
2023-08-30
>
ianjones21:08:13

are there any good posts on schema design for xtdb?

👍 6
2
ianjones21:08:09

im thinking about adding organizations, organization-users, accounts, user-accounts etc and am wondering what the pros/cons are of different approaches

ianjones21:08:01

i.e, would it be better to add a set on the organization thats a list of all the users apart of the org, or do a more “classic” relation of index records that record the org-id and user-id

refset09:08:48

Hey @U0CKDV6DB there's definitely been some good discussions on this channel before, but I've not yet seen anything written up a post anywhere with an opinionated set of advice backed up by real-world experiences (but that would be pretty great to see!) My own observations suggest that normalised ER modelling is still fundamentally a good idea in XT, and consequently that modelling relationships as their own entities is often appropriate despite the cost of some performance (extra joins can mean a lot more IO). There are no firm rules though. Are you already modelling the schema using Malli?

ianjones13:08:08

I’d love to see it as well! Yeah i was leaning towards normalized modeling. yeah i am modeling my schema in Malli

ianjones13:08:40

I’m using biff which has malli on by default. Kinda nice to know the data I invest matches my schemas

👍 2
Alexander Carls20:08:02

I just joined the channel and wanted to ask the same question 🙂! I toyed around with XTDB this evening, and one specific question also comes to mind (@U899JBRPF): A relation between entities, where the relation can have a start date in the future and an optional end. My gut feeling is, that this is a perfect match for valid-time . Because entities outside of the time range, are not used -- except when displaying it in the UI. The user must know what is up and coming. Looking at xt/entity-history I can show past records. For future ones, I need to reconstruct a new xt/db with a random valid-time far into the future, because xt/entity-history does not return valid-documents after the reference timestamp in xt/db as specified in the docs. Choosing a random date in the future for this use case feels a bit strange to me, I am wondering if you would model this differently or is this "common" to change xt/db per use case? Hopefully this fits this thread, otherwise I will move it. Thank you!

ianjones20:08:55

i’ve been wondering about that as well… kind of for a different situation. I would usually use a created-at field to order records but would it be recommended to use valid-time for this instead?

Alexander Carls21:08:07

Not sure if I understand the section correctly, I think this is nuanced and - partially - no longer true because you can include this "domain/valid-time" information via the history predicate: https://docs.xtdb.com/language-reference/datalog-queries/#_history_predicates probably your use case works with this? Mine doesn't; I have to think about this. :)

refset14:09:49

> Choosing a random date in the future for this use case feels a bit strange to me hey @U03CL5RA941 - welcome! with hindsight we probably wouldn't repeat the decision to couple the selected valid-time with the db API if we designed the 1.x API again, and not just for this reason (e.g. caching is tied up with it too) - but for now you should at least be able to reliably use Integer/MAX_VALUE (or (Date. Integer/MAX_VALUE)) instead of some other ~random value

refset14:09:06

> I think this is nuanced and - partially - no longer true because you can include this "domain/valid-time" information via the history predicate: https://docs.xtdb.com/language-reference/datalog-queries/#_history_predicates yep those predicates certainly do enable some new possibilities (and that other concept page on Bitemporality hasn't since been re-written with those in mind :thinking_face: 📝)

Samuel Ludwig20:09:23

Would you say that XTDBv2 would benefit from different schema-designs versus v1, or would a 'best-practice' schema in v1 also be a best-practice schema in v2?

seancorfield20:09:32

I've been wondering about modeling product subscriptions, where you buy something for, say, 30 days and you may renew or you may let it lapse. Would it make sense to add the subscription starting now (as valid-time) and then delete it as of now + 30 days, and each time it is renewed (manually), do a similar add and delete? That way, you could query the db "now" to see current revenues and also at potential future points to see how revenue might tail off if people don't renew subscriptions? (for a subscription that automatically renews, you wouldn't need the delete -- until the automatic renewal fails for the payment method)

jarohen21:09:26

@U0482NW9KL1 mostly, yep - but one big difference is the introduction of tables. particularly, this means that documents in different tables can have the same entity id, which allows you to split documents by domain but still join them back up by id when required. e.g. you might split up a document based on sensitivity - it may be easier to reason about access control data when more sensitive data about the same entity is in a different table. Documents in different tables also have different timelines - so if certain attributes about an entity change significantly faster than others, having them in different tables may both reduce unnecessary churn and provide more transparency about exactly what changed at each point

👀 1
gratitude 1
jarohen21:09:06

@U04V70XH6 yep - depending on how your specific subscriptions work, I can see the benefits of modelling them this way - you naturally get the subscription becoming invalid at its end date. a tradeoff here, though, is that the subscription essentially becomes invisible at this point, so if you do want to show the user a message like "your subscription expired last week", you can't use the simpler as-of-now queries - you have to query across valid-time. so maybe it might be easier in some cases to keep a separate, explicit "expires-at" column in the table, and then use the validity dimension to encode changes to the subscription (e.g. "actually, last Tuesday, the user told us they wanted to have the subscription expire at the end of the month"). tradeoff of this approach, of course, is that you're back to explicitly checking whether the subscription has now expired. tldr I don't know for sure, indeed there may not be a single Right Answer - time is fun 🙂

seancorfield21:09:17

Yeah, the two areas in our systems that I think would benefit from bitemp would be member profile data and interactions, and the financial stuff.

seancorfield21:09:44

I think you're right that we would still need an explicit expires-at -- we like to warn members ahead of time to turn on auto-renew or check their payment methods -- but we have two levels of billing: a billing agreement, which continues to exist all the time, once a member has made any purchase, and then individual subscriptions (tied to transactions -- a given subscription can have multiple transactions associated, and a billing agreement can have multiple subscriptions... hmm, I guess that's three levels!).

seancorfield21:09:12

Right now, it's super hard to answer certain questions around time-as-of (past history or future prediction) or "period of time" stuff. So we have to have several additional tables tracking a lot of stuff that is almost duplicated data 😞

seancorfield21:09:17

Hmm, that brings another Q to mind but I'll ask that in the main channel since it's not related to this directly.

simple_smile 2
👍 2
Samuel Ludwig20:09:23

Would you say that XTDBv2 would benefit from different schema-designs versus v1, or would a 'best-practice' schema in v1 also be a best-practice schema in v2?