xtdb 2020-09-12 | Slack Archive

udit13:09:54

Hello good folks. I was going through the SpaceTime adventure to get started with Crux. The tutorial starts off with an ephemeral standalone node with an in-memory KV store. I wanted to experiment with other configurations. Is there an exhaustive list of the configuration parameters that I can look up?

refset13:09:24

Hey! It's a slightly awkward moment to ask, as we are releasing a whole new module system this coming week 🙂 In the new world, this list of configs in crux-bench is fairly comprehensive (though still not exhaustive): https://github.com/juxt/crux/blob/master/crux-bench/src/crux/bench.clj#L211 In the current world, the equivalent is on the left-hand side of this diff: https://github.com/juxt/crux/commit/9002699e531215e118d72221ad0755e9585107fa#diff-b19d1d23008e62840257de9fee6b671cL206

refset13:09:52

Naturally there are lots of trade-offs to explore across the possible combinations, so feel free to ask about whatever it is you're most interested in (performance, cost, scaling etc.)

udit13:09:15

Thanks @refset, this helps greatly. I had found this: https://github.com/juxt/crux/blob/d713fd5af627dd5eb23278456849e22005b9b444/docs/reference/modules/ROOT/examples/src/docs/examples.clj but I was unsure whether using a vector instead of a map for the topology is a valid schema. I will disregard it.

🙏 3

nivekuil14:09:53

is the entity-history api suitable for tracking events, e.g. link clicks? say with each event being its own document, with an id of {:user/id :event/type}

nivekuil14:09:21

that is, is it suitable for use in application queries, or is it more of an auditing tool? a better example might be like modeling a chat log with the history of a single document

nivekuil14:09:56

in fact a group chat might even just be modeled as {:crux.db/id :channel-foo :message "foo" :sender :foo}. is it really as simple as it sounds, or are there some caveats here?

refset14:09:50

It really depends on the complexity of the application queries and kinds of joins (if any) that you might need in future. So it's not necessarily a terrible idea if you are confident about your requirements early on, but it's not the first place I'd start when modelling a domain

nivekuil14:09:22

for example, if you decide to add reactions to messages in the future, would it be tough to represent a reference to an individual message with that model?

nivekuil14:09:03

could you just query with a valid time to get message/event level granularity?

refset14:09:09

Exactly, you can't trivially (natively) reference an old version of an entity within a query
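
A rough illustration of the referencing problem, written as plain Python data rather than Crux API calls. All ids, field names, and timestamps here are made up for the sketch. With one document per message, a reaction only needs the message id; with one channel entity whose versions are the messages, the reaction has to carry the entity id plus the valid time of the version it means.

```python
from datetime import datetime

# Model A (hypothetical data): one document per message, each with its own id.
messages = {
    "msg-1": {"channel": "channel-foo", "sender": "foo", "message": "hello"},
    "msg-2": {"channel": "channel-foo", "sender": "bar", "message": "hi"},
}
# A reaction can point at a message by id alone.
reaction_a = {"emoji": "thumbsup", "target": "msg-2"}

# Model B (hypothetical data): a single channel entity; each message is one
# version of that entity, identified only by its valid time.
channel_history = [
    (datetime(2020, 9, 12, 14, 0, 0), {"sender": "foo", "message": "hello"}),
    (datetime(2020, 9, 12, 14, 0, 5), {"sender": "bar", "message": "hi"}),
]
# A reaction must carry the entity id *and* the valid time of the version.
reaction_b = {
    "emoji": "thumbsup",
    "target": ("channel-foo", datetime(2020, 9, 12, 14, 0, 5)),
}


def resolve_a(reaction):
    """Look the message up directly by its id."""
    return messages[reaction["target"]]


def resolve_b(reaction):
    """Find the version of the channel entity with the exact valid time."""
    _entity_id, valid_time = reaction["target"]
    for version_time, doc in channel_history:
        if version_time == valid_time:
            return doc
    raise KeyError(valid_time)
```

Both lookups work, but in Model B the application has to store and pass around the valid time itself, since nothing else identifies an individual message.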

refset14:09:46

If you have a valid time on hand, and you know what you're looking for, then it's easy with Crux. Where the valid-time-as-domain-model mapping falls down is when you know what you're looking for but have no idea what valid time(s) to be looking at/between. We don't currently provide indexes to make those kinds of lookups straightforward, which is why we don't expose valid time to the Datalog queries themselves
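
The asymmetry described here can be sketched in plain Python (again a toy model, not how Crux stores or indexes anything; the data and function names are invented, and integers stand in for timestamps). Looking up the version visible at a known valid time is a binary search over sorted times, whereas asking "at which valid times was X true?" falls back to scanning every version when no index exists for it.

```python
from bisect import bisect_right

# Versions of one entity, sorted by valid time (hypothetical data).
valid_times = [100, 105, 110]
versions = [
    {"sender": "foo", "message": "hello"},
    {"sender": "bar", "message": "hi"},
    {"sender": "foo", "message": "bye"},
]


def as_of(valid_time):
    """Return the version visible at valid_time, or None if none exists yet."""
    i = bisect_right(valid_times, valid_time)
    return versions[i - 1] if i else None


def valid_times_where(predicate):
    """Without a suitable index, finding *when* something held is a full scan."""
    return [t for t, doc in zip(valid_times, versions) if predicate(doc)]
```

`as_of(107)` only needs the timestamp, while `valid_times_where(lambda d: d["sender"] == "foo")` has to visit every version to discover which timestamps are relevant.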

nivekuil14:09:21

I think for a chat room you'd want to pass the time to the client anyway, and so the message-level granularity would be pretty convenient.. but what are the operational characteristics? is it much more expensive to ask crux those kinds of questions vs representing each message as its own doc?

refset14:09:03

scanning entity-history and scanning AVE indexes should be near enough the same performance

refset14:09:57

there is a small space overhead with storing additional docs as separate entities vs storing lots of revisions, but nothing that ought to impact your decision

nivekuil14:09:17

what about something like locality? are separate docs less cache-friendly? I suppose on disk it heavily depends on the storage backend; are they represented the same in memory?

refset14:09:38

interesting question! so entity-history probably pays a bit of a tax in that it works through a document cache (in addition to a couple of KV history indexes), whereas the AVE index (+ friends) is more granular and ultimately constitutes what we consider the hot path in Crux, which is where most of the performance engineering effort goes! There might be a more concrete answer about exactly what's better/faster in terms of memory usage: I know AVE (+ friends) works with off-heap buffers for the most part, whereas entity-history will have to bring a lot of things on-heap if you're doing scanning in user space. But I'm fuzzy on the practical implications of all that. Maybe someone else on the team can provide more opinions. As ever, the best thing to do is write a test and benchmark your use-case, rather than purely take anyone's word for it, though I'm happy to try and elaborate further 🙂

nivekuil15:09:44

thanks, that's helpful :) It answers my question fully that the entity-history api isn't really intended for the hot path of application queries, so that seems to me a compelling reason to not design application logic around it

👍 3
