#xtdb
2020-09-29
nivekuil05:09:21

So in crux semantics, a document is a set of facts that are all asserted together. On account of the operational implications, I'm tempted to model the world such that each entity/document is one fact, like {:crux.db/id :fact1 :user/id :user1 :user/first-name "foo"} {:crux.db/id :fact2 :user/id :user1 :user/last-name "bar"}. This is ergonomically unfortunate, but it seems to make sense to split up a logical entity across multiple crux documents, especially if its attributes are changing at different rates, so we're not asserting the same static facts over and over in the process of changing a few attributes. Still, it seems fuzzy as to when this sort of semantic/operational tradeoff should be made. Any solid heuristics to consider here?

refset15:09:57

> "split up a logical entity across multiple crux documents especially if its attributes are changing at different rates"

I actually think this is a good summary of the main heuristic. I would generally argue that unless your data volumes are significant (and therefore you have a lot of storage & bandwidth cost to manage), you shouldn't worry too much about re-asserting the same facts over and over. Obviously there will be some inefficiency involved in duplication - with network usage and the document store in particular - but crucially the data in the RocksDB indexes is de-duplicated across versions (a very lightweight form of structural sharing). In theory a document store implementation could also implement structural sharing across versions.

nivekuil19:09:04

so here's my extrapolated conjecture: If it sometimes makes sense to split an entity across documents, then it will be helpful to have some machinery to piece that entity back together. If I have that machinery, then it is no longer painful to split entities across documents in general, so I may as well do it all the time, for every attribute.
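
A minimal sketch of the "machinery to piece that entity back together" idea, in plain Clojure over an in-memory vector of documents. It assumes the documents for an entity have already been fetched (in a real setup that would presumably be a query on the shared attribute); `assemble-entity` and `:fact3` are hypothetical names used only for illustration.

```clojure
;; Documents as in the example above, plus one extra for a second user.
(def docs
  [{:crux.db/id :fact1 :user/id :user1 :user/first-name "foo"}
   {:crux.db/id :fact2 :user/id :user1 :user/last-name "bar"}
   {:crux.db/id :fact3 :user/id :user2 :user/first-name "baz"}])

(defn assemble-entity
  "Merges every document whose `id-attr` equals `id` into one map,
  dropping the per-document :crux.db/id. Returns nil when nothing matches."
  [docs id-attr id]
  (let [matching (filter #(= id (get % id-attr)) docs)]
    (when (seq matching)
      (dissoc (apply merge matching) :crux.db/id))))

(assemble-entity docs :user/id :user1)
;; => {:user/id :user1, :user/first-name "foo", :user/last-name "bar"}
```

Note that `merge` lets later documents win if two documents assert the same attribute, so this sketch only behaves well when each attribute lives in exactly one document.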

nivekuil19:09:47

and in planning for changes in requirements, it might be prudent to have the application talk to an entity abstraction layer (that treats entity ids as canonical attributes, not :crux.db/ids) rather than crux/entity, so that if storage needs do go up in the future you only have to tweak the crux side of that abstraction layer and leave the application alone

nivekuil19:09:12

so my question really is: can I have my cake and eat it too, by building such an abstraction layer and therefore keep nice semantics without ever having to worry about their operational consequences?

refset21:09:13

It's an intriguing conjecture, and not terribly dissimilar to thoughts and discussions I've had before 🙂 The downside is the (likely) non-negligible cost for the query engine to process all the additional joins...but don't let me discourage you - more powerful semantics can often be worth the price!

nivekuil21:09:13

hm, so there's a tension here:
1. Attributes that are frequently queried together (and frequently queried in general) are best stored together
2. Attributes that are frequently changed separately (and frequently changed in general) are best stored separately
Regardless of how this tradeoff plays out in real needs, I think it's probably worth building abstractions for the application to talk to, to facilitate moving between these tradeoffs, rather than going through the crux api directly

nivekuil21:09:55

that is to say, relegate that tradeoff to being a backend concern while maintaining consistent semantics for the application
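
One way the "backend concern" could look, sketched in plain Clojure: a layout value decides which attributes are stored together, and the application only ever hands over whole entities. Everything here is an assumption for illustration, including the `layout` shape, the `entity->docs` name, the `:user/last-seen` attribute and the generated `:user1/part-0` style ids. No Crux API is called.

```clojure
(def layout
  "Hypothetical layout: each set of attributes is stored as one document."
  [#{:user/first-name :user/last-name}   ; static, usually read together
   #{:user/last-seen}])                  ; changes often, stored on its own

(defn entity->docs
  "Splits `entity` into one document per layout group. Each document carries
  the canonical id attribute plus a generated :crux.db/id. Attributes not
  mentioned in the layout are dropped in this sketch."
  [layout id-attr entity]
  (let [id (get entity id-attr)]
    (->> layout
         (map-indexed
          (fn [i attrs]
            (let [part (select-keys entity attrs)]
              ;; Skip groups for which the entity has no attributes at all.
              (when (seq part)
                (assoc part
                       :crux.db/id (keyword (name id) (str "part-" i))
                       id-attr id)))))
         (remove nil?)
         vec)))

(entity->docs layout :user/id
              {:user/id :user1
               :user/first-name "foo"
               :user/last-name "bar"
               :user/last-seen 42})
```

With the layout above this yields two documents, one holding the names and one holding `:user/last-seen`. Moving an attribute between groups changes only `layout`, which is the point of keeping the tradeoff behind the abstraction.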

refset22:09:50

That makes sense to me, yep. The tension is real. Just so I understand your position though, is it the storage cost or bandwidth/processing cost that most troubles you about the re-asserting of large documents? Or something else?

refset22:09:26

On the off-chance this is interesting to you, I did write some transaction functions that somewhat emulate the datoms model: https://gist.github.com/refset/a00be06443bc03ccc84a2874af3cdb8a (though given how transaction functions actually work this approach is definitely not a solution to worries about storage/bandwidth)

nivekuil22:09:50

storage cost, and not the $/byte, but the complexity of more moving parts being added to compensate for greedy software. That creates lots of scope for impact within a big company, and is avoided in small ones where outsourcing the ops stuff is eagerly done, but is unfortunate if you want to roll your own stuff. I understand my sorts of concerns are likely not a business priority for juxt :) But even if storage costs are not a foreseeable concern, it's still a good idea to design code such that it's resilient to unknown unknowns

refset17:09:31

Thanks for the context regarding storage - very helpful.
> unfortunately the tx-fn itself is a tx which gets stored in the tx-log
I don't think I see the problem here. Is it because you can't trust that the tx-fn is always put at the very start of the database? Or that it won't be overwritten later? In case it isn't clear, you can definitely run queries inside transaction functions, though I'm not sure what kind of uniqueness you're trying to model. I like that gist a lot 🙂

nivekuil17:09:47

it's unfortunate in the sense that I would be using up resources (storing the failed tx) to save resources :)

refset17:09:21

aha, yes of course...consistency isn't free!

jarohen14:09:52

Afternoon all - 1.12.1's out 🚀 For a list of the bugs squashed in this release, head on over to the release notes: https://github.com/juxt/crux/releases/tag/20.09-1.12.1 As always - any issues, let us know! 🙂

nivekuil22:09:47

yeah I was playing with tx-fns, but unfortunately the tx-fn itself is a tx which gets stored in the tx-log, so e.g. "put only if unique" needs to be done with a query in application logic, otherwise it'd be self-defeating
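
A rough sketch of "put only if unique" done in application logic, in plain Clojure. The `exists?` and `submit!` functions are hypothetical stand-ins backed by an atom; in a real application they would presumably wrap a query and a transaction submission. `:user/email` is an invented attribute. The check and the put are separate steps here, so two concurrent callers could still both pass the check.

```clojure
;; In-memory stand-in for the database, for illustration only.
(def store (atom []))

(defn exists?
  "True when some stored document has `value` under `attr`."
  [attr value]
  (boolean (some #(= value (get % attr)) @store)))

(defn submit!
  "Stand-in for submitting a put of `doc`."
  [doc]
  (swap! store conj doc))

(defn put-if-unique!
  "Submits `doc` only when no existing document shares its value for `attr`.
  Nothing is submitted (and so nothing is logged) when the check fails."
  [exists? submit! attr doc]
  (if (exists? attr (get doc attr))
    :skipped
    (do (submit! doc) :submitted)))

(put-if-unique! exists? submit! :user/email
                {:crux.db/id :u1 :user/email "a@example.com"})
;; => :submitted
(put-if-unique! exists? submit! :user/email
                {:crux.db/id :u2 :user/email "a@example.com"})
;; => :skipped
```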

nivekuil08:09:55

here's a ~10 minute mockup that I'll be playing around with, for your curiosity: https://gist.github.com/nivekuil/bc8d8d896f0db23a4d015946100247ca

nivekuil08:09:50

so I think that if I decided to add a :user/favorite-person attribute, that would be totally transparent to the application, even in terms of performance? Then if it turns out that :user/favorite-person has a different usage pattern than the rest of the favorites, if the application is querying for the favorites directly with a crux/q then I could split favorite-person out into its own document without changing a line of application code

nivekuil08:09:01

another interesting thing this might facilitate is incremental schematization, so I could spec out a well-known set of facts representing a subset of the entity, without constraining what else it can be
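
Incremental schematization could be sketched with clojure.spec, whose `s/keys` specs are open: they require the listed keys but do not reject extra ones. The spec name `::named-user` and the `:user/favorite-color` attribute are hypothetical, chosen only for this example.

```clojure
(require '[clojure.spec.alpha :as s])

;; Specs for the well-known attributes only.
(s/def :user/id keyword?)
(s/def :user/first-name string?)

;; A well-known subset of the entity. Other attributes are left unconstrained.
(s/def ::named-user (s/keys :req [:user/id :user/first-name]))

;; Extra attributes do not invalidate the entity.
(s/valid? ::named-user
          {:user/id :user1 :user/first-name "foo" :user/favorite-color "blue"})
;; => true

;; A missing required attribute does.
(s/valid? ::named-user
          {:user/id :user1 :user/favorite-color "blue"})
;; => false
```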

refset17:09:22

It's a very neat design!
> incremental schematization
👌👌 (for the benefit of others - the context is the gist from the previous thread https://clojurians.slack.com/archives/CG3AM2F7V/p1601356041058300)

nivekuil18:09:31

oh, I didn't realize this got branched off into its own thread.. perils of using matrix.

Steven Deobald22:10:50

@U797MAJ8M OT: You're bridging Matrix to Slack?

nivekuil22:10:16

@U01AVNG2XNF: yup, using https://github.com/Sorunome/mx-puppet-slack . beware though, since matrix/element doesn't support threading it's just a linear chain of replies and totally unusable on mobile. Still prefer it over the slack client separately

Steven Deobald22:10:56

Pretty cool! I don't use Slack on mobile anyway so I wouldn't be losing anything if I muster the energy to try this out. 🙂

nivekuil22:10:17

@U01AVNG2XNF: yeah it's nice, I have like 5 bridges running and wouldn't go back