xtdb 2021-05-19 | Slack Archive

Hukka03:05:04

Oh wow, I had already read https://opencrux.com/blog/crux-strength-of-the-record.html but back then Crux itself didn't leave any mark in my memory. I don't remember how I ended up reading it. Maybe via other Clojure-themed posts, or maybe while reading about Mongo? What a difference two months make. I didn't notice the references to Datomic on the first read at all. Still not sold on the impracticality of triples though, not having actually used them 😉 Guess I have to try them all for real, now that learning Fulcro sent me down the Datalog rabbit hole.

refset07:05:02

Aha, cool! It would be fascinating to hear your conclusions either way

Hukka08:05:12

So far it's a bit tricky to gauge Crux just by reading. The documentation is still sparse, but I guess that's why the chat channels exist. For example, I haven't really seen an answer on how Crux handles changing one field in a document with lots of fields: does it make a copy of all the others? In Datomic it's pretty clear what happens: the DB grows only by the retraction of the old fact and the addition of the new one. This would then affect whether something should be modeled as a modification of the document, or as another document that refers to the bigger, more stable data in another document.

refset08:05:42

> does it make a copy of all the others?

In the document store, yes, but in the index store, no (there's structural sharing). Is disk usage / cost your main concern with this model?

Hukka08:05:17

Not main, but I'd like to have a rough grasp on how systems that are not SQL servers would behave, even before bringing these options to the table

👍 4

Hukka08:05:23

Even if it's not a main concern, with a copy in the document store, I guess it wouldn't make sense to have quickly changing fields in documents with lots of fields, but rather to have a more granular separation of concerns. In Postgres and Datomic that wouldn't matter: the cost is the same whether it's one big entity or many small ones.

refset09:05:08

The "quickly changing" aspect is partly why Crux is document-oriented in the first place, because durable writes don't first require accessing the indexes (unlike triple stores, which need to interpret the deltas for maintaining consistency). But yes, generally splitting entities into components with different rates of change is probably a good idea regardless

Hukka09:05:29

So it's a tradeoff of better write speed for worse write amplification in storage?

refset09:05:17

Yep I think that's a valid way to think about it - Crux was initially needed for handling a large write throughput use-case we had. That said, the other dimensions motivating the document model are arguably more profound for general usage (as per the Strength of the Record post)
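To make the write-amplification tradeoff above concrete, here is a toy sketch. It is not Crux's or Datomic's actual storage format: the function names, the JSON encoding, and the byte counts are all assumptions chosen only to compare "store the whole document again" with "store a retraction plus an assertion".

```python
import json


def document_model_write(doc, field, value):
    """Toy document model: every new version stores the full document again."""
    new_doc = {**doc, field: value}
    bytes_written = len(json.dumps(new_doc, sort_keys=True).encode("utf-8"))
    return new_doc, bytes_written


def triple_model_write(entity_id, doc, field, value):
    """Toy triple model: only the retraction of the old fact and the new fact are stored."""
    delta = [
        ["retract", entity_id, field, doc[field]],
        ["assert", entity_id, field, value],
    ]
    new_doc = {**doc, field: value}
    bytes_written = len(json.dumps(delta).encode("utf-8"))
    return new_doc, bytes_written


# A wide document (50 stable fields) with one quickly changing field.
wide_doc = {f"field_{i}": "x" * 20 for i in range(50)}
wide_doc["counter"] = 0

new_doc, doc_bytes = document_model_write(wide_doc, "counter", 1)
_, triple_bytes = triple_model_write("entity-1", wide_doc, "counter", 1)

print("document model bytes:", doc_bytes)
print("triple model bytes:", triple_bytes)
```

In this toy model the document write grows with the number of fields while the triple write stays roughly constant, which is the storage side of the tradeoff. The write-speed side (no index lookup needed before a durable write) is not modeled here.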

Hukka09:05:14

Eagerly waiting for those benchmarks 😉

🦾 4

refset09:05:18

I've DM'd you on Zulip 😉

nivekuil14:05:51

splitting information across documents also affects query speed, since you have to stitch information together in case you need the whole "row", though I think this is usually fine as you can scale queries elastically while storage space is lost forever (similar rationale to e.g. Grafana Loki)
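A rough sketch of the splitting idea discussed above, using plain Python dicts rather than any Crux API. The helper names, the `id` and `parent` keys, and the `/volatile` id suffix are all hypothetical. It shows that a split entity needs an extra lookup and merge to get the whole "row" back.

```python
def split_entity(doc, volatile_fields, id_key="id"):
    """Split one document into a stable part and a small, quickly changing part.

    Assumes id_key itself is not listed in volatile_fields.
    """
    stable = {k: v for k, v in doc.items() if k not in volatile_fields}
    volatile = {k: v for k, v in doc.items() if k in volatile_fields}
    volatile[id_key] = f"{doc[id_key]}/volatile"  # hypothetical id scheme
    volatile["parent"] = doc[id_key]              # reference back to the stable doc
    return stable, volatile


def stitch(stable, volatile, id_key="id"):
    """Rebuild the whole 'row' by merging the volatile fields into the stable doc."""
    merged = dict(stable)
    merged.update(
        {k: v for k, v in volatile.items() if k not in (id_key, "parent")}
    )
    return merged


sensor = {"id": "sensor-1", "model": "T1000", "location": "roof", "reading": 21.5}
stable, volatile = split_entity(sensor, {"reading"})

# Only the small document is rewritten when the reading changes.
volatile = {**volatile, "reading": 22.0}
print(stitch(stable, volatile))
```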

👍 4

nivekuil14:05:54

and splitting into smaller docs can also help cache efficiency depending on your use patterns, though I think given the pending patch documents shouldn't be cached at all locally? it would still help the doc store itself to cache stuff ofc

refset15:05:16

> given the pending patch documents shouldn't be cached at all locally?

I'll wait until the patch is finished before trying to figure out the answer to that 😄

seeday15:05:02

Is there a way to control tx-time in Crux? Probably not, because the tx-log is supposed to be always increasing, right? I'm attempting to import from another system that already has the equivalent of tx-time and valid-time timestamps

jarohen15:05:26

not via the main APIs - as you say, the tx-log is assumed to be the arbiter of tx-time/tx-id. That said, the API that tx-logs call to ingest transactions into Crux does take in the tx-time as a parameter - depending on what this external system is, do you think it could be a new TxLog implementation?

jarohen15:05:27

or are you trying to replace this existing system with a Crux TxLog?

seeday15:05:37

Replacing an external system with Crux, but keeping the tx timestamps correct

jarohen15:05:55

which Crux TxLog were you looking to use?

seeday15:05:01

Likely JDBC? I think that would be easiest to trick into modifying the tx timestamps as well. Could even do out-of-order import

jarohen15:05:40

I suspect out-of-order import would cause issues with how Crux indexes transactions, but maybe there's something we can do around allowing users to specify tx-times so long as there's no transaction more recent already submitted
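The rule suggested above (a user-supplied tx-time is accepted only if no more recent transaction has already been submitted) could look roughly like this. This is an illustrative in-memory sketch, not Crux's TxLog interface: `TxLogSketch`, `submit`, and the tuple layout are made-up names.

```python
from datetime import datetime, timezone


class TxLogSketch:
    """Hypothetical in-memory tx-log that accepts caller-supplied tx-times."""

    def __init__(self):
        self._entries = []  # list of (tx_id, tx_time, tx_ops)

    def submit(self, tx_ops, tx_time):
        """Append a transaction, rejecting tx-times older than the latest one."""
        if self._entries and tx_time < self._entries[-1][1]:
            raise ValueError(
                f"tx-time {tx_time.isoformat()} is earlier than the latest "
                f"submitted tx-time {self._entries[-1][1].isoformat()}"
            )
        tx_id = len(self._entries)
        self._entries.append((tx_id, tx_time, tx_ops))
        return tx_id


log = TxLogSketch()
log.submit(["put", {"id": "a"}], datetime(2019, 1, 1, tzinfo=timezone.utc))
log.submit(["put", {"id": "b"}], datetime(2020, 1, 1, tzinfo=timezone.utc))

try:
    # Out-of-order import: older than the last accepted transaction.
    log.submit(["put", {"id": "c"}], datetime(2018, 1, 1, tzinfo=timezone.utc))
except ValueError as err:
    print("rejected:", err)
```

Equal tx-times are accepted in this sketch; whether a real implementation should allow ties is one of the edge cases that would need deciding.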

seeday16:05:31

This is a one-time import to a fresh system so I’m happy with reindexing, but that’s definitely a concern for a live system. I know Datomic allows that as well (control over the tx-time, but it has to be sequential) with a manual :db/txInstant

seeday16:05:08

How would that work with tx-logs like Kafka, where it uses the Kafka cluster’s time as the tx-time?

jarohen16:05:06

I'd guess we could use Kafka's time as a default in that case, if it hadn't otherwise been provided
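The fallback described above amounts to "use the supplied tx-time if there is one, otherwise the log's own timestamp". A minimal sketch, where `resolve_tx_time` and both parameter names are hypothetical and the log timestamp stands in for whatever time the Kafka cluster assigns:

```python
from datetime import datetime, timezone


def resolve_tx_time(user_tx_time, log_append_time):
    """Prefer an explicitly supplied tx-time; otherwise use the log's timestamp."""
    return user_tx_time if user_tx_time is not None else log_append_time


log_time = datetime(2021, 5, 19, 16, 5, tzinfo=timezone.utc)
imported_time = datetime(2015, 3, 1, tzinfo=timezone.utc)

print(resolve_tx_time(None, log_time))           # falls back to the log's time
print(resolve_tx_time(imported_time, log_time))  # keeps the imported tx-time
```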

jarohen16:05:42

raised https://github.com/juxt/crux/issues/1517 to track this one 🙂

seeday16:05:42

I’ll definitely take a stab at implementing it; it seems like a pretty simple thing to tackle code-wise. Just some interesting decisions to be made around edge cases.

seeday16:05:29

I’ll take further design questions to the GitHub issue
