This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-05-19
Channels
Oh wow, I had already read https://opencrux.com/blog/crux-strength-of-the-record.html but back then Crux itself didn't leave any mark in my memory. Don't remember how I ended up reading it, maybe via other Clojure-themed posts, or maybe while reading about Mongo. What a difference two months make. I didn't notice the references to Datomic on the first go at all. Still not sold on the impracticality of triples though, not having actually used them 😉 Guess I have to try them all for real, now that learning Fulcro sent me down the Datalog rabbit hole.
So far it's a bit tricky to gauge Crux just by reading. Documentation is still sparse, but I guess that's why the chat channels exist. For example I haven't really seen an answer to how Crux handles changing one field in a document with lots of fields: does it make a copy of all the others? In Datomic it's pretty clear what happens; the DB grows only by the retraction of the old fact and the addition of the new one. This would then affect whether something should be modeled as a modification of the document, or as a separate small document that refers to the bigger, more stable data in another document
> does it make a copy of all the others?
In the document store, yes, but in the index store, no (there's structural sharing). Is disk usage / cost your main concern with this model?
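A minimal sketch of what "changing one field" looks like in practice, assuming the Crux 1.x Clojure API (`crux.api`) and an in-memory node; the document shape here is hypothetical:

```clojure
(require '[crux.api :as crux])

;; in-memory node for experimenting (Crux 1.x; module/option names vary by version)
(def node (crux/start-node {}))

;; the whole document goes in...
(crux/submit-tx node [[:crux.tx/put {:crux.db/id :user/alice
                                     :name       "Alice"
                                     :email      "alice@example.com"
                                     :status     :active}]])

;; ...and changing one field means submitting the whole document again
(crux/submit-tx node [[:crux.tx/put {:crux.db/id :user/alice
                                     :name       "Alice"
                                     :email      "alice@example.com"
                                     :status     :inactive}]])
;; the document store keeps both full versions; the indexes share structure
```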
Not main, but I'd like to have a rough grasp on how systems that are not SQL servers would behave, even before bringing these options to the table
Even if it's not a main concern, with a copy in the document store I guess it wouldn't make sense to have quickly changing fields in documents with lots of fields; better to have a more granular separation of concerns. While in Postgres and Datomic that wouldn't matter, the cost is the same whether it's one big entity or many small ones
The "quickly changing" aspect is partly why Crux is document-oriented in the first place, because durable writes don't first require accessing the indexes (unlike triple stores, which need to interpret the deltas for maintaining consistency). But yes, generally splitting entities into components with different rates of change is probably a good idea regardless
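As an illustration of that splitting (hypothetical document shapes, not from the thread): keep the stable fields in one document and put the quickly changing field in a small document that refers back to it by id.

```clojure
;; stable data, rewritten rarely
{:crux.db/id :user/alice
 :name       "Alice"
 :email      "alice@example.com"}

;; quickly changing field in its own small document,
;; referring back to the stable one by id
{:crux.db/id :user.status/alice
 :user       :user/alice
 :status     :active}
```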
Yep I think that's a valid way to think about it - Crux was initially needed for handling a large write throughput use-case we had. That said, the other dimensions motivating the document model are arguably more profound for general usage (as per the Strength of the Record post)
splitting information across documents also affects query speed when you need to stitch the information back together to get the whole "row", though I think this is usually fine since you can scale queries elastically while storage space is lost forever (similar rationale to e.g. Grafana Loki)
and splitting into smaller docs can also help cache efficiency depending on your use patterns, though I think given the pending patch documents shouldn't be cached at all locally? it would still help the doc store itself to cache stuff ofc
> given the pending patch documents shouldn't be cached at all locally?
I'll wait until the patch is finished before trying to figure out the answer to that 😄
Is there a way to control tx-time in Crux? Probably not, because the tx-log is supposed to be always increasing, right? Attempting to import from another system that already has the equivalent of tx-time and valid-time timestamps
not via the main APIs - as you say, the tx-log is assumed to be the arbiter of tx-time/tx-id. That said, the API that tx-logs call to ingest transactions into Crux does take in the tx-time as a parameter - depending on what this external system is, do you think it could be a new TxLog implementation?
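Worth noting for an import like this: valid-time, unlike tx-time, can already be supplied per operation through the public API. A sketch assuming the Crux 1.x `submit-tx` shape, with a hypothetical document:

```clojure
;; valid-time is an optional extra element on the put operation;
;; tx-time, by contrast, is assigned by the tx-log and can't be set here
(crux/submit-tx node
  [[:crux.tx/put
    {:crux.db/id :order/1 :status :shipped}   ; hypothetical document
    #inst "2020-06-01T00:00:00Z"]])           ; valid-time for the imported record
```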
Likely jdbc? I think that would be easiest to trick into modifying the tx timestamps as well. Could even do out of order import
I suspect out-of-order import would cause issues with how Crux indexes transactions, but maybe there's something we can do around allowing users to specify tx-times so long as there's no transaction more recent already submitted
This is a one-time import to a fresh system so I’m happy with reindexing, but that’s definitely a concern for a live system. I know Datomic allows this as well (control over the tx-time, though it has to be sequential) via a manual :db/txInstant
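For reference, the Datomic version of this (assuming the on-prem peer API; the chosen instant must be later than any :db/txInstant already in the log, and the domain attributes here are hypothetical):

```clojure
(require '[datomic.api :as d])

;; assert a fact and explicitly set the transaction's tx-time
;; by asserting :db/txInstant on the reified transaction entity
@(d/transact conn
  [{:db/id "datomic.tx"
    :db/txInstant #inst "2020-06-01T00:00:00Z"}
   {:order/id 1 :order/status :shipped}])
```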
How would that work with tx-logs like Kafka, where it uses the Kafka cluster’s time as the tx-time?
I'd guess we could use Kafka's time as a default in that case, if it hadn't otherwise been provided
raised https://github.com/juxt/crux/issues/1517 to track this one 🙂