#xtdb
2021-02-12
nivekuil 15:02:17

btw, Scylla is of particular interest to me because of the recent CDC feature: https://docs.scylladb.com/using-scylla/cdc/cdc-intro/ I wonder if this could be leveraged to efficiently patch documents somehow, among other interesting things you could do (a structured doc store instead of blobs?). The ease of operation is also nice, of course

👀 3
refset 15:02:42

Thanks for the context, this is pretty interesting! A patch-structured document store backend would be most welcome ☺️

nivekuil 15:02:28

or even somehow inferring txs from the doc store, rather than the application handling both, using https://github.com/scylladb/scylla-cdc-java/tree/master/scylla-cdc-kafka-connect

refset 15:02:46

To reliably infer Crux transactions I think you either need to be able to derive them from an existing log or record a new log - a CDC flow into Kafka would do the job 👍 Assuming that was the only source of transactions, you could then use the (internal) transaction API to save having to write Crux transactions back out to a new Kafka log
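A rough sketch of that glue, assuming the CDC feed has already been decoded into plain maps (the shape of `changes` and the `crux-node` var are assumptions for illustration, not the scylla-cdc-java API):

```clojure
(require '[crux.api :as crux])

;; Each change is assumed to look like {:id <crux.db/id value> :doc <document map>}.
(defn cdc->tx-ops
  "Turn a decoded CDC change feed into Crux put operations."
  [changes]
  (for [{:keys [id doc]} changes]
    [:crux.tx/put (assoc doc :crux.db/id id)]))

;; (crux/submit-tx crux-node (cdc->tx-ops changes))
```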

nivekuil 15:02:04

I wonder if you could then have a single API that selectively pipes into crux, so you can separate your business-critical, bitemporally valuable stuff from high-volume application state changes

nivekuil 15:02:35

definitely would add a lot of complexity

nivekuil 15:02:55

or even multiple crux nodes with e.g. different lucene indexing strategies, depending on the document structure

nivekuil 16:02:57

maybe multitenancy as well, so crux.tx/put {:client/id 1 ...} gets sent to the appropriate tx log transparently
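A hypothetical routing wrapper for that idea (the `nodes` map and the function name are assumptions sketched here, not a Crux API):

```clojure
(require '[crux.api :as crux])

;; nodes: {client-id -> started Crux node}, i.e. one tx log per tenant (assumed setup)
(defn submit-routed!
  "Send a put to whichever node owns the document's :client/id."
  [nodes doc]
  (let [client-id (get-in doc [:crux.db/id :client/id])]
    (crux/submit-tx (get nodes client-id)
                    [[:crux.tx/put doc]])))
```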

gklijs 16:02:31

That's kind of what I'm thinking of trying out: just have separate topics for different kinds of events, use them to build up Crux, and only use the resulting Crux node for queries.

nivekuil 16:02:43

and in clojure we already have pathom for a unified query api that glues together different data sources

nivekuil 16:02:14

I think you'd have to be really careful if you needed temporal consistency across all of them though

nivekuil 18:02:46

may be more of a clojure question than a crux question, but what's with this behavior:

(put {:crux.db/id {:test/id 1} :ref 2})
(put {:crux.db/id {:test/id 2} :ref 3})

;; this works, returning #{[3]}
(q '{:find  [?e]
     :where [[{:test/id 1} :ref ?x]
             [(array-map :test/id ?x) ?x2]
             [?x2 :ref ?e]]})

;; this doesn't, returning an empty set
(q '{:find  [?e]
     :where [[{:test/id 1} :ref ?x]
             [(identity {:test/id ?x}) ?x2]
             [?x2 :ref ?e]]})

;; this is true
(= (identity {:a 1}) (array-map :a 1))

refset 18:02:08

it's a Crux Datalog behaviour - you can't use Clojure collection literals as arguments to predicates (except set literals, which act as relations, not Clojure sets). So if you want to use a Clojure map as a predicate argument you have to feed it in via another clause that binds the output of array-map to an intermediate logic variable...exactly like you've got in your first example 🙂

nivekuil 19:02:55

ah, thanks for the insight! technically if I wanted to make this go faster I think I could unroll array-map but it's already pretty fast at 150ns

😄 3
nivekuil 18:02:29

the reason I'm doing this is exploring further space savings by encoding the logical id attr :test/id in the crux.db/id, and having the mapping be well-known in code instead of encoded in the docs themselves. so far it seems promising enough - the extra triple for array-map only costs ~10% overhead on a simple query (90->100us)
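The encoding being described could look something like this minimal sketch (`->doc` is a hypothetical helper name; `:test/id` is the logical id attribute from the example above, hard-coded here to show the mapping living in code rather than in the documents):

```clojure
;; Hypothetical helper: keep the logical id attribute out of the
;; document body by folding it into :crux.db/id.
(defn ->doc [id m]
  (merge {:crux.db/id {:test/id id}} m))

(->doc 1 {:ref 2})
;; => {:crux.db/id {:test/id 1}, :ref 2}
```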