#xtdb
2020-02-13
Ivan Fedorov12:02:05

What’s the best way now to receive entities after a certain point in transaction time?

refset12:02:11

Hey, I've been looking at this with @UNDH5EQRL in the last few days. The likely solution is a crux-hook-tx module that provides the raw feed of transactions, so you would need to filter these to determine new/changed/deleted/evicted entities vs retroactive/proactive ops. We would publish it to crux-labs for the moment.

refset12:02:32

It would use crux.bus under the hood, something like this:

(ns crux.hook.tx
  (:require [crux.api :as api]
            [crux.bus :as bus]))

(defn event-stream [{:crux.node/keys [bus]} args]
  (bus/listen bus {:crux.bus/event-types #{:crux.tx/indexed-tx}}
              (fn [event] (tap> event))))

(def module {::event-stream {:start-fn #'crux.hook.tx/event-stream
                             :deps #{:crux.node/bus}}})

(def node (api/start-node {:crux.node/topology ['crux.standalone/topology
                                                'crux.hook.tx/module]
                           :crux.node/kv-store "crux.kv.memdb/kv"
                           :crux.kv/db-dir "data/db-dir-1"
                           :crux.standalone/event-log-dir "data/eventlog-1"
                           :crux.standalone/event-log-kv-store "crux.kv.memdb/kv"}))

refset12:02:36

I was planning to look into it again in the next couple of days and move it forward (it needs a way to register/unregister cb listeners, also the new event type needs merging into tx.clj)

Ivan Fedorov12:02:35

Sorry, I meant to ask if there is a thing like (api/db-after node my-valid-time my-tx-time), i.e. a reverse of api/db, which gives the db as it was before a given moment in time.

Ivan Fedorov12:02:25

But hooks are awesome news!

refset13:02:27

Ah okay, I was thrown off by the word "receive" I think 🙂 I'm not sure I understand how your db-after should work. It sounds like you want to invert the valid-time axis entirely so that your db value can only access things that are in the future (which could be modelled as the past in a parallel crux instance where valid-time is inverted). Maybe an example would help?

refset13:02:49

Also, at the risk of being pedantic, I'm pretty certain db currently gives the db at a moment in time, not before 🤔

Ivan Fedorov14:02:37

I meant to ask if there’s a way to see all facts that occurred after a moment in valid time. Sure, I can do this on the application level. Asking in case there is a new temporal index or something.

refset15:02:15

I can't think how to do anything like that with the current indexes. We don't have immediate plans for any new indexes although we do have an R&D track... Perhaps you can also maintain your own reified transaction index alongside your data with a transaction function

refset15:02:12

Again I think a worked example might help inspire solutions

jarohen18:02:43

@U0A5V8ZR6: you can already get the tx-log from a particular tx-id - would https://opencrux.com/docs#_tx_log work for your use case?

jarohen18:02:18

failing that, history-ascending on a DB value would get you all of the versions of a given entity from that DB forward (i.e. fixed tx-time, ascending valid-time)
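
(Editor's sketch, not from the thread.) The tx-log suggestion above could be combined with an application-level filter along these lines. The function `txs-after` and the `sample-log` data are hypothetical names made up for illustration; the only thing assumed about real tx-log entries is that they carry `:crux.tx/tx-id` and `:crux.tx/tx-time` keys, as seen in the sync error later in this log. The commented-out node calls assume the early-2020 `crux.api` shape and may differ in other versions.

```clojure
(ns example.tx-log-filter)

(defn txs-after
  "Returns the tx-log entries whose :crux.tx/tx-time is strictly after `since`.
  `tx-log-entries` is any seq of maps with a :crux.tx/tx-time java.util.Date."
  [tx-log-entries ^java.util.Date since]
  (filter (fn [{:crux.tx/keys [tx-time]}]
            ;; java.util.Date is Comparable, so `compare` orders it chronologically
            (pos? (compare tx-time since)))
          tx-log-entries))

;; Hand-written sample entries standing in for a real tx-log (hypothetical data).
(def sample-log
  [{:crux.tx/tx-id 1 :crux.tx/tx-time #inst "2020-02-09T10:00:00Z"}
   {:crux.tx/tx-id 2 :crux.tx/tx-time #inst "2020-02-11T10:00:00Z"}
   {:crux.tx/tx-id 3 :crux.tx/tx-time #inst "2020-02-12T10:00:00Z"}])

(mapv :crux.tx/tx-id (txs-after sample-log #inst "2020-02-10"))
;; => [2 3]

;; Against a running node, the entries might be obtained roughly like this
;; (assumption: API as of early 2020, not verified against other releases):
;; (with-open [ctx (api/new-tx-log-context node)]
;;   (doall (txs-after (api/tx-log node ctx nil false) since)))
```

Note that this filters on transaction time only; it does not address the valid-time question asked earlier.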

Ivan Fedorov10:02:33

@U050V1N74 @taylor.jeremydavid thanks for the input! ❤️ I’ll stick to app level implementation then, as all I need is basically just a date comparison like

{:find [e]
 :where [[e :create-time ct]
         [(> ct #inst "2020-02-10")]]}

I was asking in case I missed any built-in vt coordinates or api/db alternatives

👍 4
jarohen10:02:17

np 🙂 fwiw, we're also considering how we can efficiently implement that kind of query - there are a few cards in our roadmap along these lines: https://github.com/juxt/crux/projects/8#column-7117153

dotemacs17:02:05

user> (crux/sync (crux-node) #inst "2020-02-13T17:19:58.276-00:00" nil)
Execution error (TimeoutException) at crux.tx/await-tx-time (tx.clj:445).
Timed out waiting for: #inst "2020-02-13T17:19:58.276-00:00" index has: {:crux.tx/tx-time #inst "2020-02-13T17:08:11.572-00:00", :crux.tx/tx-id 1619572420169732}
See the above, obviously the gap between the index and the #inst is ~11 minutes. My setup is Docker-ised Kafka running on “my machine” with 16GB of RAM where only ~8GB is used, Docker image used: wurstmeister/kafka:2.12-2.2.0. The number of documents that I tried to insert: 116282, in batches of 50. Now, I know that the documents will write eventually and the index will be synced at some point. But what kind of setups do you use to develop locally, so that I don’t have to sit around twiddling my thumbs and waiting for the sync to happen? Or do you just not bother with a local setup and do it all on a beefy server?

jarohen18:02:13

@dotemacs - is that tx-time taken from the result of a submit-tx ?

jarohen18:02:37

one potential gotcha is that if you try to sync to a tx-time that doesn't have a transaction, Crux waits for a transaction with that tx-time, which (most likely) never happens
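
(Editor's sketch, not from the thread.) One way to avoid that gotcha is to sync only to a tx-time that came back from a submit. The helper `submit-and-await` below is a hypothetical name; it takes the submit and sync operations as plain functions so that it stays independent of any particular Crux version. The commented usage assumes `crux/submit-tx` returns a map containing `:crux.tx/tx-time` and that `crux/sync` takes `node`, `tx-time` and a timeout, as in the call shown earlier in this log.

```clojure
(ns example.sync-helper)

(defn submit-and-await
  "Submits `tx-ops` via `submit-tx-fn`, then waits via `sync-fn` for exactly the
  tx-time that the submit returned. Returns the submitted-tx map."
  [submit-tx-fn sync-fn tx-ops timeout]
  (let [{:crux.tx/keys [tx-time] :as submitted} (submit-tx-fn tx-ops)]
    ;; Sync to the tx-time of a transaction known to exist,
    ;; rather than to an arbitrary wall-clock instant.
    (sync-fn tx-time timeout)
    submitted))

;; Possible usage with a real node (assumed API shape, see the note above):
;; (submit-and-await #(crux/submit-tx node %)
;;                   #(crux/sync node %1 %2)
;;                   [[:crux.tx/put {:crux.db/id :my-doc}]]
;;                   (java.time.Duration/ofSeconds 30))
```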

jarohen18:02:09

hmm, first thought is I'm not sure that looks like a Kafka tx-id - Kafka uses incrementing integers, whereas that looks more like a recent timestamp - we have other tx-log implementations that do use timestamps, so it might be using one of those instead

jarohen18:02:39

if you're able, could you DM me (and @taylor.jeremydavid) your node configuration?

dotemacs18:02:39

OK, I’ll send it to you, but that tx-time is what’s returned by a tx-put…

👍 4
dotemacs18:02:41

My bad, the kafka is used to consume data off of it, but this was a Crux setup in standalone mode