#xtdb
2021-06-23
henrik06:06:01

If I'd like to inspect the changes made (sorry, "the facts asserted") at a specific epochal time, how would I go about it? Edit: sort of related to the above, I guess, but I'd like to slice precisely on a transaction rather than on timestamps. And not on a specific entity, since the entities involved are what I'd like to find out. The use case is that I'd like to use the asserted facts to prepare a response to the initiator of the transaction.

henrik14:06:00

This is the answer I ended up with:

(defn get-tx
  [env epoch]
  ;; open-tx-log returns transactions *after* the given tx-id, hence (dec epoch)
  (with-open [res (c/open-tx-log (crux/node env) (dec epoch) false)]
    (first (iterator-seq (:lazy-seq-iterator res)))))

henrik14:06:22

(Thanks @U899JBRPF for taking a look at it!)

refset14:06:05

šŸ™‚ np, I think you should be able to skip the (:lazy-seq-iterator ...) step here though
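For reference, a sketch of that simplification, assuming (as the suggestion implies) that the cursor returned by open-tx-log can be passed straight to iterator-seq:

;; Sketch only: same as the snippet above, minus the :lazy-seq-iterator lookup.
;; Assumes the cursor itself is iterable, as suggested above.
(defn get-tx
  [env epoch]
  (with-open [res (c/open-tx-log (crux/node env) (dec epoch) false)]
    (first (iterator-seq res))))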

bocaj19:06:31

Iā€™m using match with nil and then put, to check whether a document exists and save it if it doesnā€™t. 1) It seems the SQLite database is storing the full document in the transaction, perhaps? How could I keep the db small on disk? 2) Can I use a vec of [:crux.tx/match ā€¦] [:crux.tx/put ā€¦] [:crux.tx/match ā€¦] [:crux.tx/put ā€¦] instead of multiple calls to submit-tx?

bocaj19:06:07

Hereā€™s how Iā€™m doing it now:

(with-open [in-file (io/reader "UG.txt")]
  (->> (csv/parse-csv in-file :delimiter \tab)
       (take 6000)
       (sc/remove-comments)
       (sc/mappify)
       (map #(crux/submit-tx node [[:crux.tx/match (:PID %) nil nil]
                                   [:crux.tx/put (assoc % :crux.db/id (:PID %))]]))
       doall))

refset20:06:07

Any given full document should only be stored in SQLite once. However, if you make even a small change to some value, then the whole new version of the document must be stored again.

> Can I use a vec of [:crux.tx/match ā€¦] [:crux.tx/put ā€¦] [:crux.tx/match ā€¦] [:crux.tx/put ā€¦] instead of multiple calls to `submit-tx`?

If any single match fails then the entire transaction will fail, so the semantics are fundamentally different if you batch like this. If that's okay for you, though, then it's certainly a more efficient way to process the data (i.e. in larger batches)
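As a hedged sketch (not from the thread; the names rows, ->tx-ops, and the batch size of 1000 are illustrative), batching might look something like this:

;; Illustrative only: batching several match/put pairs into one submit-tx call.
;; Note the caveat above: if any single match fails, the *whole* transaction fails.
(defn ->tx-ops
  "Build a match/put pair for one row. `row` is assumed to carry a :PID key,
   as in the snippet above."
  [row]
  [[:crux.tx/match (:PID row) nil]                    ; succeeds only if the doc doesn't exist yet
   [:crux.tx/put (assoc row :crux.db/id (:PID row))]])

;; One transaction per batch of rows, instead of one per row:
(doseq [batch (partition-all 1000 rows)]
  (crux/submit-tx node (into [] (mapcat ->tx-ops) batch)))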

bocaj20:06:39

Iā€™m seeing storage size consistent with only storing the doc once, unless it changes. Thanks for clarifying that the whole transaction fails; thatā€™s not what Iā€™m going for.

refset20:06:44

> Iā€™m seeing storage size consistent with only storing the doc once, unless it changes.

That's good to hear - I was just reading through the code to double-check my assertion was correct! Glad I could help šŸ™‚

bocaj00:06:06

Using the Cursor on the log, Iā€™m getting fewer results than expected. After restarting the system, I get more results. It feels like 1) Iā€™m holding onto too much state, which has led to an OutOfMemory error, and 2) I should be blocking somewhere to wait for the transaction log to be built up. Iā€™m hoping to quickly push in 100,000 docs using match, and then read back whichever are shown to be new.

(defn rows-since [since]
  (with-open [log (crux/open-tx-log node 0 true)]
    (doall
     (for [{:keys [crux.tx/tx-time]
            [[crux-op1 crux-id crux-id-nil] [crux-op2 doc]] :crux.api/tx-ops
            :as tx} (iterator-seq log)
           :when (is-on-or-after tx-time since)]
       (into {} (concat doc (first (crux/entity-history (crux/db node tx) crux-id :asc))))))))

refset11:06:08

> Using the Cursor on the log, Iā€™m getting fewer results than expected. Are you using await-tx to ensure that the ingestion process has finished?
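For anyone following along, a minimal sketch of that pattern (tx-ops is a placeholder):

;; Sketch: block until the submitted transaction has been indexed
;; before reading the log back. `tx-ops` is a placeholder.
(let [submitted-tx (crux/submit-tx node tx-ops)]
  (crux/await-tx node submitted-tx)
  ;; ...the tx log and queries now reflect submitted-tx
  )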

refset11:06:40

Your rows-since does indeed look like its working set might be huge. Are you able to batch it up with some writes to disk?

refset11:06:11

Also, rather than scanning from the beginning of the log each time, I suppose you could either do a binary search, or just store some state about the approx. last seen tx-id (and its tx-time) from when you last ran the export job
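A sketch of that checkpointing idea, assuming hypothetical load-checkpoint! / save-checkpoint! helpers for persisting the last-seen tx-id:

;; Sketch: resume the log scan from the last tx-id handled by the previous run.
;; open-tx-log returns transactions *after* the given tx-id, so the stored id
;; itself is not re-processed (a nil last-tx-id scans from the start).
(let [{:keys [last-tx-id]} (load-checkpoint!)]       ; hypothetical helper
  (with-open [log (crux/open-tx-log node last-tx-id true)]
    (let [txs (doall (iterator-seq log))]
      (when-let [{:crux.tx/keys [tx-id]} (last txs)]
        (save-checkpoint! {:last-tx-id tx-id}))      ; hypothetical helper
      txs)))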

bocaj15:06:35

Iā€™ve started working with await-tx. Iā€™m assuming performance will go down a bit, but I donā€™t see a way to block on a group of transactions using cruxā€¦ it would have to be a program-level thing.
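One program-level approach (a sketch; tx-op-batches is a placeholder): since transactions are indexed in submission order, awaiting only the last one of a group should imply the earlier ones are indexed too.

;; Sketch: submit the whole group, then block once on the final transaction.
(let [submitted (mapv #(crux/submit-tx node %) tx-op-batches)]
  (crux/await-tx node (peek submitted)))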

bocaj15:06:34

I plan on storing when we last looked at the log, and querying for tx ā€œsinceā€ or ā€œafterā€ an inst. Thanks for all the help

bocaj16:06:13

oops, doall keeps the head of the list; dorun or another side-effecting approach should work out
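A sketch of that shape, reusing is-on-or-after from the snippet above; write-row! is a hypothetical side-effecting sink (e.g. appending to a file):

;; Sketch: consume the log with run! so the head of the seq isn't retained.
(defn export-rows-since! [since]
  (with-open [log (crux/open-tx-log node 0 true)]
    (run! write-row!   ; hypothetical sink
          (for [{:crux.tx/keys [tx-time] :as tx} (iterator-seq log)
                :when (is-on-or-after tx-time since)]
            tx))))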

refset22:06:18

Riiight, cool, I completely missed that too šŸ˜…

refset16:06:24

Unrelated, but I just saw your post in #asami... you may want to look at this example graph-algorithm ns we adapted from a blog post a while back: https://github.com/juxt/crux/blob/master/docs/example/imdb/src/imdb/main.clj#L106

refset16:06:03

I'm not certain anyone's tried to run it recently, but I can check it still works if you're interested