This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-06-23
Channels
- # announcements (1)
- # asami (2)
- # aws (15)
- # babashka (4)
- # beginners (42)
- # calva (8)
- # clj-kondo (7)
- # cljsrn (31)
- # clojure (64)
- # clojure-australia (4)
- # clojure-europe (40)
- # clojure-italy (2)
- # clojure-nl (5)
- # clojure-uk (10)
- # clojured (1)
- # clojurescript (16)
- # conjure (4)
- # datomic (5)
- # defnpodcast (2)
- # events (1)
- # fulcro (61)
- # graphql (11)
- # honeysql (9)
- # jobs (3)
- # jobs-discuss (3)
- # lsp (65)
- # malli (3)
- # meander (4)
- # off-topic (5)
- # pathom (32)
- # podcasts-discuss (2)
- # polylith (2)
- # re-frame (30)
- # reitit (6)
- # remote-jobs (3)
- # ring (4)
- # shadow-cljs (19)
- # sql (28)
- # vim (1)
- # xtdb (21)
If I'd like to inspect the changes made (sorry, "the facts asserted") at a specific epochal time, how would I go about it? Edit: sort of related to the above I guess, but I'd like to slice very precisely on a transaction rather than on timestamps. And not on a specific entity, since that's what I would like to know. The use case is that I'd like to use them to prepare a response to the initiator of the transaction.
This is the answer I ended up with:
(defn get-tx
  [env epoch]
  (with-open [res (c/open-tx-log (crux/node env) (dec epoch) false)]
    (first (iterator-seq (:lazy-seq-iterator res)))))
(Thanks @U899JBRPF for taking a look at it!)
np, I think you should be able to skip the (:lazy-seq-iterator ...) step here though
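For reference, a minimal sketch of that simplified version, assuming the same env/epoch arguments and c/crux aliases as above, and that the Cursor returned by open-tx-log implements java.util.Iterator (as the reply suggests), so it can be passed straight to iterator-seq:

(defn get-tx
  [env epoch]
  ;; open the log just after tx-id (dec epoch) and take the first entry,
  ;; i.e. the transaction whose tx-id equals epoch
  (with-open [res (c/open-tx-log (crux/node env) (dec epoch) false)]
    (first (iterator-seq res))))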
I'm using match with nil and then put to check if a document exists, saving if it doesn't.
1.) It seems the sqlite database is storing the full document in the transaction perhaps? How could I keep the db small on disk?
2.) Can I use a vec of
[:crux.tx/match …] [:crux.tx/put …] [:crux.tx/match …] [:crux.tx/put …]
instead of multiple calls to submit-tx
(with-open [in-file (io/reader "UG.txt")]
  (->> (csv/parse-csv in-file :delimiter \tab)
       (take 6000)
       (sc/remove-comments)
       (sc/mappify)
       (map #(crux/submit-tx node
                             [[:crux.tx/match (:PID %) nil nil]
                              [:crux.tx/put (assoc % :crux.db/id (:PID %))]]))
       doall))
Here's how I'm doing it now
Any given full document should only be stored in SQLite once; however, if you make even small changes to some value then the whole new version of the document must be stored again
> Can I use a vec of
> [:crux.tx/match …] [:crux.tx/put …] [:crux.tx/match …] [:crux.tx/put …]
> instead of multiple calls to `submit-tx`
If any single match fails then the entire transaction will fail, so the semantics are fundamentally different if you batch like this. If that's okay for you though, then it's certainly a more efficient way to process the data (i.e. in larger batches)
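For illustration, a sketch of what such a batched call could look like, assuming the same node and row shape as the snippet above (submit-batch! is a hypothetical helper, and the chunk size passed to partition-all is arbitrary); as noted, one failed match aborts every put in its batch:

(defn submit-batch! [node rows]
  ;; one transaction containing a match/put pair per row -- if any match
  ;; fails, none of the puts in this batch take effect
  (crux/submit-tx node
                  (into []
                        (mapcat (fn [row]
                                  [[:crux.tx/match (:PID row) nil]
                                   [:crux.tx/put (assoc row :crux.db/id (:PID row))]]))
                        rows)))

;; e.g. (run! #(submit-batch! node %) (partition-all 100 rows))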
I'm seeing storage size consistent with only storing the doc once, unless it changes. Thanks for clarifying that the whole transaction fails; that's not what I'm going for.
> I'm seeing storage size consistent with only storing the doc once, unless it changes.
That's good to hear - I was just reading through the code to double-check my assertion was correct! Glad I could help
Using the Cursor on the log, I'm getting fewer results than expected. After restarting the system, I get more results. It feels like I'm 1) holding onto too much state, which has led to an out-of-memory error, and 2) I should be blocking somewhere to wait for the transaction log to be built up. I'm hoping to quickly push in 100,000 docs using match, and then read back whichever are shown to be new.
(defn rows-since [since]
  (with-open [log (crux/open-tx-log node 0 true)]
    (doall
     (for [{:keys [crux.tx/tx-time]
            [[crux-op1 crux-id crux-id-nil] [crux-op2 doc]] :crux.api/tx-ops
            :as tx} (iterator-seq log)
           :when (is-on-or-after tx-time since)]
       (into {} (concat doc (first (crux/entity-history (crux/db node tx) crux-id :asc))))))))
> Using the Cursor on the log, I'm getting fewer results than expected.
Are you using await-tx to ensure that the ingestion process has finished?
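A sketch of that, assuming the ingestion code holds onto the map returned by its last submit-tx call:

;; submit-tx returns {:crux.tx/tx-id ..., :crux.tx/tx-time ...}
(let [last-tx (crux/submit-tx node [[:crux.tx/put {:crux.db/id :example}]])]
  ;; blocks until the node has indexed last-tx, so a subsequent
  ;; open-tx-log (or query) will include it
  (crux/await-tx node last-tx))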
Your rows-since does indeed look like the working set might be huge. Are you able to batch it up with some writes to disk?
Also, rather than scanning from the beginning of the log each time, I suppose you could either do a binary search, or just store some state about the approx. last seen tx-id (and its tx-time) when you last ran the export job
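A sketch of the latter, assuming a hypothetical atom last-seen-tx-id persisted between export runs; open-tx-log takes an after-tx-id, so passing the stored id skips everything already seen (nil starts from the beginning):

(defonce last-seen-tx-id (atom nil))

(defn new-txes []
  (with-open [log (crux/open-tx-log node @last-seen-tx-id true)]
    (let [txes (doall (iterator-seq log))]
      ;; remember how far we got for the next run
      (when-let [tx-id (:crux.tx/tx-id (last txes))]
        (reset! last-seen-tx-id tx-id))
      txes)))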
I've started working with await-tx. I am assuming performance will go down a bit, but I don't see a way to block on a group of transactions using Crux… that would be a program-level thing.
I plan on storing when we looked at the log last, and querying for tx "since" or "after" an inst. Thanks for all the help
oops, doall keeps the head of the list; dorun or another side-effecting approach should work out
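i.e. something like this sketch, where export-row! stands in for whatever side-effecting write you do per row; dorun realizes the lazy seq while the log is still open, without retaining the realized elements:

(defn export-since! [since]
  (with-open [log (crux/open-tx-log node 0 true)]
    (dorun
     (for [{:keys [crux.tx/tx-time] :as tx} (iterator-seq log)
           :when (is-on-or-after tx-time since)]
       ;; write each matching tx out as we go instead of accumulating
       (export-row! tx)))))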
Unrelated, but I just saw your post on #asami...you may want to look at this example graph algorithm ns we adapted from a blog post a while back https://github.com/juxt/crux/blob/master/docs/example/imdb/src/imdb/main.clj#L106