This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-05-27
I've got some code that calls submit-tx, await-tx, and then tx-committed?. While importing data into a new app, I just got a NodeOutOfSyncException. Is this a crux bug?
code: https://github.com/jacobobryant/biff/blob/master/libs/crux/src/biff/crux.clj#L389-L392
error:
May 27 19:57:14 findka-api run.sh[2007]: crux.api.NodeOutOfSyncException: Node out of sync - requested '{:crux.tx/tx-id 92, :crux.tx/tx-time #inst "2021-05-27T19:57:14.679-00:00"}', available '{:crux.tx/tx-time #inst "2021-05-27T19:57:13.513-00:00", :crux.tx/tx-id 91}'
May 27 19:57:14 findka-api run.sh[2007]: {:crux.error/error-type :node-out-of-sync, :crux.error/message "Node out of sync - requested '{:crux.tx/tx-id 92, :crux.tx/tx-time #inst \"2021-05-27T19:57:14.679-00:00\"}', available '{:crux.tx/tx-time #inst \"2021-05-27T19:57:13.513-00:00\", :crux.tx/tx-id 91}'", :requested {:crux.tx/tx-id 92, :crux.tx/tx-time #inst "2021-05-27T19:57:14.679-00:00"}, :available {:crux.tx/tx-time #inst "2021-05-27T19:57:13.513-00:00", :crux.tx/tx-id 91}}
May 27 19:57:14 findka-api run.sh[2007]: at crux.error$node_out_of_sync.invokeStatic (error.clj:19)
May 27 19:57:14 findka-api run.sh[2007]: crux.error$node_out_of_sync.invoke (error.clj:18)
May 27 19:57:14 findka-api run.sh[2007]: crux.node.CruxNode.tx_committed_QMARK_ (node.clj:135)
May 27 19:57:14 findka-api run.sh[2007]: biff.crux$submit_tx.invokeStatic (crux.clj:392)
May 27 19:57:14 findka-api run.sh[2007]: biff.crux$submit_tx.invoke (crux.clj:376)
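For readers following along, the call sequence described above can be sketched roughly as follows. This is a minimal illustration and not the linked biff code: the namespace `example.submit` and the helper name `submit-and-confirm` are made up for this sketch, and it assumes a Crux 1.x node with `crux.api` on the classpath.

```clojure
(ns example.submit
  (:require [crux.api :as crux]))

(defn submit-and-confirm
  "Hypothetical helper: submits tx-ops, blocks until the node has indexed
  the transaction, then reports whether it committed."
  [node tx-ops]
  (let [submitted-tx (crux/submit-tx node tx-ops)]
    ;; Blocks (no timeout here) until the node has indexed this transaction.
    (crux/await-tx node submitted-tx)
    ;; The same map returned by submit-tx is passed to both calls.
    ;; tx-committed? throws NodeOutOfSyncException if the node has not
    ;; yet indexed the requested transaction.
    (crux/tx-committed? node submitted-tx)))

(comment
  ;; Usage against a throwaway in-memory node (assumed setup).
  (with-open [node (crux/start-node {})]
    (submit-and-confirm node [[:crux.tx/put {:crux.db/id :example}]])))
```

Because await-tx has already returned by the time tx-committed? runs, the exception in the trace above is not expected on this path, which is why it looks like a race.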
> Is this a crux bug?
Hopefully not! It does seem weird though. Here are some questions:
1. Is your await-tx synchronously blocking and returning before tx-committed? is called?
2. Is there a timeout on the await-tx?
3. Is it the same result of submit-tx definitely being passed to both calls?
1. yes 2. no 3. yes
it works the vast majority of the time; I only got that exception once. perhaps there's a subtle race condition?
@U7YNGKDHA is that a local node?
(I'm guessing that explains it?)
alas, the other way around I'm afraid - we're aware of https://github.com/juxt/crux/issues/527 for remote nodes
oh, submit-tx I'm assuming returns {:crux.tx/tx-id 92, :crux.tx/tx-time #inst "2021-05-27T19:57:14.679-00:00"}
also, does it remain out of sync? i.e. if you were to call await-tx/tx-committed again, would the Crux node have caught up?
I've since submitted more transactions and they worked fine
idk for sure if it would have caught up had I just called await-tx/tx-committed? without submitting another tx, but I could test that if the error comes up again
so my guess is race condition
actually would this be triggered if you submitted a tx that didn't actually change anything? i.e. put a bunch of documents that had already been putted.
When the exception was thrown, I was submitting documents from a query that was accidentally returning duplicates, and there's a decent chance the whole tx was just documents that I already imported
so there's a few events happening there, kv/store which delegates through to the KV store's write-batch implementation, and only once that's returned do we notify the awaiting threads
I'm guessing we can't get hold of the result from await-tx, given it's a very intermittent issue?
we're dealing with tx-ids rather than tx-times, and in any event the two txs (91 and 92) happened at distinct times - if they were the same time and you were calling await-tx-time, that'd be a potential cause
interesting
I could try to reproduce the issue and print the result of await-tx
:thumbsup: I'll go ahead and do that right now, so maybe I'll have something in a few minutes
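One way to capture that information while trying to reproduce the issue might look like the sketch below. It is only an illustration under assumptions: the namespace `example.diagnose` and the helper name `diagnose-tx` are hypothetical, it assumes Crux 1.x, and the single retry is there purely to answer the earlier question of whether the node catches up, not as a suggested fix.

```clojure
(ns example.diagnose
  (:require [crux.api :as crux])
  (:import (crux.api NodeOutOfSyncException)))

(defn diagnose-tx
  "Hypothetical diagnostic wrapper: prints what submit-tx and await-tx
  returned, and if tx-committed? reports the node as out of sync, awaits
  again and re-checks once so we can see whether the node catches up."
  [node tx-ops]
  (let [submitted (crux/submit-tx node tx-ops)
        awaited   (crux/await-tx node submitted)]
    (println "submit-tx returned:" (pr-str submitted))
    (println "await-tx returned: " (pr-str awaited))
    (try
      {:submitted  submitted
       :awaited    awaited
       :retried?   false
       :committed? (crux/tx-committed? node submitted)}
      (catch NodeOutOfSyncException e
        (println "out of sync:" (.getMessage e))
        ;; Await a second time and re-check; a second failure propagates.
        (let [awaited-again (crux/await-tx node submitted)]
          (println "second await-tx returned:" (pr-str awaited-again))
          {:submitted  submitted
           :awaited    awaited-again
           :retried?   true
           :committed? (crux/tx-committed? node submitted)})))))
```

Comparing the printed await-tx result with the map from submit-tx should show whether await-tx returned an earlier transaction (such as tx-id 91) than the one that was awaited.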
I think I figured it out https://github.com/juxt/crux/issues/1519#issuecomment-849958731
thanks @U7YNGKDHA, good spot 🙏
have a fix for this running through CI at the moment, and releasing a dev-SNAPSHOT.
assuming you're on 1.16, dev-SNAPSHOT would be an index version bump because of other changes going into 1.17 shortly
thanks!