This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-12-04
Not sure that will be the thing I use in the tutorial at BOB Konferenz, but it will definitely be event sourcing with Clojure. https://bobkonf.de/2021/en/
I've envisioned using Crux to store event logs (as document histories) and then using (mini) in-memory Crux nodes or DataScript to act as domain-level materialized views that get built & cached on-demand...but never yet tried to build anything
I've not benchmarked it, but I suspect it's slightly faster to do things that way round. Although if you ever want to be able to annotate or otherwise link together specific events then using discrete entities is a very wise strategy, and probably the better default choice!
What would be the advantage of storing events in Crux vs plain Kafka? I totally get the use case of lightweight Crux nodes as a downstream consumer of the event log though
I notice that at some point a malformed doc seems to have gotten transacted and now I see "Transaction function failed when originally evaluated" every time a node starts up, I guess when it indexes that bad transaction. Is it recommended to just evict it?
You shouldn't need to evict anything. When you say "every time a node starts up" do you mean from scratch? Or is that happening with persisted Rocks indexes?
yes, from scratch -- newly created docker containers. I'm still using the 20.11 RC, can't recall if my previous test cluster was on this version or the old.
I believe this is the offending tx fn:
{:crux.db/id :assoc :crux.db/fn '(fn [ctx eid attr new-value] (let [db (crux.api/db ctx) entity (crux.api/entity db eid)] [[:crux.tx/put (assoc entity attr new-value)]]))}
it looks like it was called on a nil entity, so the resulting doc was just the assoc'd value without a crux.db/id.
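One way to avoid this failure mode is to guard against a missing entity inside the function. The following is only a sketch: `assoc-ops` and `guarded-assoc-fn` are hypothetical names, and it assumes that a Crux transaction function returning `false` aborts the transaction instead of producing ops.

```clojure
;; Pure helper (hypothetical name): build the tx ops for an assoc,
;; refusing to emit a put when the entity is missing or has no id.
(defn assoc-ops [entity attr new-value]
  (if (and (map? entity) (contains? entity :crux.db/id))
    [[:crux.tx/put (assoc entity attr new-value)]]
    false)) ; assumption: a false return aborts the transaction

;; The same guard inlined into a transaction function document.
;; The quoted form is plain data, so this evaluates without Crux loaded.
(def guarded-assoc-fn
  {:crux.db/id :assoc
   :crux.db/fn '(fn [ctx eid attr new-value]
                  (let [db     (crux.api/db ctx)
                        entity (crux.api/entity db eid)]
                    (if entity
                      [[:crux.tx/put (assoc entity attr new-value)]]
                      false)))})
```

If creating the entity on first use is the desired behaviour, the nil branch could instead put `{:crux.db/id eid attr new-value}`.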
the symptom is typical of broken indexing; I have 6 nodes running and all of them seem to be read-only. nothing else in the logs so I think it has to be crux.
just restarted and reproduced. Here is the interesting log portion:
2020-12-04 23:47:43 at java.base/java.lang.Thread.run(Thread.java:832)
2020-12-04 23:47:43 at clojure.lang.AFn.run(AFn.java:22)
2020-12-04 23:47:43 at crux.tx$__GT_polling_tx_consumer$fn__68349.invoke(tx.clj:493)
2020-12-04 23:47:43 at crux.tx$index_tx_log.invoke(tx.clj:448)
2020-12-04 23:47:43 at crux.tx$index_tx_log.invokeStatic(tx.clj:450)
2020-12-04 23:47:43 at crux.tx$index_tx_log$fn__68328.invoke(tx.clj:458)
2020-12-04 23:47:43 at crux.tx$index_tx_log$fn__68328$fn__68333.invoke(tx.clj:470)
2020-12-04 23:47:43 at crux.tx.InFlightTx.abort(tx.clj:396)
2020-12-04 23:47:43 at crux.tx$index_docs.invoke(tx.clj:255)
2020-12-04 23:47:43 at crux.tx$index_docs.invokeStatic(tx.clj:257)
2020-12-04 23:47:43 at crux.error$illegal_arg.invoke(error.clj:3)
2020-12-04 23:47:43 at crux.error$illegal_arg.invokeStatic(error.clj:7)
2020-12-04 23:47:43 at crux.error$illegal_arg.invoke(error.clj:3)
2020-12-04 23:47:43 at crux.error$illegal_arg.invokeStatic(error.clj:12)
2020-12-04 23:47:43 Exception in thread "crux-polling-tx-consumer" crux.IllegalArgumentException: Missing required attribute :crux.db/id
2020-12-04 23:47:43 2020-12-05T07:47:43.667Z 73670e9a8d20 WARN [crux.tx:326] - Transaction function failed when originally evaluated: #crux/id fb5c548c8f8558c093ed35aa916c97e92b798c49 nil {:crux.db.fn/exception crux.IllegalArgumentException, :crux.db.fn/message "invalid tx-op: invalid entity id", :crux.db.fn/ex-data {:crux.error/error-type :illegal-argument, :crux.error/error-key :invalid-tx-op, :crux.error/message "invalid tx-op: invalid entity id", :op [:crux.tx/put {:entry/fresh? false}]}}
2020-12-04 23:47:24 2020-12-05T07:47:24.162Z 73670e9a8d20 INFO [crux.tx:326] - Started tx-consumer
2020-12-04 23:47:24 2020-12-05T07:47:24.042Z 73670e9a8d20 INFO [com.zaxxer.hikari.HikariDataSource:82] - HikariPool-1 - Start completed.
2020-12-04 23:47:23 2020-12-05T07:47:23.761Z 73670e9a8d20 INFO [com.zaxxer.hikari.HikariDataSource:80] - HikariPool-1 - Starting...
since the message is only a WARN I'm not sure if this is actually my problem, as that level implies it's routine... but I'm still suspicious. regardless I think crux should never break like this, but if it does it should at least do everything it can to be loud about it
From what you've shared (thanks) I agree Crux shouldn't be breaking like this - indexing should continue despite errors like this. I will try to reproduce it. We do have various tests for this class of error but perhaps we've missed an edge case (particularly for "still working after errors"): https://github.com/juxt/crux/blob/master/crux-test/test/crux/tx_test.clj#L593-L644
please, try to enjoy your weekend instead :) Really the more pressing issue on my mind is: what could I do to handle these types of situations (db going silently read-only) happening in production? Maybe crux exposes some metrics that could work? Or an await-tx heartbeat from my application?
and after detection, what could I do to immediately remediate the issue? evict + restart? unfortunately a decent amount of downtime from how long everything takes to start up
> what could I do to handle these types of situations (db going silently read-only) happening in production?
Really this should never happen. The most comprehensive heartbeat to check end-to-end functionality is to run an entity lookup on some well-known (and guaranteed to not change) entity
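A heartbeat along these lines could be sketched as below. `healthy?` is a hypothetical helper, not part of Crux: it runs any check function on another thread and treats a timeout, an exception, or a falsey result as unhealthy. The wiring shown in the comments assumes a started `node` and a fixed `:heartbeat` entity, neither of which comes from this thread.

```clojure
(defn healthy?
  "Runs check-fn on another thread. Returns true only if it returns a
  truthy value within timeout-ms; timeouts and exceptions count as false."
  [check-fn timeout-ms]
  (let [f      (future (try (check-fn) (catch Exception _ false)))
        result (deref f timeout-ms ::timeout)]
    ;; Stop a check that is still running after the deadline.
    (when (= result ::timeout)
      (future-cancel f))
    (and (not= result ::timeout) (boolean result))))

;; Hypothetical wiring, assuming a started Crux `node` and a :heartbeat entity:
;; (healthy? #(crux.api/entity (crux.api/db node) :heartbeat) 2000)
;;
;; To also exercise indexing, the check could submit a put and await it:
;; (healthy? #(crux.api/await-tx
;;              node
;;              (crux.api/submit-tx node [[:crux.tx/put {:crux.db/id :heartbeat}]]))
;;           5000)
```

The write-and-await variant is the one that would notice a node whose indexing has stalled, since plain reads may keep working against the existing indexes.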
It's not clear to me how to remedy this particular situation yet (I don't think a normal evict could do it...), but I have managed to reproduce it locally now. We will have it fixed tomorrow hopefully and the fix will definitely be included in the imminent release. Thanks for your patience and for the report!
Ah, good to hear it! I actually found another symptom: in the crux http console, the /_crux/sync endpoint will never resolve. Found that out while looking for a way to do healthchecks.
> Really this should never happen
in all honesty I've been trained to never take this line seriously. I think a heartbeat is probably good enough for now though. It does seem challenging for crux itself to preemptively detect such situations