This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-10-10
Channels
- # aleph (4)
- # beginners (32)
- # cider (12)
- # cljs-dev (56)
- # cljsrn (7)
- # clojars (3)
- # clojure (165)
- # clojure-dev (33)
- # clojure-germany (1)
- # clojure-italy (27)
- # clojure-russia (7)
- # clojure-spec (24)
- # clojure-uk (62)
- # clojurescript (37)
- # core-async (7)
- # core-matrix (1)
- # cursive (9)
- # data-science (8)
- # datomic (8)
- # duct (4)
- # events (1)
- # figwheel (7)
- # flambo (3)
- # fulcro (43)
- # hoplon (25)
- # jobs-discuss (8)
- # lein-figwheel (4)
- # luminus (2)
- # off-topic (35)
- # om (8)
- # om-next (3)
- # onyx (30)
- # pedestal (62)
- # portkey (2)
- # protorepl (2)
- # re-frame (40)
- # reagent (9)
- # shadow-cljs (123)
- # specter (30)
- # sql (22)
- # testing (1)
- # uncomplicate (40)
- # unrepl (3)
- # vim (13)
- # yada (5)
term buffer length is the best place to start
hmm, but you do have space in /dev/shm
so that’s really weird
I think this is related to your big log
Could it have the wrong permissions to write to it? The whole picture doesn’t make a lot of sense.
There seems to be some thrashing component. Most of the time it is as I posted:
tmpfs 3.9G 105M 3.8G 3% /dev/shm
but I just caught it like this (this is with -Daeron.term.buffer.length=33554432):
tmpfs 3.9G 3.9G 48M 99% /dev/shm
and errors are triggered, but then it cleans itself right back up.
Yeah, that makes a lot of sense. I would look out for other errors in your logs.
The odd thing is that things run fine on the local machine, using internal ZooKeeper, processing thousands of messages a minute, but when we deploy and use a 5-node ZooKeeper, the Aeron buffer seems to balloon until the job dies. This is our first go at running this in the server environment, so we're not sure which logs are normal. Things are littered with low /dev/shm space, unavailable network image, and some peers not responding to heartbeat (despite being on a single node), but nothing stands out.
Does your job have a lot of tasks? Or lots of peers on each task? Seems like a lot of channels are being opened, which may be helped by reducing the term buffer size
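For rough intuition on why lowering the term buffer helps with many channels: an Aeron log buffer is made up of 3 terms, so each channel costs on the order of 3 × aeron.term.buffer.length of /dev/shm (plus metadata). A back-of-the-envelope sketch, where the channel count is purely illustrative:

```shell
# Rough Aeron /dev/shm sizing: each log buffer holds 3 term buffers,
# so per-channel cost is approximately 3 * term.buffer.length.
TERM_LEN=$((32 * 1024 * 1024))   # matches -Daeron.term.buffer.length=33554432
CHANNELS=40                      # illustrative count, not from the job above
BYTES=$((3 * TERM_LEN * CHANNELS))
echo "$((BYTES / 1024 / 1024)) MiB of /dev/shm needed"
```

At 40 channels this already approaches the 3.9G tmpfs shown above, which is why shrinking the term buffer (or reducing tasks/peers, hence channels) relieves the pressure.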
9 tasks, 11 virtual peers:
{:workflow [[:in :conform-health-check-msg]
[:conform-health-check-msg :latest-status]
[:latest-status :update-state-graph]
[:in :update-state-graph]
[:update-state-graph :distribute-statuses]
[:distribute-statuses :save-component-status]
[:save-component-status :out]
[:update-state-graph :build-v0-json]
[:build-v0-json :v0-json-out]]
K that’s not too bad
Hi all, I have a question about ZooKeeper and peer state: if I have 5 ZooKeeper machines and 1 of them goes down, do the peers stop working?
@lellis No, ZK will remain available as long as there’s a cluster majority, so with 5 ZK nodes you can lose 2 nodes and still remain available: a 2/3 split.
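The majority rule above can be sketched numerically: a ZooKeeper ensemble of N nodes needs a quorum of floor(N/2) + 1, so it tolerates N minus quorum failures.

```shell
# ZooKeeper availability: the ensemble stays up while a quorum
# (majority) of nodes is reachable.
N=5                       # ensemble size
QUORUM=$(( N / 2 + 1 ))   # majority: 3 of 5
echo "ensemble=$N quorum=$QUORUM tolerated_failures=$(( N - QUORUM ))"
```

This is also why even-sized ensembles buy nothing: 6 nodes still only tolerate 2 failures (quorum 4).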
nice! Ty! @gardnervickers
In the docs, it states that for a perf boost you can disable assertions, but I can't get it to work.
When I run it as a jar, I get:
Caused by: java.lang.IllegalStateException: Can't change/establish root binding of: *assert* with set
@chrisblom (get-in event [:onyx.core/task-information :job-metadata])
should do it I think
Not sure about assertions - I’ve never seen Clojure throw that exception before.
Anyone else know?
@chrisblom one sec, I’ll get you the assert answer
@chrisblom are you uberjar’ing?
@lucasbradstreet yes, only AOTing the main ns
btw, I've also tried (alter-var-root #'*assert* (constantly false)); this throws something like java.lang.IllegalStateException: Can't change/establish root binding of: *assert*
@chrisblom set :global-vars {*assert* false}
in the profile where you uberjar
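In project.clj terms, the suggestion above looks roughly like this (a sketch; the project name is hypothetical and your profile layout may differ):

```clojure
;; project.clj (sketch): disable assertions in the uberjar profile so that
;; *assert* is false at compile time, when the namespaces are AOT-compiled.
(defproject my-app "0.1.0"
  :profiles {:uberjar {:aot [my-app.core]
                       :global-vars {*assert* false}}})
```

Because `:global-vars` takes effect during compilation, `assert` forms are elided from the compiled classes and nothing needs to (or can) rebind `*assert*` at runtime.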
@chrisblom it’s enough for it to be set while building the uberjar; where things are going wrong is that you can’t set it at startup when running from the uberjar.