2016-09-28
Channels
- # arachne (2)
- # aws (5)
- # aws-lambda (5)
- # beginners (4)
- # boot (25)
- # cljs-dev (270)
- # cljsjs (1)
- # cljsrn (72)
- # clojars (5)
- # clojure (201)
- # clojure-belgium (5)
- # clojure-brasil (4)
- # clojure-italy (2)
- # clojure-korea (2)
- # clojure-russia (24)
- # clojure-spec (24)
- # clojure-uk (22)
- # clojurebridge (1)
- # clojurescript (125)
- # cloverage (3)
- # cursive (41)
- # datomic (37)
- # dirac (4)
- # emacs (2)
- # hoplon (421)
- # lein-figwheel (1)
- # leiningen (5)
- # luminus (2)
- # mount (1)
- # off-topic (18)
- # om (44)
- # om-next (4)
- # onyx (44)
- # pedestal (3)
- # proton (9)
- # re-frame (21)
- # reagent (21)
- # ring-swagger (12)
- # specter (9)
- # sql (2)
- # untangled (62)
- # vim (16)
@aspra it’s probably because of Aeron running out of shared memory.
@aspra http://www.onyxplatform.org/docs/user-guide/0.9.10/#_aeron_mediadriver_crashes_the_jvm_with_sigbus
@zamaterian Strange, we have given it 1G.
How many nodes are you running on?
1G is generally enough
Ah, so the reason it's not enough is that the shm requirements scale up with the number of nodes
Since it's essentially ring buffer space for the connections between the nodes
I think we should include a description of how that scales
You’d be using the memory anyway if it were in the JVM, but it’s certainly easier to just stick a largish -Xmx on the JVM and be done with it
If you’re on low memory nodes with fast SSDs, you also have the option of just using disk space instead of shm memory. It can hurt performance slightly, but would allow you to scale more easily.
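In case it helps, a minimal sketch of the disk-backed option, assuming Aeron’s standard "aeron.dir" system property and an illustrative SSD path (it must be set before the media driver starts):
```clojure
;; Point Aeron's media driver at an SSD-backed directory instead of
;; /dev/shm. "aeron.dir" is the property Aeron reads at startup; the
;; path here is only an example.
(System/setProperty "aeron.dir" "/var/tmp/aeron")
```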
Let me calculate it and get back to you. I need to reread the aeron docs
As far as I can tell, it should be 3 × term buffer length (16MB) = 48MB per connection, but that wouldn’t make sense, because then it should only be using about 500MB if every node connects to every other node
Ah, plus another copy for each publication on the other end
so 500MB*2
So you basically just hit the limit once you add some metadata
You probably need an additional safety factor in case a peer gets rebooted and the old one isn't cleaned up yet
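Putting that reasoning into a rough sizing sketch (the 16MB term buffers, three buffers per connection, full mesh, and the extra publication copy are all assumptions from this thread, not an official formula):
```clojure
;; Back-of-envelope estimate of Aeron shm usage for an n-node cluster.
(defn shm-estimate-mb [n-nodes]
  (let [term-buffer-mb    16                        ; assumed term buffer size
        per-connection-mb (* 3 term-buffer-mb)      ; 3 term buffers = 48MB
        connections       (* n-nodes (dec n-nodes)) ; every node to every other
        other-end-copies  2]                        ; publication copy per end
    (* per-connection-mb connections other-end-copies)))

;; (shm-estimate-mb 4) => 1152, already past a 1G /dev/shm
```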
@lucasbradstreet thx for your excellent explanation!
I’ll add it to the docs
How do I detect whether a job has failed from within a lifecycle function? Currently I’m using the after-task-stop lifecycle.
You can get at the replica via (:onyx.core/replica event), then check (some #{(:onyx.core/job-id event)} (:killed-jobs @replica))
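A minimal sketch of that check as a lifecycle, with hypothetical names like log-killed-job (note the caveat later in this thread: the replica may not list the job as killed yet when after-task-stop runs):
```clojure
(defn log-killed-job [event lifecycle]
  (let [replica (:onyx.core/replica event)   ; atom holding the replica
        job-id  (:onyx.core/job-id event)]
    (when (some #{job-id} (:killed-jobs @replica))
      (println "Job" job-id "was killed"))
    {}))

(def killed-job-calls
  {:lifecycle/after-task-stop log-killed-job})
```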
However, we do actually have some more information that we should assoc into the event that we pass into after-task-stop. I might add it to the next version of Onyx that we are releasing ASAP if @michaeldrogalis is with me on it
super nice, as usual the 🍺 is on me 🙂
@lucasbradstreet Sounds good to me.
:onyx.core/scheduler-event? It’ll only be added when the task is stopped, but I think that’s the only time you want it. It’ll use the value we’re already using for triggers
@michaeldrogalis when you have a moment, can we discuss version schemes?
Yeah, that's a good thing to add to the Event. And yes, give me 10.
@lucasbradstreet btw: the solution of checking killed-jobs for the running job-id doesn’t work, since the job-id is not yet present in the killed-jobs list.
Ahh. Yes, ever since we switched to the peer-group-manager we update the peer’s replica after making the log call
No good reason to do that, so I will switch the order. You’ll have this new feature when we release anyway. Thanks for the heads up
@zamaterian I just released 0.9.11-alpha1 with the :onyx.core/scheduler-event addition, if you would like to try it. Please note that we now enforce new tenancy-ids when you upgrade/downgrade Onyx
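For anyone trying the alpha, a hedged sketch of what reading the new key might look like (the :job-killed value is an assumption based on the trigger values mentioned above):
```clojure
(defn on-task-stop [event lifecycle]
  ;; :onyx.core/scheduler-event is only assoc'd in when the task stops.
  (when (= :job-killed (:onyx.core/scheduler-event event))
    (println "Task stopped because its job was killed"))
  {})

(def stop-calls
  {:lifecycle/after-task-stop on-task-stop})
```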
learn-onyx has been upgraded, removing the challenge with :onyx/bulk? and replacing it with a new challenge for :onyx/batch-fn?
How does enforcement of new tenancy-ids interact with rolling upgrades of peers?
You only need a fresh tenancy ID if you change the Onyx version.
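So a rolling upgrade within one Onyx version can keep the tenancy id fixed, and a version bump gets a fresh one. A sketch of wiring that into the peer config (key names as of this era of Onyx; treat the values as illustrative):
```clojure
(def onyx-version "0.9.11-alpha1")

(def peer-config
  {:onyx/tenancy-id          (str "my-app-" onyx-version) ; fresh id per version
   :zookeeper/address        "127.0.0.1:2181"
   :onyx.peer/job-scheduler  :onyx.job-scheduler/greedy
   :onyx.messaging/impl      :aeron
   :onyx.messaging/bind-addr "localhost"})
```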
thx 🙂 for the clarification
For sure. This is a good release; as @lucasbradstreet said, it will eliminate an entire class of errors encountered in the wild.