#onyx
2016-09-28
zamaterian10:09:42

@aspra it's probably because Aeron is running out of shared memory.

aspra11:09:54

@zamaterian strange. we have given it 1G

lucasbradstreet11:09:41

How many nodes are you running on?

lucasbradstreet11:09:49

1G is generally enough

aspra11:09:59

10 nodes, 13 peers each

aspra11:09:12

too many peers?

lucasbradstreet11:09:30

Ah, so the reason it's not enough is that the shm requirements scale up with the number of nodes

lucasbradstreet11:09:51

Since it's essentially ring buffer space for the connections between the nodes

aspra11:09:04

That's a bit inconvenient with autoscaling I suppose

lucasbradstreet11:09:12

I think we should include a description of how that scales

lucasbradstreet11:09:01

You’d be using the memory anyway, if it was in the JVM, but it’s certainly easier to just stick a largish -Xmx on the JVM and be done with it

aspra11:09:43

ok. So how much would you use for 10 nodes?

lucasbradstreet11:09:03

If you’re on low memory nodes with fast SSDs, you also have the option of just using disk space instead of shm memory. It can hurt performance slightly, but would allow you to scale more easily.

lucasbradstreet11:09:13

Let me calculate it and get back to you. I need to reread the aeron docs

aspra11:09:29

ok I will have a look as well. thanks

lucasbradstreet11:09:48

as far as I can tell, it should be 3*term buffer length (16MB) = 48MB per connection, but that wouldn’t make sense because it should only be using about 500MB if every node connects to every other node

lucasbradstreet11:09:58

Ah, plus another copy for each publication on the other end

lucasbradstreet11:09:27

So you basically just hit the limit once you add some metadata
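
The arithmetic in the messages above can be sketched roughly as follows. This is only a back-of-the-envelope estimate, not Aeron's or Onyx's actual accounting: the function name `shm-estimate-mb` is hypothetical, and it assumes one connection from each node to every other node and a simple doubling for "another copy for each publication on the other end". Metadata and any safety margin are not included.

```clojure
;; Figures quoted in the chat: 3 * term buffer length (16MB) = 48MB per connection.
(def term-buffer-mb 16)
(def terms-per-connection 3)

(defn shm-estimate-mb
  "Rough shared-memory estimate in MB for one node in a cluster of n-nodes.
  Assumption (not confirmed by the source): one connection per remote node,
  doubled for the copy held for the publication on the other end."
  [n-nodes]
  (let [per-connection (* terms-per-connection term-buffer-mb)
        remote-nodes   (dec n-nodes)
        one-side       (* remote-nodes per-connection)]
    {:per-connection-mb     per-connection
     :publications-mb       one-side
     :with-remote-copies-mb (* 2 one-side)}))

(shm-estimate-mb 10)
;; => {:per-connection-mb 48, :publications-mb 432, :with-remote-copies-mb 864}
```

Under these assumptions, 10 nodes come to 432MB for one side (the "about 500MB" figure) and 864MB once the remote copies are counted, which leaves little room under 1G after metadata.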

aspra11:09:34

ok thanks. Will double it and try again

lucasbradstreet11:09:38

You probably need an additional safety factor in case a peer gets rebooted and the old one isn't cleaned up yet

zamaterian11:09:55

@lucasbradstreet thx for your excellent explanation!

lucasbradstreet11:09:35

I’ll add it to the docs

zamaterian15:09:36

How do I detect whether a job has failed from within a lifecycle function? Currently it's the after-task-stop lifecycle I'm using.

lucasbradstreet15:09:23

You can get at the replica via (:onyx.core/replica event), then (some #{(:onyx.core/job-id event)} (:killed-jobs @replica))
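
As a minimal sketch, the expression in the message above could be wrapped in an after-task-stop lifecycle function like this. The names `job-killed?`, `after-task-stop`, and `calls` are hypothetical, and the sketch assumes the replica is a derefable value exposing `:killed-jobs`, as the message implies. Whether the job id is already in `:killed-jobs` when the lifecycle runs may depend on the Onyx version.

```clojure
(defn job-killed?
  "True if this event's job id appears in the replica's :killed-jobs.
  Assumes (:onyx.core/replica event) is derefable, as in the chat message."
  [event]
  (let [replica (:onyx.core/replica event)
        job-id  (:onyx.core/job-id event)]
    (boolean (some #{job-id} (:killed-jobs @replica)))))

(defn after-task-stop
  "Lifecycle function: reports a killed job and returns an empty map."
  [event lifecycle]
  (when (job-killed? event)
    (println "Job" (:onyx.core/job-id event) "was killed"))
  {})

;; Lifecycle calls map, referenced from a lifecycle entry via :lifecycle/calls.
(def calls
  {:lifecycle/after-task-stop after-task-stop})
```

The check is kept in its own function so it can be exercised with a plain atom standing in for the replica.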

lucasbradstreet15:09:54

However, we do actually have some more information that we should assoc into the event that we pass into after-task-stop. I might add it to the next version of Onyx that we are releasing ASAP if @michaeldrogalis is with me on it

zamaterian15:09:57

super nice, as usual the 🍺 is on me 🙂

lucasbradstreet15:09:25

:onyx.core/scheduler-event? It’ll only be added when the task is stopped, but I think that’s the only time you want it. It’ll use the value we’re already using for triggers

lucasbradstreet15:09:16

@michaeldrogalis when you have a moment, can we discuss version schemes?

michaeldrogalis15:09:26

Yeah, that's a good thing to add to the Event. And yes, give me 10.

zamaterian16:09:08

@lucasbradstreet btw: the solution of checking the killed-jobs list for the running job-id doesn't work, since the job-id is not yet present in the killed-jobs list.

lucasbradstreet16:09:27

Ahh. Yes, ever since we switched to the peer-group-manager we update the peer’s replica after making the log call

lucasbradstreet16:09:22

No good reason to do that, so I will switch the order. You’ll have this new feature when we release anyway. Thanks for the heads up

lucasbradstreet19:09:36

@zamaterian I just released 0.9.11-alpha1, with the :onyx.core/scheduler-event addition, if you would like to try it. Please note that we now enforce new tenancy-ids when you upgrade/downgrade Onyx

michaeldrogalis19:09:31

learn-onyx has been upgraded, removing the challenge with :onyx/bulk?, replacing it with a new challenge for :onyx/batch-fn?

zamaterian19:09:55

How does enforcement of new tenancy-ids fit with rolling upgrades of peers?

michaeldrogalis19:09:40

You only need a fresh tenancy ID if you change the Onyx version.

zamaterian19:09:23

thx 🙂 for the clarification

michaeldrogalis19:09:44

For sure. This is a good release, as @lucasbradstreet said, it will eliminate an entire class of errors that are encountered in the wild.