#onyx
2018-08-05
sparkofreason14:08:10

Back on the exception/restart topic: it happened again. The lifecycle exception handler was called several times, and Onyx logged a warning a few times in there as well. The last thing logged was the Onyx warning "Caught exception inside task lifecycle :lifecycle/offer-heartbeats.", and then everything shut down.

sparkofreason14:08:24

And as before, after the peers are restarted the job does not restart on its own, and requires manual resubmission.

lucasbradstreet17:08:21

Thanks. There must be a bug in the supervision where handle-exception isn’t invoked under certain circumstances (probably in offer-heartbeats).

lucasbradstreet17:08:45

I assume handle-exception is set for :all and always returns :restart?

sparkofreason17:08:40

I believe so, code above, let me know if I missed something. It did actually restart successfully several times.

lucasbradstreet17:08:08

Looks right to me. Just double checking stuff before going digging.

lucasbradstreet17:08:46

Do you know whether the old job moved to the killed-jobs key in the cluster replica? 99% sure it did.

sparkofreason21:08:47

How would I check?

lucasbradstreet21:08:37

On my phone so I haven’t checked this. If you have onyx-peer-http-query you can query /replica and see if the latest job-id is under killed-jobs.

lucasbradstreet21:08:57

Either that, or look under killed-jobs in the replica from however you’ve played back the log for diagnostics.
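
For anyone following along, the check described above can be sketched roughly as below. This is only an illustration, assuming the replica has already been fetched (from /replica or from a log playback) and parsed into a plain dict whose keys are the strings "jobs", "killed-jobs" and "completed-jobs". The `job_status` helper, the snapshot and the job ids are all hypothetical; fetching and parsing the response is left out because its exact format is not stated in this conversation.

```python
def job_status(replica, job_id):
    """Classify a job by which replica collection lists its id.

    Assumes `replica` is an already-parsed replica snapshot (a dict) with
    "killed-jobs", "completed-jobs" and "jobs" entries holding job ids.
    """
    job_id = str(job_id)
    checks = (
        ("killed", "killed-jobs"),
        ("completed", "completed-jobs"),
        ("running", "jobs"),
    )
    for status, key in checks:
        # Compare as strings so UUID objects and UUID strings both match.
        if job_id in {str(j) for j in replica.get(key, [])}:
            return status
    return "unknown"


# Hypothetical snapshot: job-a was killed, job-b is still scheduled.
replica = {
    "jobs": ["job-b"],
    "killed-jobs": ["job-a"],
    "completed-jobs": [],
}

print(job_status(replica, "job-a"))
```

If the latest job id comes back as "killed", that would match the suspicion that the job was killed rather than restarted, which would explain why it needs manual resubmission.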