onyx 2016-01-26 | Slack Archive

Is it possible (or, not painful) to store the changelog in Kafka instead of BookKeeper?

@rob it’s possible, but you would have to write an alternate implementation. See https://github.com/onyx-platform/onyx/blob/0.8.x/src/onyx/state/state_extensions.clj and https://github.com/onyx-platform/onyx/blob/0.8.x/src/onyx/state/log/bookkeeper.clj

mccraigmccraig10:01:23

had a failure in production today @lucasbradstreet @michaeldrogalis ... kafka brokers restarted and onyx wedged... it took changing the kafka consumer id to get things running again

mccraigmccraig10:01:43

any clues about how to debug / post-mortem this ?

lucasbradstreet11:01:53

@mccraigmccraig: sorry to hear that. I have a feeling that I know why it didn't initially recover. I'll post that issue in a minute. What I don't understand is why you needed a new group-id when you started a new job.

lucasbradstreet11:01:34

@mccraigmccraig: so basically, onyx stopped processing, you submitted a kill-job to stop the job and a new submit-job to restart with the same group-id?

lucasbradstreet11:01:21

@mccraigmccraig: I have a feeling this is why the job wasn't auto killed (or your peer didn't restart, if you were using restart-pred-fn) https://github.com/onyx-platform/onyx/issues/435

lucasbradstreet11:01:05

It would help if you could have a look through your logs for any relevant exceptions.

lucasbradstreet11:01:57

I've created an issue to create a test for this: https://github.com/onyx-platform/onyx-kafka/issues/15

mccraigmccraig12:01:55

@lucasbradstreet: i never submitted a kill-job... just restarted the onyx processes... perhaps that was the prob ?

lucasbradstreet12:01:44

Possibly, maybe the job killed itself? Maybe it would've come back up if you'd submitted a new job using the same group-id? If you could load up the dashboard and dump the log that might help.

mccraigmccraig12:01:47

the job didn't kill itself - it came back up when the process was restarted - but no messages were delivered, so i tried a new job-id with same consumer-id (still wedged) and then a new job-id and new consumer-id at which point messages were delivered

lucasbradstreet12:01:38

Okay, thanks. I'll try to reproduce it locally and will let you know if I have trouble and need more information or help.

mccraigmccraig12:01:23

thanks @lucasbradstreet

lucasbradstreet20:01:22

Onyx 0.8.6 is out. See 0.8.5 in the changelog for most of the relevant changes, since we had to do two releases today https://github.com/onyx-platform/onyx/blob/0.8.x/changes.md#086

lucasbradstreet20:01:26

@mccraigmccraig: I fixed that bug in onyx-kafka that I mentioned

lucasbradstreet20:01:30

It should be able to recover from broker issues if you use :onyx/restart-pred-fn on your input tasks
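
For readers following along, here is a rough sketch of what setting :onyx/restart-pred-fn on a Kafka input task's catalog entry might look like. The namespace, function name, topic, group-id, and ZooKeeper address are all hypothetical, and the :kafka/* keys are recalled from the onyx-kafka 0.8.x README rather than taken from this conversation, so check them against the docs for your version. The predicate receives the thrown exception and returns a boolean saying whether the peer should restart the task.

```clojure
(ns my.app.kafka-restart)

;; Hypothetical predicate. Onyx calls it with the exception thrown by the
;; task; returning true asks the peer to restart the task. A real predicate
;; would probably inspect the exception instead of always returning true.
(defn restart-on-any-exception?
  [_e]
  true)

;; Hypothetical catalog entry for a Kafka input task. Only
;; :onyx/restart-pred-fn comes from the discussion above; every other
;; value is a placeholder.
(def read-messages-entry
  {:onyx/name :read-messages
   :onyx/plugin :onyx.plugin.kafka/read-messages
   :onyx/type :input
   :onyx/medium :kafka
   :kafka/topic "my-topic"
   :kafka/group-id "my-consumer-group"
   :kafka/zookeeper "127.0.0.1:2181"
   ;; Fully qualified keyword naming the predicate defined above.
   :onyx/restart-pred-fn :my.app.kafka-restart/restart-on-any-exception?
   :onyx/batch-size 100
   :onyx/doc "Reads messages from a Kafka topic"})
```

Always returning true restarts on every exception, including ones that will never succeed on retry, so a narrower check on the exception type is likely the safer choice in production.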

lucasbradstreet20:01:06

Highly recommend all users upgrade to 0.8.6, as we have fixed lots of issues

thomas21:01:33

@lucasbradstreet: wow.... this is excellent news.

michaeldrogalis21:01:50

^ The above fixes are the product of Jepsen testing - so, those kinds of fixes.
