Is it possible (or, not painful) to store the changelog in Kafka instead of BookKeeper?


had a failure in production today @lucasbradstreet @michaeldrogalis ... kafka brokers restarted and onyx wedged... it took changing the kafka consumer id to get things running again


any clues about how to debug / post-mortem this ?


@mccraigmccraig: sorry to hear that. I have a feeling that I know why it didn't initially recover. I'll post that issue in a minute. What I don't understand is why you needed a new group-id when you started a new job.


@mccraigmccraig: so basically, onyx stopped processing, you submitted a kill-job to stop the job and a new submit-job to restart with the same group-id?


@mccraigmccraig: I have a feeling this is why the job wasn't auto killed (or your peer didn't restart, if you were using restart-pred-fn)


It would help if you could have a look through your logs for any relevant exceptions.


@lucasbradstreet: i never submitted a kill-job... just restarted the onyx processes... perhaps that was the prob ?


Possibly, maybe the job killed itself? Maybe it would've come back up if you'd submitted a new job using the same group-id? If you could load up the dashboard and dump the log that might help.


the job didn't kill itself - it came back up when the process was restarted - but no messages were delivered, so i tried a new job-id with same consumer-id (still wedged) and then a new job-id and new consumer-id at which point messages were delivered


Okay, thanks. I'll try to reproduce it locally and will let you know if I have trouble and need more information or help.


Onyx 0.8.6 is out. See 0.8.5 in the changelog for most of the relevant changes, since we had to do two releases today


@mccraigmccraig: I fixed that bug in onyx-kafka that I mentioned


It should be able to recover from broker issues if you use :onyx/restart-pred-fn on your input tasks
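
For reference, a minimal sketch of what that might look like on a Kafka input task. It assumes `:onyx/restart-pred-fn` takes a namespaced keyword resolving to a one-argument predicate on the thrown exception. The namespace `my.app.restart`, the task name `:read-messages`, and the predicate body are hypothetical, and the remaining plugin-specific keys are omitted.

```clojure
(ns my.app.restart)

;; Hypothetical predicate: returning true asks the peer to restart its task
;; when the task throws, rather than letting the exception kill the job.
(defn restart-on-exception? [e]
  ;; Illustrative only: restart on any exception. A real predicate would
  ;; likely inspect `e` and restart only on recoverable broker errors.
  true)

;; Partial catalog entry for an input task (other required keys omitted).
(def read-messages-entry
  {:onyx/name :read-messages
   :onyx/type :input
   :onyx/medium :kafka
   :onyx/restart-pred-fn :my.app.restart/restart-on-exception?})
```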


Highly recommend all users upgrade to 0.8.6, as we have fixed lots of issues


@lucasbradstreet: wow.... this is excellent news.


^ The above fixes are the product of Jepsen testing - so, those kinds of fixes.