2016-01-26
@rob it’s possible, but you would have to write an alternate implementation. See https://github.com/onyx-platform/onyx/blob/0.8.x/src/onyx/state/state_extensions.clj and https://github.com/onyx-platform/onyx/blob/0.8.x/src/onyx/state/log/bookkeeper.clj
had a failure in production today @lucasbradstreet @michaeldrogalis ... kafka brokers restarted and onyx wedged... it took changing the kafka consumer id to get things running again
any clues about how to debug / post-mortem this ?
@mccraigmccraig: sorry to hear that. I have a feeling that I know why it didn't initially recover. I'll post that issue in a minute. What I don't understand is why you needed a new group-id when you started a new job.
@mccraigmccraig: so basically, onyx stopped processing, you submitted a kill-job to stop the job and a new submit-job to restart with the same group-id?
@mccraigmccraig: I have a feeling this is why the job wasn't auto killed (or your peer didn't restart, if you were using restart-pred-fn) https://github.com/onyx-platform/onyx/issues/435
It would help if you could have a look through your logs for any relevant exceptions.
I've created an issue to create a test for this: https://github.com/onyx-platform/onyx-kafka/issues/15
@lucasbradstreet: i never submitted a kill-job... just restarted the onyx processes... perhaps that was the prob ?
Possibly, maybe the job killed itself? Maybe it would've come back up if you'd submitted a new job using the same group-id? If you could load up the dashboard and dump the log that might help.
the job didn't kill itself - it came back up when the process was restarted - but no messages were delivered, so i tried a new job-id with same consumer-id (still wedged) and then a new job-id and new consumer-id at which point messages were delivered
Okay, thanks. I'll try to reproduce it locally and will let you know if I have trouble and need more information or help.
Onyx 0.8.6 is out. See 0.8.5 in the changelog for most of the relevant changes, since we had to do two releases today https://github.com/onyx-platform/onyx/blob/0.8.x/changes.md#086
@mccraigmccraig: I fixed that bug in onyx-kafka that I mentioned
It should be able to recover from broker issues if you use :onyx/restart-pred-fn on your input tasks
Highly recommend all users upgrade to 0.8.6, as we have fixed lots of issues
@lucasbradstreet: wow.... this is excellent news.
^ The above fixes are the product of Jepsen testing - so, those kinds of fixes.
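For readers following along, the :onyx/restart-pred-fn suggestion above might look roughly like the sketch below. This is only an illustration under assumptions: the namespace `my.app.restart`, the function `restart?`, the topic, group id, and ZooKeeper address are all hypothetical, and the task map is abbreviated (other keys the Kafka plugin requires, such as a deserializer, are omitted). As far as the Onyx documentation describes it, :onyx/restart-pred-fn is a namespaced keyword naming a function that receives the thrown exception and returns true when the peer should restart its task rather than kill the job.

```clojure
(ns my.app.restart)

;; Hypothetical predicate. Assumption: Onyx calls it with the exception
;; thrown by the task and restarts the task when it returns true.
(defn restart? [_e]
  ;; Restarting on every exception is only for illustration; in practice
  ;; you would likely check for specific, recoverable exception types.
  true)

;; Abbreviated, hypothetical catalog entry for a Kafka input task.
;; Values are placeholders; required plugin keys not shown here are omitted.
(def read-messages-task
  {:onyx/name :read-messages
   :onyx/plugin :onyx.plugin.kafka/read-messages
   :onyx/type :input
   :onyx/medium :kafka
   :kafka/topic "my-topic"
   :kafka/group-id "my-consumer-group"
   :kafka/zookeeper "127.0.0.1:2181"
   ;; Points at the predicate above by its fully qualified name.
   :onyx/restart-pred-fn :my.app.restart/restart?
   :onyx/batch-size 100})
```

Keeping the predicate narrow is probably the safer design choice, since a blanket `true` would also restart on errors that will never recover on their own.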