
How do I relaunch a job that was killed, using the same catalog and job-id, so that the job continues from where it left off?


@zamaterian: you resubmit using the same process you used before, or you read all the job data back from ZooKeeper and rebuild the job from there. The resuming is the hard part, and it depends on which input plugin you're using.


@lucasbradstreet just to clarify: what is the difference between a job that has not completed (where the Onyx peers have been killed) and a job that has been killed by an exception (e.g. a lost connection)? The former is able to resume when starting a new Onyx peer; the latter is not able to continue by submitting the same job-id along with the same catalog. Shouldn't it be possible to resubmit/restart the killed job, since the catalog hasn't changed and the job-id is the same? As I understand it, it's almost the same case as the former, except the job ends up in the killed-jobs state instead of the jobs state? btw: this uses onyx-sql.


Hi guys, we use [org.onyxplatform/onyx-kafka-0.8 ""], and for example we push 10 messages to Kafka but Onyx processes only 9:

{:kafka/zookeeper ",",
  :onyx/plugin :onyx.plugin.kafka/read-messages,
  :onyx/medium :kafka,
  :kafka/offset-reset :smallest,
  :onyx/type :input,
  :onyx/name :read-mk-product,
  :kafka/topic "test_mk_product",
  :kafka/group-id "raker-dev-koles-machine",
  :onyx/max-pending 10,
  :onyx/max-peers 1,
  :onyx/doc "Reads messages from a Kafka topic",
  :onyx/batch-size 10,
  :kafka/deserializer-fn :raker.product.functions/avro->mk-product,
  :kafka/wrap-with-metadata? true}


@zamaterian I believe Lucas slightly misspoke about how idempotent job submission works. Repeated invocations of submit-job with the same ID will make it so that only one job ends up starting. Invoking submit-job after a job has been completed or killed has no effect. The purpose of the idempotency is to make sure that when you're trying to get your job off the ground the first time, transient network problems don't cause duplicate jobs.
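For reference, idempotent submission works by pinning the job ID yourself. A minimal sketch, assuming Onyx 0.9.x and that `peer-config`, `workflow`, `catalog`, and `lifecycles` are defined elsewhere in your project:

```clojure
(require '[onyx.api])

;; Any fixed UUID; submitting the same job map with the same
;; :metadata :job-id twice starts at most one job.
(def job-id #uuid "c3c5f5b6-1111-2222-3333-444444444444")

(onyx.api/submit-job
 peer-config
 {:workflow workflow
  :catalog catalog
  :lifecycles lifecycles
  :task-scheduler :onyx.task-scheduler/balanced
  :metadata {:job-id job-id}})
```

Note that, per the message above, this only protects the initial submission; once the job with that ID has completed or been killed, resubmitting is a no-op.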


If you're restarting a known killed or completed job, I would recommend using the log subscriber to monitor the replica and ensure the job is down, then submit under a new ID with the same workflow/catalog/etc.
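Roughly, that monitoring loop could look like the sketch below. This is from memory and the exact signatures of `subscribe-to-log` and `apply-log-entry` may differ between Onyx versions, so treat it as an outline rather than a drop-in snippet; `old-job-id` and `job` are assumed to be defined by you:

```clojure
(require '[clojure.core.async :as a]
         '[onyx.api]
         '[onyx.extensions])

;; Play the log forward until the old job shows up as killed or
;; completed in the replica, then submit the same workflow/catalog
;; under a fresh job ID.
(let [ch           (a/chan 100)
      subscription (onyx.api/subscribe-to-log peer-config ch)]
  (loop [replica (:replica subscription)]
    (if (some #{old-job-id}
              (concat (:killed-jobs replica) (:completed-jobs replica)))
      (onyx.api/submit-job peer-config job) ; same job map, new ID
      (recur (onyx.extensions/apply-log-entry (a/<!! ch) replica)))))
```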


As for resuming progress: if the plugin supports checkpointing (all the production ones do, including onyx-sql), you can configure the checkpoint value to be agnostic to the job ID. Pretty sure onyx-sql supports this, but if it doesn't we should add a patch for it.


For example, onyx-datomic lets you checkpoint how much of the tx-log it has read to an arbitrary location in ZooKeeper, so a completely different job can pick up from where the original left off. Does that make sense?
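As I remember it, onyx-datomic exposes this through a `:checkpoint/key` option on the input task's catalog entry; check the plugin README for your version, since the option names may have changed. A hypothetical catalog entry, with the Datomic URI and key names being placeholders:

```clojure
;; Sketch of an onyx-datomic tx-log reader whose checkpoint is keyed
;; by a fixed string rather than the job ID, so a job submitted later
;; under a different ID resumes from the same tx-log position.
{:onyx/name :read-tx-log
 :onyx/plugin :onyx.plugin.datomic/read-log
 :onyx/type :input
 :onyx/medium :datomic
 :datomic/uri "datomic:free://localhost:4334/my-db" ; placeholder
 :checkpoint/key "my-pipeline-tx-log"               ; job-ID-agnostic key
 :onyx/batch-size 20
 :onyx/doc "Reads the Datomic tx-log, checkpointing under a fixed key"}
```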


(defproject tmp-onyx-app "0.1.0-SNAPSHOT"
  :description ""
  :url ""
  :license {:name ""
            :url ""}
  :dependencies [[aero "1.0.0-beta2"]
                 [org.clojure/clojure "1.8.0"]
                 [org.clojure/tools.cli "0.3.5"]
                 [org.onyxplatform/onyx "0.9.10"]
                 [org.onyxplatform/lib-onyx ""]]
  :source-paths ["src"]

  :profiles {:dev {:jvm-opts ["-XX:-OmitStackTraceInFastThrow"]
                   :global-vars {*assert* true}}
             :dependencies [[org.clojure/tools.namespace "0.2.11"]
                            [lein-project-version "0.1.0"]]
             :uberjar {:aot [...]
                       :uberjar-name "peer.jar"
                       :global-vars {*assert* false}}})


This is the project.clj generated by

lein new onyx-app tmp-onyx-app


note that :uberjar is a key on the :dev profile map; that can’t be intentional, right?


Looks like it was modified here, and definitely shouldn’t be?


Actually, I got it wrong too.


:dependencies was just a top level key on the :profiles map


instead of being part of :dev


emacs set me straight


those are obviously supposed to be :dev deps, right?


I’ll submit a pr.


Yeah, your assessment looks right to me. Thanks ^^


pr to master? or the 0.9.x branch?


done thanks