onyx 2018-01-04 | Slack Archive

Hi All

Sorry to jump straight to a question, but I need some advice about using Onyx. I'll drop it here; hopefully someone can help me out.

Mehdi Avdi00:01:33

I want to poll multiple REST APIs every second, process the results, and send them to a Kafka topic. I know I can submit a task with a job, but is there a more robust way of generating jobs? Or is this what one should be doing (using a timer to submit jobs)?

michaeldrogalis00:01:31

@mehdi.avdi I'd use a single job with a plugin that repeatedly polls on your timer.

michaeldrogalis00:01:08

The nature of the input is a little awkward in general -- hence why I suggest writing your own input plugin. After that everything should work smoothly with a single job routing data to Kafka.

Mehdi Avdi00:01:38

A plugin that repeatedly sends segments downstream on a timer config?

Mehdi Avdi00:01:08

Segments practically just containing a timestamp and then take it from there

michaeldrogalis00:01:04

@mehdi.avdi I'd have the plugin poll the rest API, yeah. You can specify how frequently the plugin should poll for input and use that to control the rate.

michaeldrogalis00:01:20

If you write a solid input plugin, the rest of the system can turn into a nice, unidirectional streaming design.
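
As an illustration of the rate-controlled polling described above, here is a minimal sketch in plain Clojure. It is not an Onyx plugin and uses no Onyx API; `make-poller`, `fetch-fn`, `interval-ms`, and `now-fn` are hypothetical names, and the sketch assumes it is called from a single thread.

```clojure
;; Hypothetical helper, not part of Onyx: wraps a fetch function so that it
;; runs at most once per interval, however often the caller polls.
(defn make-poller
  "Returns a zero-arg function. Each call returns a segment (a map) when at
  least interval-ms have passed since the last fetch, otherwise nil.
  fetch-fn and now-fn are supplied by the caller."
  [fetch-fn interval-ms now-fn]
  (let [last-fetch (atom nil)]
    (fn []
      (let [now  (now-fn)
            prev @last-fetch]
        (when (or (nil? prev) (>= (- now prev) interval-ms))
          (reset! last-fetch now)
          {:polled-at now
           :payload   (fetch-fn)})))))

;; Usage with a stubbed fetch; a real fetch-fn would call the REST API.
(def poll! (make-poller (fn [] {:status "ok"}) 1000 #(System/currentTimeMillis)))
(poll!) ; first call fetches and returns a segment
(poll!) ; an immediate second call returns nil, since the interval has not elapsed
```

Passing the clock in as `now-fn` keeps the rate logic testable without sleeping; an input plugin could call such a function from its own poll step and emit whatever non-nil segment comes back.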

Mehdi Avdi00:01:08

Thank you, I will have some reading to do but this is very helpful.

michaeldrogalis00:01:10

@mehdi.avdi No problem, feel free to ask more questions here post-reading.

Mehdi Avdi00:01:18

Cheers

pfeodrippe02:01:04

About Jason's talk at Clojure eXchange 2017, "How I Bled All Over Onyx" (starting at 11 min) (https://skillsmatter.com/skillscasts/10939-how-i-bled-all-over-onyx): was the app he was doing really not at the Onyx scope? Or was there some misconception on his part about which part of the configuration should be tuned? I'm studying Onyx for processing our IoT sensor data, and it needs to be scalable enough... I just think Kafka Streams is very verbose, but I like its not-a-separate-cluster-needed approach.

michaeldrogalis02:01:49

@pfeodrippe Processing huge messages with Onyx isn't a great idea. There's nothing inherent to Onyx's design that makes it weak under high loads. There are some knobs to tune as volume grows, but its design is solid for high-perf/low-latency applications like IoT processing.

michaeldrogalis02:01:32

Having highly volatile message sizes is bad news regardless of what technology you're doing streaming with.

Quan Nguyen06:01:39

dumb question: I'm going thru the learn-onyx exercises (https://github.com/onyx-platform/learn-onyx/blob/master/src/workshop/challenge_4_1.clj) and in the lifecycle i'm writing, i'm encountering a whole bunch of event maps without the :onyx.core/batch key, even though i specified it as :lifecycle/after-batch. Is this expected, or is something wrong with my code? Relevant code:

(defn store-max
  "Work in progress..."
  [event lifecycle]
  (println "Event:" (keys event))
  {})

(def store-max-lifecycle
  {:lifecycle/after-batch store-max})

(defn build-lifecycles []
  [{:lifecycle/task :identity
    :lifecycle/calls ::store-max-lifecycle
    :onyx/doc "Stores the highest encountered value in state atom"}

;; I/O boilerplate from previous exercises...

Thanks for your help!

jasonbell08:01:36

@pfeodrippe “was the app he was doing really not at the Onyx scope” Was it out of scope? Not really; it could have been done. Was the initial design wrong? Yes, in terms of how Onyx works. Onyx would be perfect for IoT, as @michaeldrogalis has already pointed out. Message volatility will never help matters. Kafka streams is verbose and compact, but if the design is wrong you’ll still hit issues. Hindsight is a wonderful thing though...

ninja09:01:33

Hi, folks! I have been working with onyx for a while now and my workflow grew quite large. At this point the dashboard's static illustration of the workflow is insufficient. It would be much appreciated to have something like a recorder that tracks the path of a segment through the workflow graph, to enable some kind of introspection after a job has finished without the need to put log functions at several locations in the code. I thought of something that does some kind of workflow and catalog rewriting to create intermediate nodes that can store whole segments as well as information regarding their source and target node within the original graph. This information could then be queried and visualized (either as a simple table, an annotated graph, etc.). So here is the question: Is there already a tool/library that does something similar? Or is there a common practice that should be applied instead to achieve the same? Note that this should be for dev purposes only.
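
The workflow-rewriting part of this idea can be sketched in a few lines of plain Clojure, assuming the workflow is the usual vector of `[from to]` edges. `tap-name` and `add-taps` are hypothetical names, not an existing tool; the matching catalog entries for the tap tasks, and whatever they would record, are left out.

```clojure
;; Hypothetical helper: derives a name for the node inserted on an edge.
;; Uses only the name part of each keyword, so namespaces are dropped.
(defn tap-name
  [from to]
  (keyword (str "tap-" (name from) "-" (name to))))

;; Hypothetical helper: splits every edge with an intermediate tap node.
(defn add-taps
  "Rewrites each [from to] edge into two edges, [from tap] and [tap to]."
  [workflow]
  (vec (mapcat (fn [[from to]]
                 (let [tap (tap-name from to)]
                   [[from tap] [tap to]]))
               workflow)))

(add-taps [[:in :inc] [:inc :out]])
;; => [[:in :tap-in-inc] [:tap-in-inc :inc] [:inc :tap-inc-out] [:tap-inc-out :out]]
```

Because each tap is named after the edge it sits on, anything a tap task stores can be traced back to its source and target in the original graph.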

pfeodrippe13:01:26

@mablack @jasonbell Understood, thank you very much, guys 😃

jasonbell13:01:43

@pfeodrippe The main crux of the talk is that Jase tried to do this mad customer request with Onyx. To be honest I learned a lot more about the Onyx internals than I bargained for, and both @michaeldrogalis and @lucasbradstreet were gracious with their time and patience as I went through the scenarios. It also taught us about the Docker/DCOS things, and that Kafka doesn’t always behave how you want it to: schedulers detach and reassign randomly, which is never a good thing for assigned brokers, etc. The talk was really the journey, the coal face and the eventual solution. I’ve still an Onyx talk under consideration for Strata London in May, but I’m fairly sure that sadly won’t happen as I’m confirmed to talk on self-training AI with Kafka and DeepLearning4J.

jasonbell13:01:00

And as you might have noticed @pfeodrippe I don’t take my talks too seriously 🙂

jasonbell13:01:13

Thanks for asking a thoughtful question, very much appreciated.

pfeodrippe13:01:29

@jasonbell Hope you give a talk here in Brazil some day heheaheehu I'll be there. Thanks o/

jasonbell13:01:06

👍 Never say never 🙂

michaeldrogalis16:01:12

@quan.ngoc.nguyen Ahh - this is our bad. The latest release of Onyx removed that key from the event map. I'm not sure how our build of learn-onyx didn't red flag that commit. Call keys on the event map to see the additions.

michaeldrogalis16:01:23

I'll make a note to circle back later today and update it. Thanks for the report.

michaeldrogalis16:01:23

@atwrdik I don't know of anything out there at the moment for Onyx specifically, but I always wanted to see what it would be like to integrate with Zipkin. Same idea as what Erlang supports natively -- using tracer IDs to correlate distributed message paths.

lucasbradstreet23:01:43

@michaeldrogalis that’s a bit strange. We didn’t remove onyx.core/batch

michaeldrogalis23:01:12

Didn't we? I'm misremembering what we removed. onyx.core/results, then?

lucasbradstreet23:01:16

Yes

lucasbradstreet23:01:38

Turned that into onyx.core/write-batch

michaeldrogalis23:01:58

Weird. @quan.ngoc.nguyen Can you invoke (keys) on those maps and check out what's there? Perhaps you're using the wrong param?

Quan Nguyen23:01:51

@michaeldrogalis I'm not at home right now but I can tell u that I remember seeing several event maps with the :onyx.core/batch key in them (and when printed out they gave the expected 100 numbers). However, after processing all the segments, the lifecycle got called a bunch more times. I think the event maps in this case were the same as the previous ones, except they were just missing :onyx.core/batch. I can attach my code tonight, maybe that will help?

michaeldrogalis23:01:44

@quan.ngoc.nguyen Oh -- I'm not certain that key is supplied if no segments were read. I'd need to check on that.

Quan Nguyen23:01:46

@michaeldrogalis if my understanding is correct, the after-batch lifecycle should only be called if the task receives segments right? In this case it seems like it gets called a bunch of extra times for some reason (might just be something wrong with my code so I'll take another look tonight)

michaeldrogalis23:01:37

No, it'll still be called regardless.

michaeldrogalis23:01:52

The batch just might be empty, or the key may be absent.
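
Given that, an after-batch function can treat a missing key and an empty batch the same way. A minimal sketch, assuming each segment is a map carrying its number under `:n` (an assumption, not confirmed in this thread); `max-seen` is a hypothetical state atom:

```clojure
;; Hypothetical state atom holding the highest value seen so far.
(def max-seen (atom nil))

(defn store-max
  "after-batch lifecycle function that tolerates a missing or empty
  :onyx.core/batch. Assumes segments look like {:n 42}."
  [event lifecycle]
  (let [batch  (get event :onyx.core/batch [])  ; default when the key is absent
        values (keep :n batch)]                 ; skip segments without :n
    (when (seq values)
      (swap! max-seen
             (fn [current]
               (apply max (if current (cons current values) values)))))
    ;; Lifecycle functions return a map; nothing is added to the event here.
    {}))
```

Calling it with an event that lacks the key, or whose batch is empty, leaves `max-seen` unchanged.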
