#onyx
2016-08-31
mariusz_jachimowicz13:08:47

I have added zooming to the workflow diagram, please review https://github.com/lbradstreet/onyx-visualization/pull/4

aengelberg16:08:41

@lucasbradstreet any update on my above questions? 🙂

michaeldrogalis17:08:42

@aengelberg He might be asleep. Every once in a while he passes out "early" for his time, heh. Will have him get the next Onyx dashboard out when he's up.

surreal.analysis18:08:07

I’m working my way through the Onyx workshop, and a little confused at a particular point. I’m on challenge 4.2, which is designed to demonstrate how to use a local atom to track state. Relevant parts of the code are:

(def state (atom Integer/MIN_VALUE))
(defn find-max [event lifecycle]
  (reset! state (max (apply max (map :n (map :message (:onyx.core/batch event)))) @state))
  {})
(defn log-max [event lifecycle]
  (println @state)
  {})
(def logger-lifecycle
  {:lifecycle/after-batch find-max
   :lifecycle/after-task-stop log-max})
But in find-max, @state is always nil on the first call (after the first batch). I was able to fix it by changing find-max to
(defn find-max [event lifecycle]
  (reset! state (max (apply max (map :n (map :message (:onyx.core/batch event))))
                     (if @state
                       @state
                       Integer/MIN_VALUE)))
  {})
but I’m still not sure why the first find-max is failing
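
The workaround above can also be written with swap! instead of reset!. This is only a minimal sketch, not the workshop's answer: it assumes the atom may hold nil before the first batch, models the event as a plain map with no Onyx dependency, and introduces a hypothetical helper named batch-max.

```clojure
;; Stand-in for the workshop's atom; assumed to start as nil.
(def state (atom nil))

(defn batch-max
  "Largest :n among the messages in the event's batch, or nil for an empty batch.
  Hypothetical helper, not part of the workshop code."
  [event]
  (when-let [xs (seq (map (comp :n :message) (:onyx.core/batch event)))]
    (apply max xs)))

(defn find-max [event lifecycle]
  (when-let [m (batch-max event)]
    ;; swap! reads and updates the atom in one atomic step;
    ;; the nil check covers the very first batch.
    (swap! state (fn [current] (if (nil? current) m (max current m)))))
  {})
```

Using swap! avoids the separate read (@state) and write (reset!) steps, which could interleave if more than one thread touched the atom.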

michaeldrogalis18:08:45

@surreal.analysis Do you think there's a problem with the code in learn-onyx, or more of a Clojure problem? I'm looking at the code, and the atom is set to nil to start out with: https://github.com/onyx-platform/learn-onyx/blob/master/src/workshop/challenge_4_2.clj#L47

surreal.analysis18:08:08

I’m not sure. After running (def state (atom Integer/MIN_VALUE)), @state does return the minimum integer value when I’m in the workshop.challenge-4-2 ns in the REPL, but it never appears to be set in the test

michaeldrogalis18:08:39

Maybe some repl-funniness is going on.

surreal.analysis18:08:52

No, it’s being set to nil in the test

surreal.analysis18:08:06

Presumably to deal with multiple runs in a row (e.g. if a user set state to a list accidentally you’d want to get rid of that at the end)
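
A small sketch of why the first call fails, independent of Onyx. It assumes the test clears the atom with something like (reset! state nil); the exact mechanism in the workshop test is not shown in this conversation, so that line is only a stand-in.

```clojure
;; The atom as defined in the REPL session described above.
(def state (atom Integer/MIN_VALUE))

;; Value right after the def: Integer/MIN_VALUE.
(def initial-value @state)

;; Assumed stand-in for whatever the test does to clear the atom.
(reset! state nil)

;; The def's initial value is gone; the atom now holds nil.
(def after-reset @state)

;; Comparing a number with nil throws on the JVM, which is what the
;; first after-batch call runs into.
(def first-call-result
  (try
    (max 3 @state)
    (catch NullPointerException e
      :npe)))
```

Here first-call-result ends up as :npe, while initial-value still records what the REPL showed before the reset.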

surreal.analysis18:08:11

Cool, that makes sense, I was just worried I was missing something fundamental about Onyx and its interactions with atoms

michaeldrogalis18:08:47

Ah, nope. Atoms are regular Clojure atoms, nothing magical there.

michaeldrogalis18:08:06

Heh, we should probably make that clearer given how Om and Reagent work. 😛

surreal.analysis18:08:09

I suppose the one tweak I’d consider making is setting state to Integer/MIN_VALUE as that simplifies the code, but this is just something I should have caught

michaeldrogalis18:08:13

If you want to send that in as a PR, that'd be good.

Travis18:08:43

Does anyone here have any experience with Elasticsearch? I am concerned that streaming with a bunch of different jobs writing directly to ES is not a good idea, and wanted some thoughts.

surreal.analysis18:08:21

@michaeldrogalis Would it be two PRs? One for master, one for answers?

michaeldrogalis18:08:55

Send just one into master, and I'll merge that into answers. If I ever accidentally merge answers into master, guh, that will not be a good day, haha.

michaeldrogalis18:08:04

It's very... delicate.

manderson18:08:55

@camechis can you elaborate a bit more on your use case?

Travis18:08:58

Sure, we have a relatively high volume of data coming into our onyx jobs and I am concerned that the speed of the writes from onyx to elastic may be too much for elastic

manderson18:08:41

ok, so your concern is more around performance than correctness/concurrency?

Travis18:08:06

yeah, I am worried about elastic falling over with the sheer number of writes that will begin to happen

Travis18:08:21

just trying to see if it is a justified concern

manderson18:08:28

yeah, I think it's a valid concern, but hard to know where the "too much" line is without doing some benchmarking. have you considered using the elasticsearch bulk api? And perhaps windowing with onyx or something similar to aggregate your ES writes into batches?
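
The batching idea can be sketched in plain Clojure. This assumes a caller-supplied function, here called bulk-write!, that sends one collection of documents in a single bulk request; that function, the name write-in-batches!, and the batch size are all hypothetical, and no real Elasticsearch client call is shown.

```clojure
(defn write-in-batches!
  "Splits docs into groups of at most batch-size and calls bulk-write!
  once per group. Returns the number of bulk calls made.
  bulk-write! is a hypothetical caller-supplied function."
  [bulk-write! batch-size docs]
  (reduce (fn [calls batch]
            (bulk-write! batch)
            (inc calls))
          0
          (partition-all batch-size docs)))

;; Usage with a fake writer that only records what it was given.
(def sent (atom []))

(def call-count
  (write-in-batches! (fn [batch] (swap! sent conj (vec batch)))
                     2
                     [{:id 1} {:id 2} {:id 3} {:id 4} {:id 5}]))
```

With five documents and a batch size of 2, the fake writer is called three times instead of five. Whether that reduction is enough for a given cluster would still need benchmarking.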

Travis18:08:00

we are doing it but it's still going to be a ton of writes with the sheer number of jobs + speed of data

surreal.analysis18:08:59

After thinking about it, I think nil makes more sense. The answer flows pretty well, and I think that error messages where you get an NPE or see nil might be better than error messages that spit out a randomly low number. Thanks for talking through things.

manderson18:08:41

Unfortunately, my real world experience with ES has been with "normal" write loads, so not sure I can offer help from my own experience. But a quick google turns up several blogs on performance tuning elasticsearch. If I were you, I'd start there and try to run some benchmarks to see a real use case in action.

Travis18:08:13

yeah, I agree

michaeldrogalis18:08:36

@surreal.analysis Anytime. Thanks for the proposal 🙂