#onyx
2015-10-13
cddr04:10:15

I'm trying to write an onyx-rethink plugin but after submitting a job, I get exceptions like these every 500ms "org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /onyx/b6355599-cb33-4e44-a548-a0ce73b3d74b/job-scheduler/scheduler code: -101 path: "/onyx/b6355599-cb33-4e44-a548-a0ce73b3d74b/job-scheduler/scheduler""

cddr05:10:31

Actually, turns out submitting the job is fine. This exception is thrown after calling await-job-completion

lucasbradstreet05:10:36

Are you using the in-memory ZK server (started up in the env), and are you stopping it again before await-job-completion completes?

cddr06:10:01

Thanks for responding. I got past this problem. There were a bunch of things I was doing wrong (e.g. not including the lifecycle calls to inject the input data, and mis-referencing a function or two). Think I'm nearly there with the hello world of feeding data into rethink

cddr06:10:07

Is a workflow of one [:input :output] valid?

robert-stuttaford11:10:24

fwiw, we stopped using in-memory ZK and just brew install zookeeper instead. much simpler, because now dev = staging = prod in terms of config

lucasbradstreet11:10:43

Yeah, the in memory ZK is handy but it's not always the best choice

michaeldrogalis14:10:12

@cddr: Cool. A Rethink plugin would be exciting!

lucasbradstreet16:10:52

@cddr let us know if you need any tips with the input plugin. The best starting point is currently the onyx-plugin template.

shaunxcode19:10:22

in esper if I were to write "SELECT surname, count(*) FROM parties.win:time(5 min) GROUP BY surname OUTPUT last every 1 seconds" I would end up with a data set like [[a 30] [b 20] [c 50]] (if there were 3 surnames; I will ignore the issue of having an unknown number of surnames for now). Using the upcoming sliding window facilities of onyx, would it be correct to model it with a flow condition that directs input (the party event, which looks like {:surname :x :party-time <inst>}) by surname to distinct windows that aggregate by count, and then have the output of the windows all go to an output?

shaunxcode19:10:43

to be clear the onyx sliding window stuff maps nicely to the esper notion of sliding window/output last stuff so there is no issue there

michaeldrogalis19:10:15

For my own clarity, Esper is providing a Continuous Query Language there, correct?

shaunxcode19:10:22

yes that is correct

shaunxcode19:10:34

(their variant is called EPL but yeah same idea)

michaeldrogalis19:10:08

I think you'd want to use :onyx/group-by-key on the task map.

michaeldrogalis19:10:34

That way count will maintain its aggregate as a map, rather than a scalar. This isn't available yet, but it will be in.... 8 hours. :) That's my issue for tonight

michaeldrogalis19:10:36

Flow conditions route data from one task to its downstream tasks. They're independent of windows

shaunxcode20:10:43

interesting, so would this then be just one sliding window output directed to an onyx/fn doing aggregation with :onyx/group-by-key :surname in this case?

michaeldrogalis20:10:07

If I follow your example, yes.

michaeldrogalis20:10:34

And a trigger to send the output along
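
A rough way to picture what is being described here: one sliding window whose count aggregate is kept as a map keyed by surname, with something like a trigger that emits the current map on demand. The sketch below is plain Python, not Onyx code; the class name SlidingGroupCount, its methods, and the assumption that events arrive in timestamp order are all made up for illustration.

```python
from collections import Counter, deque


class SlidingGroupCount:
    """Per-key event counts over the trailing `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds, key):
        self.window_seconds = window_seconds
        self.key = key
        self.events = deque()    # (timestamp, key value), oldest first
        self.counts = Counter()  # the aggregate is a map, not a scalar

    def add(self, event, now):
        # Assumption: events are added in non-decreasing timestamp order.
        k = event[self.key]
        self.events.append((now, k))
        self.counts[k] += 1

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, k = self.events.popleft()
            self.counts[k] -= 1
            if self.counts[k] == 0:
                del self.counts[k]

    def snapshot(self, now):
        # Stands in for the trigger: emit the current grouped counts.
        self._evict(now)
        return sorted(self.counts.items())


window = SlidingGroupCount(window_seconds=300, key="surname")
window.add({"surname": "a"}, now=0)
window.add({"surname": "b"}, now=10)
window.add({"surname": "a"}, now=200)
print(window.snapshot(now=250))  # [('a', 2), ('b', 1)]
print(window.snapshot(now=310))  # [('a', 1)]
```

Because the counts live in a map, a surname that first shows up at runtime simply becomes a new key; nothing has to be declared per surname up front.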

shaunxcode20:10:07

nice and that speaks to my followup question then of "what if there are surnames not known until runtime" as it no longer matters

michaeldrogalis20:10:23

:) I might be wrong - that example contains a lot of features, some speculative - but it's nice that they can all be composed together.

michaeldrogalis20:10:45

Rephrasing, if it didn't work like that, we'd have a bug or a design change. Heh.

shaunxcode20:10:41

that is good to hear, I can keep the obscure esper/cep examples coming if it helps flesh things out

michaeldrogalis20:10:18

Yeah, absolutely. We're aiming to close this release as stable by the Conj, so that would be very helpful.

shaunxcode20:10:10

the load I am looking to test against esper performance is 200k tuples per second in 5 minute sliding window just doing a group by and count as above.

michaeldrogalis20:10:14

Across how many machines?

shaunxcode20:10:56

1 at first to compare to the current esper approach, but that is part of the reason for "why onyx", as it should facilitate horizontal scalability to increase performance/capacity

michaeldrogalis20:10:01

We'll be slower than 200k/s at first. I think we clocked it at 5k/core last week. There's substantial perf work to be done though
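
For a sense of scale, the two figures mentioned so far (200k tuples per second target, roughly 5k per core) can be put side by side. This is only back-of-envelope arithmetic in Python; it assumes throughput scales linearly with core count, which nobody in this conversation has claimed, and the function names are invented for the example.

```python
import math


def cores_needed(target_per_sec, per_core_per_sec):
    # Assumption: perfectly linear scaling across cores.
    return math.ceil(target_per_sec / per_core_per_sec)


def tuples_in_window(rate_per_sec, window_seconds):
    # How many tuples a full sliding window covers at a steady rate.
    return rate_per_sec * window_seconds


print(cores_needed(200_000, 5_000))        # 40
print(tuples_in_window(200_000, 5 * 60))   # 60000000
```

So at the quoted rate, a 5 minute window spans about 60 million tuples, and the target would take on the order of 40 cores before any of the planned perf work.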

shaunxcode21:10:14

is the benchmark code in repo?

michaeldrogalis21:10:03

Yeah, but it's not documented for anyone other than Lucas and myself. Feel free to browse it though. https://github.com/onyx-platform/onyx-benchmark

shaunxcode22:10:42

when you say 5k/core what size are the messages you are dealing with? (sorry this may be clear in the benchmark and I have not fully digested that yet)