This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2015-10-13
I'm trying to write an onyx-rethink plugin but after submitting a job, I get exceptions like these every 500ms "org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /onyx/b6355599-cb33-4e44-a548-a0ce73b3d74b/job-scheduler/scheduler code: -101 path: "/onyx/b6355599-cb33-4e44-a548-a0ce73b3d74b/job-scheduler/scheduler""
Actually, turns out submitting the job is fine. This exception is thrown after calling await-job-completion
Are you using the in-memory ZK server (started up in the env), and are you stopping it again before await-job-completion completes?
Thanks for responding. I got past this problem. There were a bunch of things I was doing wrong (e.g. not including the lifecycle calls to inject the input data, and mis-referencing a function or two). Think I'm nearly there with the hello world of feeding data into rethink
fwiw, we stopped using in-memory ZK and just brew install zookeeper instead. much simpler, because now dev = staging = prod in terms of config
Yeah, the in-memory ZK is handy but it's not always the best choice
@cddr: Cool. A Rethink plugin would be exciting!
@cddr let us know if you need any tips with the input plugin. The best starting point is currently the onyx-plugin template.
in esper if I were to write "SELECT surname, count(*) FROM parties.win:time(5 min) GROUP BY surname OUTPUT last every 1 seconds" I would end up with a data set like [[a 30] [b 20] [c 50]] (if there were 3 surnames; I will ignore the issue of having an unknown number of surnames for now). Using the upcoming sliding windowing facilities of onyx, would it be correct to model it with a flow condition that directs input (the party event, which looks like {:surname :x :party-time <inst>}) by surname to distinct windows which aggregate by count, and then have the output of the windows all go to an output?
to be clear the onyx sliding window stuff maps nicely to the esper notion of sliding window/output last stuff so there is no issue there
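For readers unfamiliar with the query above, here is a rough Python sketch of what "count per surname over a trailing 5 minute window" means. It is not Esper or Onyx code: the class name `SlidingGroupCount` and its methods are made up for illustration, timestamps are plain seconds, and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque

WINDOW_SECONDS = 5 * 60  # mirrors win:time(5 min) in the query


class SlidingGroupCount:
    """Hypothetical helper: count events per key over a trailing time window."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, key) pairs, oldest first
        self.counts = Counter()  # key -> number of events still in the window

    def add(self, timestamp, key):
        # Assumes timestamps are non-decreasing.
        self.events.append((timestamp, key))
        self.counts[key] += 1

    def snapshot(self, now):
        """Evict expired events, then return sorted (key, count) pairs."""
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]
        return sorted(self.counts.items())


window = SlidingGroupCount()
window.add(0, "a")
window.add(10, "b")
window.add(200, "a")
print(window.snapshot(now=250))  # all three events are still inside the window
print(window.snapshot(now=305))  # the event at t=0 has aged out
```

Calling `snapshot` once per second would play the role of "OUTPUT last every 1 seconds". Because the counts live in a map keyed by surname, surnames that first appear at runtime need no special handling.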
For my own clarity, Esper is providing a Continuous Query Language there, correct?
yes that is correct
(their variant is called EPL but yeah same idea)
I think you'd want to use :onyx/group-by-key
on the task map.
That way count will maintain its aggregate as a map, rather than a scalar. This isn't available yet, but it will be in... 8 hours. That's my issue for tonight
Flow conditions route data from one task to its downstream tasks. They're independent of windows
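To make the distinction concrete, here is a small Python sketch of routing by predicate, which is the job described for flow conditions. It is only a conceptual model, not the Onyx API: the `route` function and the task names are hypothetical.

```python
def route(segment, conditions):
    """Return the downstream task names whose predicate accepts the segment."""
    return [task for task, predicate in conditions if predicate(segment)]


# Hypothetical downstream tasks, each paired with a predicate on the segment.
conditions = [
    ("count-by-surname", lambda seg: "surname" in seg),
    ("dead-letter", lambda seg: "surname" not in seg),
]

print(route({"surname": "x", "party-time": 0}, conditions))
print(route({"party-time": 0}, conditions))
```

Routing only decides where a segment goes next. Grouping and counting happen inside whichever task receives it, so per-surname aggregation does not need one route per surname.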
interesting, so would this then be just one sliding window output directed to an onyx/fn doing aggregation with :onyx/group-by-key :surname in this case?
If I follow your example, yes.
And a trigger to send the output along
nice and that speaks to my followup question then of "what if there are surnames not known until runtime" as it no longer matters
I might be wrong - that example contains a lot of features, some speculative - but it's nice that they can all be composed together.
Rephrasing, if it didn't work like that, we'd have a bug or a design change. Heh.
that is good to hear, I can keep the obscure esper/cep examples coming if it helps flesh things out
Yeah, absolutely. We're aiming to close this release as stable by the Conj, so that would be very helpful.
the load I am looking to test against esper performance is 200k tuples per second in 5 minute sliding window just doing a group by and count as above.
Across how many machines?
1 at first to compare to the current esper approach, but that is part of the reason for "why onyx", as it should facilitate horizontal scalability to increase performance/capacity
We'll be slower than 200k/s at first. I think we clocked it at 5k/core last week. There's substantial perf work to be done though
is the benchmark code in repo?
Yeah, but it's not documented for anyone other than Lucas and myself. Feel free to browse it though. https://github.com/onyx-platform/onyx-benchmark
when you say 5k/core what size are the messages you are dealing with? (sorry this may be clear in the benchmark and I have not fully digested that yet)
100 bytes/msg
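A back-of-the-envelope calculation with the figures quoted in this thread (200k tuples/s target, 5k/core measured, 100 bytes/msg, 5 minute window). It assumes throughput scales linearly with cores and that the target tuples are about the same size as the benchmark messages. Neither assumption is confirmed above.

```python
import math

TARGET_TUPLES_PER_SEC = 200_000     # load quoted for the esper comparison
MEASURED_TUPLES_PER_CORE = 5_000    # "5k/core" figure from the benchmark
BYTES_PER_MESSAGE = 100             # benchmark message size
WINDOW_SECONDS = 5 * 60             # 5 minute sliding window

# Assumption: perfectly linear scaling across cores.
cores_needed = math.ceil(TARGET_TUPLES_PER_SEC / MEASURED_TUPLES_PER_CORE)

# Assumption: target tuples are the same size as benchmark messages.
bytes_per_sec = TARGET_TUPLES_PER_SEC * BYTES_PER_MESSAGE

# Number of tuples that fall inside one full window at the target rate.
tuples_in_window = TARGET_TUPLES_PER_SEC * WINDOW_SECONDS

print(cores_needed)              # 40
print(bytes_per_sec / 1e6)       # 20.0 (MB/s)
print(tuples_in_window)          # 60000000
```

So at last week's per-core rate the target would take roughly 40 cores, before any of the perf work mentioned above.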