This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-07-05
Channels
- # admin-announcements (10)
- # beginners (18)
- # boot (29)
- # capetown (2)
- # cider (46)
- # cljs-dev (1)
- # cljsrn (69)
- # clojure (126)
- # clojure-android (9)
- # clojure-gamedev (3)
- # clojure-greece (16)
- # clojure-poland (13)
- # clojure-russia (45)
- # clojure-spec (27)
- # clojure-uk (21)
- # clojurescript (99)
- # cursive (1)
- # datascript (1)
- # datomic (42)
- # functionalprogramming (10)
- # hoplon (47)
- # instaparse (12)
- # jobs (5)
- # jobs-rus (9)
- # keechma (22)
- # lein-figwheel (8)
- # leiningen (5)
- # luminus (1)
- # mount (7)
- # off-topic (1)
- # om (15)
- # onyx (47)
- # other-languages (14)
- # planck (28)
- # proton (8)
- # re-frame (30)
- # reagent (15)
- # remote-jobs (3)
- # slack-help (2)
- # untangled (9)
- # yada (6)
Hello, any reason https://github.com/onyx-platform/onyx-kafka-0.8 is on 0.8 and in "maintenance mode"?
Hi @nha. Because it’s the 0.8 plugin, using a different dependency that supports 0.8. Therefore it’s in maintenance mode, while the mainstream plugin supports 0.9 (and possibly 0.8)
We don’t want to leave anyone behind at the moment
For what it's worth, I tested the 0.9 bindings with 0.8 - they're incompatible.
Also, Kafka 0.10.0 is already out.
@acron I don't know why you'd want to.
If anything, shorter is better
If you're trying to prevent an empty batch being processed by a lifecycle, it's probably just better to check whether the batch is empty and keep the batch timeout long
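A minimal sketch of that empty-batch check, assuming Onyx's lifecycle API where the segments just read are available under `:onyx.core/batch` in the event map; `handle-batch!` and the task name are hypothetical placeholders:

```clojure
;; Sketch: skip side effects when a batch comes back empty, instead of
;; shrinking the batch timeout. Assumes segments arrive under
;; :onyx.core/batch; handle-batch! stands in for real work.
(defn handle-batch! [segments]
  (println "processing" (count segments) "segments"))

(defn after-batch [event lifecycle]
  (let [batch (:onyx.core/batch event)]
    (when (seq batch)            ;; no-op on an empty batch
      (handle-batch! batch)))
  {})                            ;; lifecycle fns return a map merged into the event

(def calls
  {:lifecycle/after-batch after-batch})

(def lifecycles
  [{:lifecycle/task :my-task     ;; hypothetical task name
    :lifecycle/calls ::calls}])
```

With this in place the batch timeout can stay long, and an empty batch simply falls through without triggering any work.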
@lucasbradstreet: Ok, that's basically it...
@lucasbradstreet: This is a symptom of the way we're using Onyx in this project. It may offend your sensibilities but we're basically only ever firing one segment
@acron: In the entire job?
@michaeldrogalis: yep...
@acron: Unless the job completes very quickly, you should probably look at redesigning that somehow. You're getting the coarsest fault tolerance possible. If any step in that single segment's processing fails, it needs to go back to the root task that it came from.
For segments that process quickly, it's completely acceptable. But for one segment per job you're potentially paying a heavy price.
@michaeldrogalis: yeah, we realise we're in non-standard territory but there are still elements of a job that are asynchronous and the way we've designed peers is that they can participate in multiple jobs
In what sense?
e.g. a peer working multiple jobs.
Well, the peers have a bucket of fns - each job can be any arrangement of those fns - so one job might be A->B->C, another job might be X->Y->Z
Hard to say since I'm not looking at the code, but I think Onyx can already do what you're thinking of without any extra code. Every virtual peer can participate in any task, unless you used tags to specify otherwise. The only iron-clad guarantee right now is that every virtual peer will work on at most one task at a time.
Which is why typically all functions get deployed to all peers. Onyx will selectively use them.
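The arrangement described above can be sketched as plain Onyx job data: the same fns deployed everywhere, with each job's workflow wiring a different arrangement of them. All task and fn names here are illustrative, and the catalog entries are trimmed to the essentials:

```clojure
;; Sketch: one shared bucket of fns, composed into different jobs.
;; Each fn transforms a segment (a plain map).
(defn a [segment] (update segment :n inc))
(defn b [segment] (update segment :n * 2))
(defn c [segment] (assoc segment :done? true))

;; Job 1 wires A -> B -> C; another job could reuse the same fns in a
;; different order, or a disjoint subset (X -> Y -> Z).
(def job-1
  {:workflow [[:a :b] [:b :c]]
   :catalog  [{:onyx/name :a :onyx/fn ::a :onyx/type :function}
              {:onyx/name :b :onyx/fn ::b :onyx/type :function}
              {:onyx/name :c :onyx/fn ::c :onyx/type :function}]})
```

Since every peer has every fn on its classpath, Onyx can schedule any virtual peer onto any task of any job, which is the selective use mentioned above.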
Ah, okie dokie. Yeah as long as it's working for you, seems fine.
And as we're in the unique circumstance where we know there's only one segment... hence my question about the timeout
Right. Yeah, we don't support indefinitely blocking. It's too far outside of what it was designed for. You can jack up the timeout super high, but that's just a bandaid. What's the harm in processing an empty batch?
We've written some plugins to introduce state into the job - this allows us to merge tasks in a job and also introduce loops... we need to add empty batch handling into those plugins, that's all
Little bit, but you gotta do what you gotta do 😛
Empty batch checking is the way to go, though. I can't think of another way to handle it without introducing new primitives into the streaming engine.
Onyx 0.9.7 is officially out. The plugins and documentation are still building, but you can get core from Clojars right now. The rest of the build should finish in the next 2 hours.
Blog post tomorrow.
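For anyone pulling the new core before the rest of the build lands, the Leiningen dependency would look something like this; the coordinate is assumed to be `org.onyxplatform/onyx` per Onyx's published Clojars releases, so verify against Clojars:

```clojure
;; project.clj fragment (sketch) for picking up the 0.9.7 core release.
:dependencies [[org.clojure/clojure "1.8.0"]
               [org.onyxplatform/onyx "0.9.7"]]
```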
With Onyx that would be a quick way to get multiple sequential aggregations while preserving fault tolerance.
Why are the topologies split?
@michaeldrogalis: it was from this article: http://hortonworks.com/blog/storm-kafka-together-real-time-data-refinery. The summary was:
* Incrementally add more topologies/use cases
* Tap into raw or refined data streams at any stage of the processing
* Modularize your key cluster resources to most intense processing phase of the pipeline
I understand the first two reasons, but not
> Modularize your key cluster resources to most intense processing phase of the pipeline
I suppose I don’t understand what extra modularity is achieved. I’ll have to research a bit and see if anything clicks.
I recall seeing a talk by another company that did something very similar.
> With Onyx that would be a quick way to get multiple sequential aggregations while preserving fault tolerance.
@gardnervickers: How does interleaving Kafka introduce more fault tolerance?
Thanks!!!! 
> I suppose I don’t understand what extra modularity is achieved. I’ll have to research a bit and see if anything clicks.
They mean you can dedicate hardware to specific topologies, so you'll get better perf isolation