#onyx
2016-07-05
nha13:07:11

Hello, any reason the https://github.com/onyx-platform/onyx-kafka-0.8 is on 0.8 and in "maintenance mode" ?

lucasbradstreet14:07:33

Hi @nha. Because it’s the 0.8 plugin, using a different dependency which supports 0.8. Therefore it’s in maintenance mode, while the mainstream plugin supports 0.9 (and possibly 0.8)
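
For reference, the two plugins are separate artifacts. A sketch of the Leiningen coordinates, inferred from the repository names (the version strings are placeholders, not real releases; check Clojars for the actual versions matching your Onyx version):

```clojure
;; Placeholder versions -- look up the real releases on Clojars.
:dependencies [[org.onyxplatform/onyx-kafka "0.9.x"]      ; mainstream plugin, tracks Kafka 0.9
               [org.onyxplatform/onyx-kafka-0.8 "0.9.x"]] ; maintenance-mode plugin for Kafka 0.8
```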

lucasbradstreet14:07:46

We don’t want to leave anyone behind at the moment

michaeldrogalis14:07:14

For what it's worth, I tested the 0.9 bindings with 0.8 - they're incompatible.

michaeldrogalis14:07:24

Also, Kafka 0.10.0 is already out.

acron15:07:25

I'm looking at :onyx/batch-timeout - is there a way to set this as infinite?

lucasbradstreet15:07:27

@acron I don't know why you'd want to.

lucasbradstreet15:07:51

If anything, shorter is better

lucasbradstreet15:07:58

If you're trying to prevent an empty batch from being processed by a lifecycle, it's probably better just to check whether the batch is empty and keep the batch timeout long
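
A minimal sketch of that empty-batch guard in a lifecycle, assuming a hypothetical namespace `my.app.lifecycles` and task `:my-task`; the segments read for the current batch live under `:onyx.core/batch` in the event map:

```clojure
(ns my.app.lifecycles)

;; :onyx.core/batch holds whatever the peer read for this batch; it is empty
;; when the batch timeout fires with nothing available.
(defn after-batch [event lifecycle]
  (when (seq (:onyx.core/batch event))
    ;; Only do per-batch work when there is actually something to process.
    (println "processing" (count (:onyx.core/batch event)) "segments"))
  ;; Lifecycle fns return a map that is merged back into the event map.
  {})

(def calls
  {:lifecycle/after-batch after-batch})

;; Wired into the job's :lifecycles vector, e.g.
;; {:lifecycle/task :my-task
;;  :lifecycle/calls :my.app.lifecycles/calls}
```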

acron15:07:56

@lucasbradstreet: Ok, that's basically it...

acron15:07:06

@lucasbradstreet: This is a symptom of the way we're using Onyx in this project. It may offend your sensibilities but we're basically only ever firing one segment

michaeldrogalis15:07:09

@acron: In the entire job?

acron15:07:52

Our app basically hinges on letting users build jobs

acron15:07:55

"Workspaces" to them

acron15:07:16

we create an onyx job under the bonnet and chuck a single data structure along it
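
A rough sketch of that "single data structure through a job" pattern using the core.async plugin, with hypothetical task and fn names (the `:done` sentinel is the 0.9.x way to let the job complete after the one segment):

```clojure
(ns my.app.workspace-job
  (:require [clojure.core.async :refer [chan >!! <!!]]
            [onyx.api]
            [onyx.plugin.core-async]))

(def in-chan (chan 10))
(def out-chan (chan 10))

(defn inject-in-ch [event lifecycle] {:core.async/chan in-chan})
(defn inject-out-ch [event lifecycle] {:core.async/chan out-chan})

(def in-calls {:lifecycle/before-task-start inject-in-ch})
(def out-calls {:lifecycle/before-task-start inject-out-ch})

(def workflow [[:in :transform] [:transform :out]])

(def catalog
  [{:onyx/name :in
    :onyx/plugin :onyx.plugin.core-async/input
    :onyx/type :input
    :onyx/medium :core.async
    :onyx/max-peers 1
    :onyx/batch-size 1}
   {:onyx/name :transform
    :onyx/fn :my.app.fns/transform    ; hypothetical fn
    :onyx/type :function
    :onyx/batch-size 1}
   {:onyx/name :out
    :onyx/plugin :onyx.plugin.core-async/output
    :onyx/type :output
    :onyx/medium :core.async
    :onyx/max-peers 1
    :onyx/batch-size 1}])

(def lifecycles
  [{:lifecycle/task :in  :lifecycle/calls ::in-calls}
   {:lifecycle/task :in  :lifecycle/calls :onyx.plugin.core-async/reader-calls}
   {:lifecycle/task :out :lifecycle/calls ::out-calls}
   {:lifecycle/task :out :lifecycle/calls :onyx.plugin.core-async/writer-calls}])

(defn run-workspace! [peer-config segment]
  ;; Push the single "workspace" data structure through, close the input with
  ;; :done so the job completes, and read the result off the output channel.
  (>!! in-chan segment)
  (>!! in-chan :done)
  (onyx.api/submit-job peer-config
                       {:workflow workflow
                        :catalog catalog
                        :lifecycles lifecycles
                        :task-scheduler :onyx.task-scheduler/balanced})
  (<!! out-chan))
```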

michaeldrogalis15:07:22

@acron: Unless the job completes very quickly, you should probably look at redesigning that somehow. You're getting the coarsest fault tolerance possible. If any step in that single segment's processing fails, it needs to go back to the root task that it came from.

michaeldrogalis15:07:53

For segments that process quickly, it's completely acceptable. But for one segment per job you're potentially paying a heavy price.

acron15:07:29

@michaeldrogalis: yeah, we realise we're in non-standard territory but there are still elements of a job that are asynchronous and the way we've designed peers is that they can participate in multiple jobs

acron15:07:44

it still feels like the best approach in terms of having a solution that scales

michaeldrogalis15:07:01

e.g. a peer working multiple jobs.

acron15:07:05

Well, the peers have a bucket of fns - each job can be any arrangement of those fns - so one job might be A->B->C, another job might be X->Y->Z
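
That "bucket of fns" idea maps directly onto Onyx catalogs and workflows. A sketch with hypothetical fn names, showing two jobs arranging the same deployed functions differently (input/output tasks omitted for brevity):

```clojure
(defn fn-task
  "Catalog entry for a plain function task backed by a fully-qualified fn keyword."
  [task-name kw-fn]
  {:onyx/name task-name
   :onyx/fn kw-fn
   :onyx/type :function
   :onyx/batch-size 10})

;; Job 1: A -> B -> C
(def workflow-1 [[:A :B] [:B :C]])
(def catalog-1
  [(fn-task :A :my.app.fns/a)
   (fn-task :B :my.app.fns/b)
   (fn-task :C :my.app.fns/c)])

;; Job 2: X -> Y -> Z, built from the same deployed namespace of fns
(def workflow-2 [[:X :Y] [:Y :Z]])
(def catalog-2
  [(fn-task :X :my.app.fns/x)
   (fn-task :Y :my.app.fns/y)
   (fn-task :Z :my.app.fns/z)])
```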

acron15:07:22

The same peer could participate in either job/both jobs

michaeldrogalis15:07:05

Hard to say since I'm not looking at the code, but I think Onyx can already do what you're thinking of without any extra code. Every virtual peer can participate in any task, unless you used tags to specify otherwise. The only iron-clad guarantee right now is that every virtual peer will work on at most one task at a time.

michaeldrogalis15:07:20

Which is why typically all functions get deployed to all peers. Onyx will selectively use them.
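
A sketch of the tagging mentioned above, assuming the peer-config key `:onyx.peer/tags` and catalog key `:onyx/required-tags` (verify against the Onyx scheduling docs/cheat sheet for your version); everything else here is a hypothetical example:

```clojure
;; Peer config for peers that are allowed to run, say, Datomic tasks:
{:onyx.peer/tags [:datomic]}

;; Catalog entry that may only be scheduled onto peers carrying that tag:
{:onyx/name :write-to-datomic
 :onyx/plugin :my.app.plugin/output   ; hypothetical plugin
 :onyx/type :output
 :onyx/required-tags [:datomic]
 :onyx/batch-size 20}
```

Without tags, any virtual peer can be assigned any task, which is why all functions are typically deployed everywhere.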

acron15:07:55

Oh yeah, we haven't had to write any extra code to get this working 🙂

acron16:07:08

Onyx has risen to the challenge brilliantly

acron16:07:43

I'm just ironing out kinks and one thing we noticed was empty batches causing trouble

michaeldrogalis16:07:13

Ah, okie dokie. Yeah, as long as it's working for you, seems fine.

acron16:07:35

And we're in the unique circumstance where we know there's only one segment... hence my question about the timeout

michaeldrogalis16:07:47

Right. Yeah, we don't support blocking indefinitely. It's too far outside of what it was designed for. You can jack up the timeout super high, but that's just a band-aid. What's the harm in processing an empty batch?
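
For completeness, "jacking up the timeout" is just the per-task `:onyx/batch-timeout` (milliseconds) on the catalog entry; a sketch with hypothetical names, and, as noted, only a band-aid compared to the empty-batch check:

```clojure
{:onyx/name :my-task
 :onyx/fn :my.app.fns/my-task
 :onyx/type :function
 :onyx/batch-size 1
 :onyx/batch-timeout 60000} ; wait up to 60s for a full batch before emitting whatever arrived
```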

acron16:07:18

We've written some plugins to introduce state into the job - this allows us to merge tasks in a job and also introduce loops... we need to add empty batch handling into those plugins, that's all

acron16:07:36

I'm just trying to avoid work

acron16:07:58

I know that could sound horrifying

michaeldrogalis16:07:09

Little bit, but you gotta do what you gotta do 😛

michaeldrogalis16:07:40

Empty batch checking is the way to go, though. I can't think of another way to handle it without introducing new primitives into the streaming engine.

acron16:07:14

Sure, I think it's the safest way to go

michaeldrogalis16:07:16

Onyx 0.9.7 is officially out. The plugins and documentation are still building, but you can get core from Clojars right now. The rest of the build should finish in the next 2 hours.
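
For anyone grabbing it, the core dependency should be the usual coordinate on Clojars (group id as used by the Onyx artifacts):

```clojure
;; project.clj
[org.onyxplatform/onyx "0.9.7"]
```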

michaeldrogalis16:07:23

Blog post tomorrow.

nha16:07:19

Ah, I just missed the other kafka plugin 🙂 Thanks!

gardnervickers20:07:23

With Onyx that would be a quick way to get multiple sequential aggregations while preserving fault tolerance.

michaeldrogalis20:07:12

Why are the topologies split?

Drew Verlee21:07:14

@michaeldrogalis: it was from this article: http://hortonworks.com/blog/storm-kafka-together-real-time-data-refinery. The summary was:

* Incrementally add more topologies/use cases
* Tap into raw or refined data streams at any stage of the processing
* Modularize your key cluster resources to most intense processing phase of the pipeline

I understand the first two reasons, but not
> Modularize your key cluster resources to most intense processing phase of the pipeline
I suppose I don’t understand what extra modularity is achieved. I’ll have to research a bit and see if anything clicks. I recall seeing a talk by another company that did something very similar.
> With Onyx that would be a quick way to get multiple sequential aggregations while preserving fault tolerance.
@gardnervickers: How does putting Kafka in between introduce more fault tolerance? Thanks!!!! 🙂

michaeldrogalis21:07:04

> I suppose I don’t understand what extra modularity is achieved. I’ll have to research a bit and see if anything clicks.
They mean you can dedicate hardware to specific topologies, so you'll get better perf isolation.
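
In Onyx terms, the "split topologies with Kafka in between" idea would look roughly like the sketch below: job A writes its refined stream to a topic, job B consumes it, and each job can run on its own peer group for that perf isolation. Catalog keys follow the onyx-kafka README of that era and should be verified against the plugin docs; topic, fn, and address values are hypothetical.

```clojure
;; Tail of job A: publish refined segments to a Kafka topic.
{:onyx/name :write-refined
 :onyx/plugin :onyx.plugin.kafka/write-messages
 :onyx/type :output
 :onyx/medium :kafka
 :kafka/topic "refined-events"
 :kafka/zookeeper "127.0.0.1:2181"
 :kafka/serializer-fn :my.app.serde/serialize
 :onyx/batch-size 50}

;; Head of job B: consume the same topic as its input.
{:onyx/name :read-refined
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :kafka/topic "refined-events"
 :kafka/group-id "job-b"
 :kafka/zookeeper "127.0.0.1:2181"
 :kafka/deserializer-fn :my.app.serde/deserialize
 :onyx/batch-size 50}
```

The topic acts as the durable hand-off point, so either job can be redeployed or fall behind without losing the other's data.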