#onyx
2018-05-05
Drew Verlee04:05:50

If you need lots of streaming functions but not much raw processing power (small data), what options are out there? I like the onyx model, but it seems like onyx-local-rt isn't built to be used for more than testing. Is there a middle ground where it's possible to get the dataflow model but scale it up and down, from dataflows running in threads to dataflows running across distributed nodes?

Drew Verlee04:05:02

I’m also curious what, if anything, needs to be done to move onyx-local-rt into a position where it could be used in a production system as a lightweight tool for realtime dataflow processing. I would be excited to do the work if necessary.

lucasbradstreet05:05:50

running onyx in single node mode should be perfectly fine, though you will start to top out with the current defaults if you run too many peers on one node.

lucasbradstreet05:05:15

It’s smart enough not to serialize messages that it’s sending to peers on the same node, so there’s not that much overhead; it’s just a bit aggressive when idling to reduce latency.

lucasbradstreet05:05:41

The main thing that I would like to see for more efficient scale-up on a single node under the dataflow model is for onyx peers to share worker threads.

lucasbradstreet05:05:46

the peer state machine is written in a non-blocking way, so instead of idling it’d be easy enough to just switch to another task.

lucasbradstreet18:05:20

Hmm. It might be restricting all messages that flow to all tasks to ::error? segments

lucasbradstreet18:05:38

might be a weird handling case with :all and short circuit / error handling flow conditions

lucasbradstreet18:05:48

I am not sure you would ever want :all for those

joelsanchez18:05:30

I was under the impression that :all in this context meant "all possible tasks that follow :transact", hence it would equal [:out] in my case
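
The flow condition in question isn’t shown in the log; based on the description, it was presumably something along these lines (the predicate and post-transform names are illustrative):

{:flow/from :transact
 :flow/to :all                   ; expected to mean "all tasks downstream of :transact", i.e. [:out] here
 :flow/thrown-exception? true
 :flow/short-circuit? true       ; exception-handling conditions must short-circuit
 :flow/predicate :my.app/always?
 :flow/post-transform :my.app/exception->segment}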

lucasbradstreet18:05:00

Yeah, it might be due to how the short circuited / error handling conditions are special cased

lucasbradstreet18:05:09

You’re right that that’s probably how it should work.

lucasbradstreet18:05:19

Create an issue on github for me and I’ll give it a look

joelsanchez18:05:00

I might have found something! 🙂 it's not a big deal though, I just used [:out], but it was confusing (going to make that issue now)

lucasbradstreet18:05:51

I might just block :all from being used on the short circuit exceptions

lucasbradstreet18:05:16

as it doesn’t make a lot of sense since the point is generally to restrict flow to only certain tasks

lucasbradstreet18:05:27

I’ll have to think about it a bit more though.

joelsanchez18:05:09

in my case what I wanted to achieve is to auto-handle exceptions across all tasks

joelsanchez18:05:45

the only way to do it (that I know of) is to create one flow condition for each task; :all doesn't seem to work

lucasbradstreet18:05:03

Oh, don’t you want :flow/from :all, :flow/to :out?

lucasbradstreet18:05:19

with a short circuited flow condition?

lucasbradstreet18:05:30

That should send all error exceptions down to out while leaving everything else working as normal
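
A minimal sketch of that suggestion, assuming a placeholder my.app namespace for the predicate and post-transform functions:

(ns my.app)

(defn always?
  "Matches every thrown exception. For :flow/thrown-exception? conditions the
  third argument is the exception object rather than a new segment."
  [event old-segment exception all-new]
  true)

(defn exception->segment
  "Turn a thrown exception into an ordinary segment so it can flow to :out."
  [event segment exception]
  {::error? true
   :message (.getMessage ^Throwable exception)})

;; One condition that applies to every task and routes error segments to :out,
;; leaving normal routing between the other tasks untouched.
(def error-flow-conditions
  [{:flow/from :all
    :flow/to [:out]
    :flow/thrown-exception? true
    :flow/short-circuit? true
    :flow/predicate ::always?
    :flow/post-transform ::exception->segment}])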

joelsanchez18:05:42

that would be it, yes, trying now

joelsanchez18:05:14

so...it kind of works but at the same time it doesn't

joelsanchez18:05:33

it works if the task that errored is the one that's connected to out (transact)

joelsanchez18:05:49

but I get nothing if it's another one that's not connected to out

lucasbradstreet18:05:56

Ah. Right, all tasks will need to be connected to out

lucasbradstreet18:05:11

If you’re trying to achieve that it should be pretty easy with code

joelsanchez18:05:21

ah but I can prevent them from going to out with a flow condition, and connect all of them to out

joelsanchez18:05:24

ok I'll try 🙂 thanks

joelsanchez19:05:55

got it working!

joelsanchez19:05:40

just realized the shortr typo...anyway, this works 100% as expected

joelsanchez19:05:59

I wire all tasks to out and catch exceptions with this flow condition, for all tasks
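
The snippet referenced here isn’t preserved in the log. A reconstruction of the setup as described, with illustrative task names and the same placeholder helpers as above: every task gets a workflow edge to :out, exceptions thrown anywhere become ::error? segments routed to :out, and a second flow condition keeps ordinary segments away from :out.

(ns my.app)

(def workflow
  [[:in :validate]
   [:validate :transact]
   [:validate :out]      ; extra edges so error segments from any task can reach :out
   [:transact :out]])

(defn always? [event old-segment exception all-new] true)

(defn exception->segment [event segment exception]
  {::error? true :message (.getMessage ^Throwable exception)})

(defn error-segment?
  "True only for segments produced by exception->segment."
  [event old-segment new-segment all-new]
  (boolean (::error? new-segment)))

(def flow-conditions
  [;; exceptions thrown in any task become ::error? segments routed to :out
   {:flow/from :all
    :flow/to [:out]
    :flow/thrown-exception? true
    :flow/short-circuit? true
    :flow/predicate ::always?
    :flow/post-transform ::exception->segment}
   ;; ordinary segments only reach :out if they carry the ::error? marker
   {:flow/from :all
    :flow/to [:out]
    :flow/predicate ::error-segment?}])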

Drew Verlee20:05:11

@lucasbradstreet The problem I'm trying to solve is that we often want to leverage the dataflow model (windows, triggers, etc.) but our processing needs are best served by a model that isn't distributed across multiple servers. I feel I could put together the time-windowing aspect with core.async or Go channels (in Golang), but I'm surprised I don't see this being done already.

lucasbradstreet21:05:17

Right. I guess I’m saying that if you run embedded Aeron and ZooKeeper you’re pretty much getting that (though you still need S3 checkpointing). The main problem is that onyx peers are currently thread-heavy.
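
For reference, a sketch of a fully in-process setup along those lines, with placeholder ports, addresses, and tenancy id: the env config starts an embedded ZooKeeper and the peer config starts an embedded Aeron media driver, leaving durable checkpoint storage (e.g. S3) as the remaining external dependency for windowed jobs.

(require '[onyx.api])

(def tenancy-id (str (java.util.UUID/randomUUID)))

(def env-config
  {:onyx/tenancy-id tenancy-id
   :zookeeper/address "127.0.0.1:2188"
   :zookeeper/server? true            ; run ZooKeeper in-process
   :zookeeper.server/port 2188})

(def peer-config
  {:onyx/tenancy-id tenancy-id
   :zookeeper/address "127.0.0.1:2188"
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.messaging/impl :aeron
   :onyx.messaging/bind-addr "localhost"
   :onyx.messaging/peer-port 40200
   :onyx.messaging.aeron/embedded-driver? true})   ; run the Aeron media driver in-process

(def env (onyx.api/start-env env-config))
(def peer-group (onyx.api/start-peer-group peer-config))
(def v-peers (onyx.api/start-peers 6 peer-group))  ; enough virtual peers for the job's tasks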

sparkofreason21:05:01

Can the onyx-kafka plugin be used without direct access to ZK, such as with a managed kafka service like https://github.com/CloudKarafka/java-kafka-example?

lucasbradstreet21:05:43

It can take bootstrap servers rather than looking up via ZK. You’ll still need ZK for onyx, but it doesn’t have to touch the Kafka ZK servers.
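
A sketch of what that looks like in a catalog entry; the bootstrap-servers key name and value shape are assumptions that depend on the onyx-kafka version in use (older versions take :kafka/zookeeper instead), and the hosts, topic, and deserializer are placeholders.

{:onyx/name :read-messages
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :onyx/batch-size 50
 :onyx/max-peers 1                                       ; typically tied to the topic's partition count
 :kafka/topic "my-topic"
 :kafka/group-id "onyx-consumer"
 :kafka/bootstrap-servers ["broker-1.example.com:9092"]  ; assumed key; avoids touching the Kafka ZK
 :kafka/offset-reset :earliest
 :kafka/deserializer-fn :my.app/deserialize-message
 :onyx/doc "Reads segments from a Kafka topic"}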

sparkofreason22:05:18

That worked great, thanks!