Fork me on GitHub
Drew Verlee04:05:50

If you need lots of streaming functions, but not much raw processing power (small data) what options are out there? I like the onyx model, but it seems like onyx-local-rt isn’t built to be used for more than testing. Is there a middle ground where its possible to get the dataflow model but scale it up and down from dataflow models running in threads to dataflow models running across distributed nodes.

Drew Verlee04:05:02

I’m also curious what, if anything, needs to be done to move onyx-local-rt into a position where it could be used in a production system as a lightweight tool for realtime dataflow processing. I would be excited to do the work if necessary.


running onyx in single node mode should be perfectly fine, though you will start to top out with the current defaults if you run too many peers on one node.


It’s smart enough not to serialize messages that it’s sending to peers on the same node, so there’s not that much overhead, it’s just a bit aggressive when idling to reduce latency.


The main thing that I would like to see for more efficient scale up on a single node under the data flow model is for onyx peers to share worker threads.


the peer state machine is written in a non-blocking way, so when instead of idling it’d be easy enough to just switch to another task.


Hmm. It might be restricting all messages that flow to all tasks to ::error? segments


might be a weird handling case with :all and short circuit / error handling flow conditions


I am not sure you would ever want :all for those


I was under the impression that :all in this context meant "all possible tasks that follow :transact", hence it would equal [:out] in my case


Yeah, it might be due to how the short circuited / error handling conditions are special cased


You’re right that that’s probably how it should work.


Create an issue on github for me and I’ll give it a look


I might have found something! 🙂 it's not big deal though, I just used [:out], but it was confusing (going to make that issue now)


I might just block :all from being used on the short circuit exceptions


as it doesn’t make a lot of sense since the point is generally to restrict flow to only certain tasks


I’ll have to think about it a bit more though.


in my case what I wanted to achieve is to auto-handle exceptions across all tasks


the only way to do it (that I know of) is to create one flow condition for each, :all doesn't seem to work


Oh, don’t you want :flow/from :all, :flow/to :out?


with a short circuited flow condition?


That should send all error exceptions down to out while leaving everything else working as normal


that would be it, yes, trying now

joelsanchez18:05:14 kind of works but at the same time it doesn't


it works if the task that errored is the one that's connected to out (transact)


but I get nothing if it's another one that's not connected to out


Ah. Right, all tasks will need to be connected to out


If you’re trying to achieve that it should be pretty easy with code


ah but I can prevent them from going to out with a flow condition, and connect all of them to out


ok I'll try 🙂 thanks


got it working!

parrot 4

just realized the shortr typo...anyway, this works 100% as expected


I wire all tasks to out and catch exceptions with this flow condition, for all tasks

Drew Verlee20:05:11

@lucasbradstreet The problem i'm trying to solve is that often times we would like to leverage the dataflow model (windows, triggers, etc...) but our processing needs are best served in a none distributed (multiple servers) model. I feel i could craft together the time widowing aspect with coreasync or go channels (in golang) but i'm surprised i dont see this being done already.


Right. I guess I’m saying that if you run embedded aeron and ZooKeeper you’re pretty much getting that (though you still need s3 checkpointing). The main problem is that onyx peers are currently thread heavy.


Can the onyx-kafka plugin be used without direct access to ZK, such as with a managed kafka service like


It can take bootstrap servers rather than looking up via ZK. you’ll still need ZK for onyx but it doesn’t have to touch the kafka ZK servers


That worked great, thanks!