If you need lots of streaming functions, but not much raw processing power (small data), what options are out there? I like the Onyx model, but it seems like onyx-local-rt isn’t built to be used for more than testing. Is there a middle ground where it’s possible to get the dataflow model but scale it up and down, from dataflow models running in threads to dataflow models running across distributed nodes?
I’m also curious what, if anything, needs to be done to move onyx-local-rt into a position where it could be used in a production system as a lightweight tool for realtime dataflow processing. I would be excited to do the work if necessary.
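For context, onyx-local-rt runs an Onyx job as a pure in-process state machine. A minimal sketch, following the API shown in the onyx-local-rt README (`init` / `new-segment` / `drain` / `stop` / `env-summary`); the job map and task names (`:in`, `:inc`, `:out`) are illustrative:

```clojure
(require '[onyx-local-rt.api :as api])

;; Hypothetical task function: increments :n in each segment.
(defn ^:export my-inc [segment]
  (update-in segment [:n] inc))

(def job
  {:workflow [[:in :inc] [:inc :out]]
   :catalog [{:onyx/name :in
              :onyx/type :input
              :onyx/batch-size 20}
             {:onyx/name :inc
              :onyx/type :function
              :onyx/fn ::my-inc
              :onyx/batch-size 20}
             {:onyx/name :out
              :onyx/type :output
              :onyx/batch-size 20}]
   :lifecycles []})

;; Feed a segment through the whole job synchronously and inspect the
;; resulting env; the :out task's :outputs should hold the processed segment.
(-> (api/init job)
    (api/new-segment :in {:n 41})
    (api/drain)
    (api/stop)
    (api/env-summary))
```

Everything runs in the calling thread with no ZooKeeper or Aeron, which is exactly why it is positioned as a testing tool today.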
running onyx in single node mode should be perfectly fine, though you will start to top out with the current defaults if you run too many peers on one node.
It’s smart enough not to serialize messages that it’s sending to peers on the same node, so there’s not that much overhead, it’s just a bit aggressive when idling to reduce latency.
The main thing that I would like to see for more efficient scale up on a single node under the data flow model is for onyx peers to share worker threads.
the peer state machine is written in a non-blocking way, so instead of idling it’d be easy enough to just switch to another task.
Hmm. It might be restricting all messages that flow to all tasks to `::error?` segments
might be a weird handling case with `:all` and short-circuit / error-handling flow conditions
I was under the impression that `:all` in this context meant "all possible tasks that follow `:transact`", hence it would equal `[:out]` in my case
Yeah, it might be due to how the short-circuit / error-handling conditions are special-cased
I might have found something! 🙂 it's not a big deal though, I just used `[:out]`, but it was confusing (going to make that issue now)
it doesn’t make a lot of sense, since the point is generally to restrict flow to only certain tasks
in my case what I wanted to achieve is to auto-handle exceptions across all tasks
the only way to do it (that I know of) is to create one flow condition for each; `:all` doesn't seem to work
That should send all error exceptions down to out while leaving everything else working as normal
ah but I can prevent them from going to `out` with a flow condition, and connect all of them to `out`
I wire all tasks to `out` and catch exceptions with this flow condition, for all tasks
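The shape being described might look like the sketch below, using Onyx's exception-handling flow-condition keys (`:flow/thrown-exception?`, `:flow/short-circuit?`, `:flow/post-transform`). Per the conversation, `:all` didn't behave as expected here, so one such condition is written per task; the `:transact` task name comes from the thread, while `::always` and `::handle-error` are hypothetical helpers you would define yourself:

```clojure
;; Hypothetical predicate: match every segment the condition sees.
(defn always [event old-segment new-segment all-new]
  true)

;; Hypothetical transform: replace the thrown exception with an error segment.
(defn handle-error [event segment exception-obj]
  {:error (.getMessage ^Throwable exception-obj)
   :original segment})

(def flow-conditions
  [{:flow/from :transact
    :flow/to [:out]                 ;; [:out] rather than :all, per the thread
    :flow/thrown-exception? true    ;; match segments whose task fn threw
    :flow/short-circuit? true       ;; required for exception conditions
    :flow/post-transform ::handle-error
    :flow/predicate ::always}])
```

Error segments flow to `:out` while ordinary segments continue through the workflow untouched.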
@lucasbradstreet The problem I'm trying to solve is that oftentimes we would like to leverage the dataflow model (windows, triggers, etc.) but our processing needs are best served by a non-distributed model (no multiple servers). I feel I could craft together the time-windowing aspect with core.async or Go channels (in golang), but I'm surprised I don't see this being done already.
Right. I guess I’m saying that if you run embedded Aeron and ZooKeeper you’re pretty much getting that (though you still need S3 checkpointing). The main problem is that Onyx peers are currently thread-heavy.
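A sketch of what that single-node setup looks like in the env- and peer-config, using Onyx's embedded ZooKeeper server and embedded Aeron media driver options; the addresses, ports, and tenancy id are illustrative:

```clojure
(def env-config
  {:zookeeper/address "127.0.0.1:2188"
   :zookeeper/server? true            ;; run ZooKeeper in-process
   :zookeeper.server/port 2188
   :onyx/tenancy-id "dev-tenancy"})

(def peer-config
  {:zookeeper/address "127.0.0.1:2188"
   :onyx/tenancy-id "dev-tenancy"
   :onyx.messaging/impl :aeron
   :onyx.messaging.aeron/embedded-driver? true  ;; no external media driver needed
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})
```

With both embedded, a single JVM carries the whole cluster, which is what makes the thread-per-peer overhead the remaining cost.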
Can the onyx-kafka plugin be used without direct access to ZK, such as with a managed kafka service like https://github.com/CloudKarafka/java-kafka-example?
It can take bootstrap servers rather than looking up brokers via ZK. You’ll still need ZK for Onyx itself, but it doesn’t have to touch the Kafka ZK servers.
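An illustrative catalog entry for that setup. The exact key name (shown here as `:kafka/bootstrap-servers`) should be checked against your onyx-kafka version's information model; the topic, group id, broker addresses, and deserializer fn are all assumptions:

```clojure
(def kafka-reader
  {:onyx/name :read-messages
   :onyx/plugin :onyx.plugin.kafka/read-messages
   :onyx/type :input
   :onyx/medium :kafka
   :kafka/topic "events"
   :kafka/group-id "my-consumer-group"
   ;; Broker list supplied directly, in place of a :kafka/zookeeper lookup.
   :kafka/bootstrap-servers "broker1:9092,broker2:9092"
   :kafka/offset-reset :earliest
   :kafka/deserializer-fn :my.ns/deserialize-message
   :onyx/batch-size 100})
```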