This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2018-06-15
Channels
- # beginners (56)
- # boot (4)
- # cider (22)
- # clara (10)
- # cljs-dev (50)
- # cljsrn (27)
- # clojure (27)
- # clojure-conj (4)
- # clojure-dev (3)
- # clojure-italy (17)
- # clojure-nl (12)
- # clojure-norway (3)
- # clojure-spec (10)
- # clojure-uk (137)
- # clojurescript (132)
- # cursive (4)
- # datascript (2)
- # datomic (109)
- # devcards (2)
- # editors (1)
- # emacs (4)
- # euroclojure (2)
- # events (4)
- # figwheel (1)
- # fulcro (15)
- # jobs (1)
- # jobs-discuss (4)
- # juxt (3)
- # leiningen (2)
- # off-topic (21)
- # onyx (13)
- # other-languages (8)
- # pedestal (6)
- # re-frame (22)
- # reagent (5)
- # reitit (1)
- # ring-swagger (3)
- # shadow-cljs (75)
- # sql (6)
- # tools-deps (2)
- # vim (1)
- # yada (8)
@michaeldrogalis Thank you for the pointers. Yes, they run consecutively. Sounds like we should consider external stores (Kafka, Redis, a DB, etc.) rather than serializing large values
@aaron51 Yup! You have plenty of choices there.
will the onyx-seq plugin ensure that peers coordinate on the input segments rather than each processing the same segment n times (n being the number of peers assigned to that input task)? In other words, if I'm passing the same list of segments to onyx-seq in the task definition on 3 peers, will the peers coordinate to determine which of them has already seen a given segment?
It won’t, though assuming you’re just passing it a static list, one input peer would likely be enough to distribute the work downstream. It’d be pretty easy to update onyx-seq to partition the work based on the number of peers, but I haven’t seen a reason to yet.
Alternatively if you inject the data via a lifecycle you can partition it based on the slot id and number of peers
What if there's a lot of data to be put into the seq? Would it still work to put all of it in the lifecycle? I had the understanding that putting data in the lifecycle might have the unintended consequence of making your barriers really big
Yes, so what I meant by that is that you have a before-task-start fn that injects, say, (map (fn [i] {:n i}) (range 10000))
but instead you check :onyx.core/slot-id
inside that fn, and based on the slot, you would return differently partitioned ranges for each peer
that way each peer gets a different part of your data set
it also works if you want to read, say, a bunch of S3 objects. Based on the particular slot the peer is on, you would have each peer’s injection return a different partition of the objects that you passed in
(e.g. first peer gets the first 1/3, second gets the second 1/3, third gets the last 1/3)
the idea works whether you’re generating the messages or passing in via the lifecycle
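A minimal sketch of the slot-based partitioning described above (names like `inject-partitioned-seq` and `in-calls` are made up for illustration; `:onyx.core/slot-id` is from the Onyx event map as mentioned above, while the exact task-map key for the peer count and the `:seq/seq` injection key should be checked against your Onyx and onyx-seq versions):

```clojure
(ns example.partitioned-input)

(defn inject-partitioned-seq
  "before-task-start lifecycle fn: each peer returns only its
  contiguous slice of the full data set, based on its slot id
  (e.g. with 3 peers, slot 0 gets the first 1/3, slot 1 the
  second 1/3, slot 2 the last 1/3)."
  [event lifecycle]
  (let [slot-id (:onyx.core/slot-id event)           ;; 0-indexed slot for this peer
        ;; assumed: a fixed peer count declared on the task map
        n-peers (:onyx/n-peers (:onyx.core/task-map event))
        data    (map (fn [i] {:n i}) (range 10000))  ;; the full data set
        chunk   (long (Math/ceil (/ (count data) n-peers)))
        part    (->> data
                     (drop (* slot-id chunk))
                     (take chunk))]
    ;; key under which onyx-seq reads the injected data -- verify
    ;; against the plugin docs for your version
    {:seq/seq part}))

(def in-calls
  {:lifecycle/before-task-start inject-partitioned-seq})
```

This assumes the peer count is fixed and known up front; if peers can join or leave, the slices would need to be recomputed.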