#onyx
2018-06-15
aaron5103:06:05

@michaeldrogalis Thank you for the pointers. Yes, they run consecutively. Sounds like we should consider external stores (kafka, redis, db, etc) rather than serializing large values

michaeldrogalis13:06:14

@aaron51 Yup! You have plenty of choices there.

dbernal17:06:32

will the onyx seq plugin ensure that peers are coordinating the input segments and not processing a unique segment n-times (n being the number of peers associated with that input task)? In other words, if I'm passing a list of segments to onyx-seq 3 times during task definition on 3 peers, will the peers coordinate to see which one has seen a given segment?

lucasbradstreet19:06:00

It won’t, though assuming you’re just passing it a static list, one input peer would likely be enough to distribute the work downstream. It’d be pretty easy to update onyx-seq to partition the work based on the number of peers but I haven’t seen a reason to yet

lucasbradstreet19:06:45

Alternatively if you inject the data via a lifecycle you can partition it based on the slot id and number of peers

dbernal22:06:25

What if there's a lot of data to be put into the seq? Would it still work to put all of it in the lifecycle? I had the understanding that putting data in the lifecycle might have the unintended consequence of making your barriers really big

lucasbradstreet22:06:55

Yes, so what I meant by that is that you have a before-task-start fn that injects say (map (fn[i] {:n i}) (range 10000))

lucasbradstreet22:06:35

but instead you check :onyx.core/slot-id inside that fn, and based on the slot, you would return differently partitioned ranges for each peer

lucasbradstreet22:06:41

that way each peer gets a different part of your data set
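A minimal sketch of the slot-based partitioning described above, not a definitive implementation. It assumes that slot ids are 0-based and contiguous, that the input task pins its peer count with `:onyx/n-peers` in the task map, and that onyx-seq reads its input from the `:seq/seq` key returned by the lifecycle. The names `slot-range`, `inject-partitioned-range`, and `partitioned-range-calls` are hypothetical.

```clojure
;; Sketch only: slot-range, inject-partitioned-range and
;; partitioned-range-calls are illustrative names, not part of Onyx.

(defn slot-range
  "Returns the contiguous sub-range of [0, total) owned by slot-id,
  assuming slot ids run from 0 to (dec n-peers)."
  [total slot-id n-peers]
  (range (quot (* slot-id total) n-peers)
         (quot (* (inc slot-id) total) n-peers)))

(defn inject-partitioned-range
  "before-task-start fn: each peer injects only its own slice of the data."
  [event lifecycle]
  (let [slot-id (:onyx.core/slot-id event)
        ;; Assumption: the task map fixes the peer count via :onyx/n-peers.
        n-peers (:onyx/n-peers (:onyx.core/task-map event))]
    {:seq/seq (map (fn [i] {:n i})
                   (slot-range 10000 slot-id n-peers))}))

(def partitioned-range-calls
  {:lifecycle/before-task-start inject-partitioned-range})
```

Because the bounds are computed from the slot id, the slices do not overlap and together cover the whole range, so no coordination between peers is needed.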

lucasbradstreet22:06:33

it also works if you know that you want to read, say, a bunch of S3 objects. Based on the particular slot the peer is on, you would have each peer’s injection return a different partition of the objects that you passed in

lucasbradstreet22:06:05

(e.g. first peer gets the first 1/3, second gets the second 1/3, third gets the last 1/3)
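The thirds split can be sketched for a list of object keys passed in through the lifecycle map. This is an illustration under assumptions: `:example/object-keys` is a hypothetical lifecycle entry, `chunk-for-slot` and `inject-object-partition` are hypothetical names, slot ids are assumed to be 0-based, and the peer count is assumed to come from `:onyx/n-peers` in the task map. Actually fetching the objects from S3 is left out.

```clojure
;; Sketch only: chunk-for-slot, inject-object-partition and
;; :example/object-keys are illustrative, not part of Onyx or onyx-seq.

(defn chunk-for-slot
  "Returns the contiguous chunk of items owned by slot-id out of n-peers."
  [items slot-id n-peers]
  (let [v     (vec items)
        total (count v)]
    (subvec v
            (quot (* slot-id total) n-peers)
            (quot (* (inc slot-id) total) n-peers))))

(defn inject-object-partition
  "before-task-start fn: emits one segment per object key in this peer's chunk."
  [event lifecycle]
  (let [slot-id (:onyx.core/slot-id event)
        ;; Assumption: the task map fixes the peer count via :onyx/n-peers.
        n-peers (:onyx/n-peers (:onyx.core/task-map event))
        ks      (:example/object-keys lifecycle)]
    {:seq/seq (map (fn [k] {:object-key k})
                   (chunk-for-slot ks slot-id n-peers))}))
```

With three peers and six keys, slot 0 would take the first two keys, slot 1 the next two, and slot 2 the last two. When the count does not divide evenly, the later slots absorb the remainder.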

lucasbradstreet22:06:24

the idea works whether you’re generating the messages or passing in via the lifecycle

dbernal13:06:09

Gotcha, ty