#onyx
2017-09-07
arnaud_bos09:09:55

Hello all, I used Onyx for very simple jobs some time ago, I think it was 0.6(?). Today I'm in charge of a PoC around streaming frameworks for image processing use cases (think earth observation satellites and drones). I must compare several frameworks such as Storm, Spark Streaming and Flink. Of course I know some of the theory and I've read the docs of each extensively, but in practice, and especially for image processing, it's another story. At some point, if time allows, I'd like to throw Onyx into the mix. Does anyone have experience running heavy tasks such as image processing in an Onyx job? And how does it compare to other frameworks? Broad question, I know... Any pointer would really help 😉

jasonbell09:09:24

What’s your definition of “heavy tasks”?

jasonbell09:09:01

Image processing: we struggled with large file processing. Getting the right mix of heartbeat timeouts, Aeron buffer config and so on was difficult. Onyx thrives on small message sizes, where it’s brilliant.

jasonbell09:09:25

This is actually the main bulk of my talk at ClojureX in December.

mccraigmccraig09:09:33

were you putting the image binaries in onyx segments @jasonbell ?

jasonbell09:09:27

no, it was handling various sized zip files. Fun fun fun 🙂

arnaud_bos09:09:49

Well, I guess I don't know precisely, it's very subjective, but I have a use case as a starting point that tends to exhaust a 12-core i7 + 30GB of memory when processing more than a dozen images, and that use case relies heavily on shared buffers. The current system was designed for a single machine, and the PoC is about taking a look at streaming frameworks and, if relevant, seeing what kind of jobs we can do.

arnaud_bos09:09:02

Your talk is "Introducing Streaming Processing with Kafka and the Onyx Platform" - Dec. 16, 2016 ?

jasonbell09:09:40

that’s correct

jasonbell09:09:09

but I’m doing another one this year with my experiences of various streaming things including Onyx.

arnaud_bos09:09:45

Cool, looking forward to it, in the meantime I'll watch this one, thanks.

jasonbell09:09:06

@arnaud_bos This might be helpful too, but more as an experience. https://www.youtube.com/watch?v=2dSF_EJlPjE

lucasbradstreet17:09:52

Memory management will be hard for this case, whatever distributed system you use, but Onyx does likely make it more complex, because of the messaging aspect.

lucasbradstreet17:09:40

@jasonbell is right that designs that send large messages from peer to peer are not very suitable for Onyx

lucasbradstreet17:09:25

KStreams likely worked a lot better there because it’ll horizontally scale out workers for each partition, without any messaging between them. We’ve been discussing having a mode that would work like that with Onyx, where there wouldn’t be any messaging, and all of the tasks would collapse into a single peer.
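
The partition-per-worker idea in the last message can be sketched roughly as follows. This is only an illustration, not Onyx or Kafka Streams code: the stage functions, `PIPELINE`, `run_partition` and `run_job` are hypothetical names, and threads stand in for what would be separate worker processes or machines. Each worker runs every stage of the job on its own partition, so no record is ever sent from one worker to another.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stages; in a real job these would be the job's tasks.
def decode(record):
    return record.strip()

def transform(record):
    return record.upper()

# All tasks "collapsed" into one in-process pipeline.
PIPELINE = [decode, transform]

def run_partition(records):
    """Run every stage locally for one partition; nothing leaves this worker."""
    out = []
    for record in records:
        for stage in PIPELINE:
            record = stage(record)
        out.append(record)
    return out

def run_job(partitions):
    """One worker per partition; threads stand in for separate workers."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        return list(pool.map(run_partition, partitions))

if __name__ == "__main__":
    print(run_job([[" a ", "b"], ["c "]]))
```

In this shape, scaling out means adding partitions and workers, and a large payload only has to fit in the memory of the one worker that owns its partition.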