#onyx
2017-04-30
lmergen05:04:49

when i look at the documentation of trigger/emit, it says that the segment(s) it returns are emitted downstream. what exactly is “downstream” in the context of a trigger?

lucasbradstreet05:04:24

Downstream is any task that is connected to, and comes after, the task that the trigger is on

lucasbradstreet05:04:49

So if you have A -> B -> C -> D with the trigger on B, C will receive the segments from the emit

lucasbradstreet05:04:57

Subject to flow conditions
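
As a rough sketch of what this looks like in a job (key names follow the Onyx 0.10 trigger docs; the exact :trigger/emit function arity is an assumption here, mirroring the :trigger/sync signature):

```clojure
;; Sketch only: a linear A -> B -> C -> D workflow with a window and
;; trigger on :b. Whatever emit-summary returns is sent downstream to
;; :c as ordinary segments, subject to flow conditions.
(def workflow
  [[:a :b] [:b :c] [:c :d]])

(def windows
  [{:window/id :collect
    :window/task :b
    :window/type :global
    :window/aggregation :onyx.windowing.aggregation/conj}])

(defn emit-summary
  ;; Assumed arity, matching the :trigger/sync signature.
  [event window trigger state-event extent-state]
  {:summary extent-state})

(def triggers
  [{:trigger/window-id :collect
    :trigger/id :emit-summary
    :trigger/on :onyx.triggers/segment
    :trigger/threshold [5 :elements]
    :trigger/emit ::emit-summary}])
```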

lmergen05:04:24

right, that makes sense

lmergen05:04:04

i already kind of assumed that, but it wasn’t obvious from the documentation

lmergen05:04:28

will C also still receive the ‘regular’ segments from B, as is the case when you use sync?
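
For reference, the two keys being contrasted look roughly like this (a sketch: :trigger/sync runs a side-effecting function, while the return value of :trigger/emit is sent downstream):

```clojure
;; Sketch: the same trigger wired with :trigger/sync vs :trigger/emit.
;; :trigger/sync calls a function for side effects (e.g. syncing window
;; contents to a store); :trigger/emit sends its return value downstream.
{:trigger/window-id :collect
 :trigger/id :sync-state
 :trigger/on :onyx.triggers/segment
 :trigger/threshold [5 :elements]
 :trigger/sync ::write-state!}

{:trigger/window-id :collect
 :trigger/id :emit-state
 :trigger/on :onyx.triggers/segment
 :trigger/threshold [5 :elements]
 :trigger/emit ::emit-summary}
```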

stephenmhopper22:04:59

@michaeldrogalis hey hi hey. There was a time when you mentioned that with some of the upcoming changes in Onyx (possibly in 0.10) it’d be possible to implement certain machine learning algorithms. Do you remember what that was, exactly? I was hoping to start doing some research around that.

stephenmhopper22:04:19

It might have been related to ABS, but I don’t recall exactly

michaeldrogalis22:04:49

@stephenmhopper Yeah, you’re remembering correctly. ABS is complete. We’re just waiting to get a little breathing room with everything else that’s going on to release 0.10.0 final. We still have validation in place to disallow cyclic workflows, but that can safely be removed now. The only hitch is that we need to do a tiny bit more work to offer exactly-once aggregations with iteration.
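
To make the cyclic-workflow point concrete, here is a hypothetical workflow with the kind of back-edge that the current validation rejects (task names invented for illustration):

```clojure
;; Hypothetical: an iterative job expressed as a cyclic workflow. The
;; back-edge [:update :train] is what Onyx's validation currently
;; disallows; lifting that restriction is the change described above.
(def iterative-workflow
  [[:read-data :train]
   [:train :update]
   [:update :train]        ; back-edge that forms the iteration loop
   [:update :write-model]])
```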

michaeldrogalis22:04:20

It’s not a priority for us right now, but we could outline the work remaining if anyone wanted to take a shot at it. It’s well within reach. That would allow for ML programs to run on Onyx.

stephenmhopper22:04:30

Yeah, I’d be interested in hearing more about that. Right now, I’m working on finding a convenient way to use Tensorflow in an idiomatic way from Clojure. The problem is that the DSL / contract / design that I’m building out is very similar to the way Onyx handles data. My design doesn’t map easily to underlying Tensorflow constructs, but would map well into Onyx (potentially)

michaeldrogalis22:04:35

Cool, yeah we’re close enough that it’s worth looking at. @lucasbradstreet IIRC, the last step is ensuring consistent checkpoints under iteration, right?

michaeldrogalis22:04:51

@stephenmhopper Out of curiosity, which algorithms are you implementing?

stephenmhopper22:04:33

Well, my goal is to build a useful set of neural net abstractions, similar to the way people use Keras on top of Tensorflow or Theano. Right now, I’m targeting Tensorflow as the underlying runtime, but I’d be interested in exploring what this would look like on top of Onyx

lucasbradstreet22:04:41

@michaeldrogalis that’s right, we mostly need to make sure the checkpointing handles iteration

stephenmhopper22:04:05

Because I’m 95% certain that Onyx handles distributed computations better than Tensorflow

stephenmhopper22:04:41

Do you have some recommended reading for me on using ABS to write ML algorithms?

michaeldrogalis22:04:51

Cool. Yeah, we ought to get moving on this front. It’s one of those big-ticket items that we’ve wanted for a while, and now that the majority of the work is finished, we should bring iteration to the finish line.

michaeldrogalis22:04:57

ABS is the underlying algorithm for doing fault-tolerant message passing; you wouldn’t be working with it directly. On the internal side of things, we need to make sure Onyx can properly recover windowing contents during an iterative cycle. On the API front, we need to decide what that will look like in an Onyx program.

michaeldrogalis22:04:20

I imagined extending flow conditions, but we should research what Flink is doing to get a baseline.
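
A purely speculative sketch of what “extending flow conditions” might look like for iteration (nothing like this exists today; the predicate and task names are invented, and :flow/to pointing back upstream is not currently legal):

```clojure
;; Speculative: iteration via flow conditions. Segments loop back to
;; :train until a convergence predicate flips, then exit the cycle.
;; Predicates use the standard 4-arg flow-condition signature.
(def flow-conditions
  [{:flow/from :update
    :flow/to [:train]              ; hypothetical upstream routing
    :flow/predicate ::not-converged?}
   {:flow/from :update
    :flow/to [:write-model]
    :flow/predicate ::converged?}])

(defn not-converged? [event old-segment new-segment all-new]
  (> (:loss new-segment) 0.01))

(defn converged? [event old-segment new-segment all-new]
  (<= (:loss new-segment) 0.01))
```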

stephenmhopper22:04:45

okay, well I have to run, so if there’s anything you think I should read up on to get a head start on things, just PM it over to me

michaeldrogalis22:04:18

Sounds good. Thanks!