#onyx
2017-02-06
isaac 12:02:51

Should I always set the same onyx/batch-timeout & onyx/batch-size for all tasks?

michaeldrogalis 16:02:43

@isaac Not unless all tasks have roughly the same latency. You can use those two parameters to control how long each task will wait to fill up its input buffer, and how capacious that buffer is.

michaeldrogalis 16:02:14

For tasks whose longer processing time can be amortized by taking many segments as input in parallel, you’d want a higher timeout and batch size. For shorter tasks, the opposite.

michaeldrogalis 16:02:49

That said, in some streaming applications where most operations are simple functions and you don’t need to get to the absolute edge of high performance, you can pick a value that’s good enough across the board.
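For illustration, a minimal catalog sketch of this tuning (task names, fns, and values here are hypothetical, not from the conversation): a fast function gets a small batch and a short wait; a slow, amortizable one gets the opposite.

;; Hypothetical catalog entries showing per-task batch tuning.
;; :onyx/batch-size caps how many segments a task reads per batch;
;; :onyx/batch-timeout (ms) caps how long it waits to fill that batch.
[{:onyx/name :enrich          ; fast, simple fn: small batch, short wait
  :onyx/fn :my.app/enrich
  :onyx/type :function
  :onyx/batch-size 20
  :onyx/batch-timeout 50}
 {:onyx/name :score           ; slow fn amortized across many segments
  :onyx/fn :my.app/score
  :onyx/type :function
  :onyx/batch-size 500
  :onyx/batch-timeout 1000}]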

mccraigmccraig 19:02:08

i've just seen some behaviour where segments were re-processed. these segments were probably quite slow to process - several seconds i think. is there a timeout or policy i can tweak to stop them getting re-processed ?

mccraigmccraig 19:02:02

(i need to refactor the processing some so that the work is split up into many segments - but i also need to keep it running while i do that 🙂 )

michaeldrogalis 19:02:14

@mccraigmccraig You want to increase :onyx/pending-timeout if you’re on 0.9.
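For reference, a sketch of where that knob lives on 0.9: :onyx/pending-timeout is set on the input task’s catalog entry, in milliseconds (the default is 60000). The entry below is hypothetical and trimmed to the relevant keys.

;; 0.9.x input task: a segment not acked within :onyx/pending-timeout ms
;; is retried from the input medium, so set it above your worst-case
;; processing time.
{:onyx/name :in
 :onyx/plugin :onyx.plugin.core-async/input
 :onyx/type :input
 :onyx/medium :core.async
 :onyx/max-peers 1
 :onyx/batch-size 10
 :onyx/pending-timeout 300000} ; 5 minutes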

mccraigmccraig 19:02:30

yep, currently on 0.9.11

mccraigmccraig 19:02:58

ah, so it's about to change...

michaeldrogalis 19:02:09

Cool, yep. Should be able to raise that. Are you running any metrics or monitoring to verify what you saw? We emit metrics when backpressure is in effect.

michaeldrogalis 19:02:27

Yes - it’s just a knob that no longer exists. For the most part you just don’t have to tune that piece anymore.

mccraigmccraig 19:02:16

just logging atm - i should add onyx-metrics
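For reference, a lifecycle sketch of what wiring onyx-metrics looked like in the 0.9 era; the key names and sender fn below are recalled rather than quoted, so check them against the onyx-metrics README for your version.

;; Hypothetical lifecycle entry sending task metrics (incl. retries and
;; backpressure) to Timbre logging; verify keys against your onyx-metrics
;; version.
{:lifecycle/task :all
 :lifecycle/calls :onyx.lifecycle.metrics.metrics/calls
 :metrics/buffer-capacity 10000
 :metrics/workflow-name "my-workflow"
 :metrics/sender-fn :onyx.lifecycle.metrics.timbre/timbre-sender
 :lifecycle/doc "Instruments tasks with metrics"}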

michaeldrogalis 19:02:33

Indeed :thumbsup:

mccraigmccraig 19:02:14

ha, i haven't touched the onyx stuff for a while, because it generally just works - but i've just added a bunch of new features, and that's highlighting some bad assumptions i made

michaeldrogalis 19:02:06

@mccraigmccraig Glad it’s been a smooth experience. 🙂

jmorris0x0 20:02:54

@michaeldrogalis Hi. I asked a Github question over the weekend regarding multi-language support for onyx functions. In your answer you mentioned the importance of performant inter-language communication. I’ve been looking at how people are doing this. Google’s protocol buffers and Apache Thrift seem high on the list but I’m sure you have many thoughts on this.

michaeldrogalis 20:02:18

@jmorris0x0 Hey! Thanks for dropping by. One-time serialization isn’t much of a problem. We can actually do this a lot more efficiently than most by using Aeron for inter-process communication. The hurdle is going back and forth between runtimes at each stage of the lifecycle.

michaeldrogalis 20:02:46

e.g. start in Clojure at “before read batch”, shell out to language X, shuttle that back to Clojure, move to the next lifecycle step, and so forth

michaeldrogalis 20:02:22

@lucasbradstreet Given the overhaul that turned the task lifecycle into a state machine in 0.10.x, is your opinion that pulling this off is any easier or harder now?

michaeldrogalis 20:02:39

@jmorris0x0 Which language are you targeting, by the way?

jmorris0x0 20:02:43

I’m interested in python, as I have some code already written that relies on python’s machine learning libs.

jmorris0x0 20:02:15

Would be great for interop with Pandas (python dataframes) as well.

michaeldrogalis 20:02:13

Is it a requirement that you have access to lifecycles in Python? The work to be done is a whole lot more tractable if you merely want to call Python code from :onyx/fn.

jmorris0x0 20:02:39

Hmm. That’s a good question. Without thinking too deeply about it, not having access to lifecycles is something that I could work around. So no, I don’t think so.
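For a proof of concept that needs no lifecycle access, one hedged sketch of reaching Python from a plain :onyx/fn is to serialize the segment and shell out; this is slow per call but simple, and the script name and JSON contract here are hypothetical.

(ns my.app.python
  (:require [clojure.java.shell :as shell]
            [clojure.data.json :as json]))

;; Round-trip one segment through a Python script via stdin/stdout JSON.
;; A real binding would keep a long-lived Python process instead of
;; paying process-startup cost per call.
(defn score-with-python [segment]
  (let [{:keys [out exit err]}
        (shell/sh "python3" "score.py" :in (json/write-str segment))]
    (if (zero? exit)
      (json/read-str out :key-fn keyword)
      (throw (ex-info "Python fn failed" {:err err :segment segment})))))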

lucasbradstreet 20:02:15

The new state machine may help, but the main hurdle is really the serialization (as @jmorris0x0 says) and how we dispatch from there (whether IPC, or JNI). On the serialization front, protocol buffers are fine, though I’ve been getting tempted to write our own serialization format with SBE (https://github.com/real-logic/simple-binary-encoding) for our own internal serialization.

lucasbradstreet 20:02:54

I think SBE could end up being faster for our use cases, but protobuf is a lot more widely used, so maybe it’s worthwhile going with it for language interop alone. That said, I really want to avoid creating a lot of intermediate objects during normal operation, so I’d like to see if it’s appropriate.

lucasbradstreet 20:02:53

Ideally, Onyx will just hand over the bytes for the segments to the Onyx language bindings, which will deserialize them, apply the fn, and then hand back the bytes.

lucasbradstreet 20:02:23

This gets a lot harder with windowing and whatnot though 😕

michaeldrogalis 20:02:16

@jmorris0x0 In that case, have you ever tried Python’s JavaBridge?

jmorris0x0 21:02:13

Nope. Digging into it now.

mccraigmccraig 21:02:49

@michaeldrogalis following on from earlier question about :onyx/pending-timeout - what happens to any thread associated with processing a segment which doesn't return within the timeout ? does it get interrupted ?

michaeldrogalis 21:02:04

@mccraigmccraig It doesn’t, no. It’s on the application programmer to make sure that user-level functions eventually return.
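One hedged sketch of enforcing that from application code: wrap your segment fn so callers stop waiting after a bound. Note this does not interrupt the runaway thread; it only abandons it, which is exactly the cancellation problem discussed next.

;; Wrap a segment fn so callers get an answer within timeout-ms. The
;; future's thread is NOT killed on timeout; it is merely abandoned.
(defn with-timeout [f timeout-ms]
  (fn [segment]
    (let [result (deref (future (f segment)) timeout-ms ::timed-out)]
      (if (= ::timed-out result)
        (throw (ex-info "fn timed out" {:segment segment}))
        result))))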

lucasbradstreet 21:02:31

forcing threads to be cancelled is harder than one might think.

mccraigmccraig 21:02:54

@lucasbradstreet it's easy when all your ops are async behind the scenes 🙂

mccraigmccraig 21:02:41

though i guess it's not really forcing the thread to cancel then

michaeldrogalis 21:02:39

Tough part about that would be knowing which thread is harboring the hostage 😛

mccraigmccraig 21:02:46

what happens if a function doesn't return - does the peer which is running it become blocked then, or can it continue somehow - start a new thread or whatever ?

lucasbradstreet 21:02:12

In 0.9 things can go bad. In 0.10.x, the peer will stop heartbeating and will be booted

mccraigmccraig 21:02:05

ok, cool - i think i can see the way forward now. thanks @michaeldrogalis @lucasbradstreet

mccraigmccraig 21:02:44

dyu have an estimate on when 0.10 is likely to release ?

lucasbradstreet 21:02:03

Getting ready for a beta release now.

jeremy 22:02:34

hey, so i've been trying to get started with onyx, using the datomic plugin to read the log, and then output all segments to a core.async channel so that i can prove to myself something is happening. however, i've had no luck. things seem fine in onyx.log, so i'm stumped. i've made my prototype as small as possible: https://gist.github.com/jeremyheiler/e049f1fc58e19103e91d3967d29b9847 what can i do to debug this?

michaeldrogalis 23:02:38

@jeremy One thing you can do is ensure that the input task is receiving any data in the first place. Usually you’d do this with metrics and monitoring, but since you’re just getting your feet wet, you can add :onyx/fn ::spy to the catalog entry for your input task. Then make a function:

(defn spy [segment]
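  ;; print each segment as it comes off the input task, then pass it through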
  (prn "Read: " segment)
  segment)
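For context, a hypothetical input-task catalog entry with the debug fn attached (trimmed; a real entry keeps its plugin-specific keys such as the Datomic URI):

;; :onyx/fn on an input task transforms each segment before it flows
;; downstream -- here it only prints.
{:onyx/name :read-log
 :onyx/plugin :onyx.plugin.datomic/read-log
 :onyx/type :input
 :onyx/medium :datomic
 :onyx/fn ::spy
 :onyx/batch-size 20}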

michaeldrogalis 23:02:57

Using an onyx/fn on an input task lets you do a transformation of the data before it flows through the rest of the system. If you’re toying around locally, this is adequate for debugging.

jeremy 23:02:15

so, i was literally just about to say i found that in the docs, and yes, it is reading data. so i guess the issue is with core-async

michaeldrogalis 23:02:50

Cool. Perhaps turn your channel into a dropping or sliding buffer. If you’re not removing contents from the channel as they arrive and you hit capacity, progress will lock up
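For illustration, a minimal core.async sketch (the capacity is arbitrary): a sliding buffer keeps the newest segments and drops the oldest instead of blocking the job when nothing is consuming the channel.

(require '[clojure.core.async :as a])

;; Output channel that never blocks producers: on overflow, the oldest
;; buffered segment is dropped to make room.
(def out-chan (a/chan (a/sliding-buffer 10000)))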

jeremy 23:02:36

making the channel a sliding buffer didn't change anything

jeremy 23:02:07

so, if i just pull from the out channel, rather than calling take-segments! after submitting the job, it blocks forever

michaeldrogalis 23:02:33

Hm, say - you’re not shutting down the environment too quickly, are you?

michaeldrogalis 23:02:02

You’re probably not - but just a quick guess: if you’re not blocking here, you’d drop straight into the finally block and shut down the environment.
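For shape only, a hedged sketch of that structure (binding names such as peer-config, peers, peer-group, and env follow the Onyx template and are assumptions): block on take-segments! before control reaches the finally block that tears everything down.

;; Read the job's output until the :done sentinel *before* the finally
;; block shuts the environment down (0.9 core-async plugin).
(try
  (onyx.api/submit-job peer-config job)
  (let [results (onyx.plugin.core-async/take-segments! out-chan)]
    (prn "Results:" results))
  (finally
    (onyx.api/shutdown-peers peers)
    (onyx.api/shutdown-peer-group peer-group)
    (onyx.api/shutdown-env env)))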