onyx 2017-08-27 | Slack Archive

lmergen08:08:19

i guess this is related to an at-least-once vs at-most-once decision, but what would be the appropriate point in an onyx 0.10 plugin to consider a segment to be fully processed ? i’m not sure whether the synced? function or the checkpointed! function is the correct one to use, since both of them have access to the epoch

lucasbradstreet08:08:18

Synced means that it's safe for the barrier to proceed, checkpointed! means that the all the barriers for that epoch have been processed and the checkpoint has been written.

lmergen08:08:48

right, so checkpointed! is called before synced?

lucasbradstreet08:08:50

The main purpose of synced? is to give a synchronisation point at which the plugin can be sure things are written out to the next task or the output, especially for asynchronous actions, but if something fails before the final checkpoint has been communicated up to the coordinator and written, then you could end up with it all being performed again for that data.

lucasbradstreet08:08:53

After

lucasbradstreet08:08:43

Basically the coordinator waits for synced? = true on all tasks, then once the coordinator knows it’s synced? on all tasks, the coordinator communicates back down to tell the tasks to checkpointed! for that epoch.

lmergen08:08:03

lucasbradstreet08:08:47

Writing a plugin?

lmergen08:08:53

how did you guess that 🙂

lmergen08:08:53

okay so then synced? would be the best place to mark things as fully processed for an input plugin

lmergen08:08:51

how do i know what the ‘next’ epoch will be after a synced ? is it always just incremented by 1 ?

lucasbradstreet08:08:19

Yes, but make sure you reset any state on a recover! call, since it means that it never moved to the next epoch.

lucasbradstreet08:08:32

And yes, next epoch is always incremented by 1, unless recover! is called, at which point it resets back to 0

lmergen08:08:44

lucasbradstreet08:08:47

(and you get a new replica version passed in)

lucasbradstreet08:08:06

If you would like a code review at any point let me know.

lucasbradstreet08:08:12

Good to see how the plugin interface is being used.

lmergen08:08:38

thanks for that, i might actually make use of the offer 🙂

lmergen08:08:59

if you’re on board, i might do a PR of plugins/protocols.clj to document these semantics a bit better

lucasbradstreet08:08:06

The core async input plugin is kind of an ungodly hack, so try not to follow it too much.

lucasbradstreet08:08:08

Yes please!

lmergen08:08:27

ok will do

lucasbradstreet08:08:42

Thank you 🙂

lmergen08:08:40

one last question — what’s the appropriate way to tell poll! to back off for a while because there is no input now, but might be in a few seconds ? the ugly way would be to just call Thread/sleep for up to timeout-ms, but is there a better way (e.g. mark drained? )

lucasbradstreet09:08:41

Well, we pass in timeout-ms to you, I would just track timeout-ms and backoff with [java.util.concurrent.locks LockSupport] via LockSupport/parkNanos, and then our own poll! again

lucasbradstreet09:08:46

with a smaller timeout-ms

lucasbradstreet09:08:19

If it weren’t for this (when-not (nil? segment) https://github.com/onyx-platform/onyx/blob/0.10.x/src/onyx/peer/read_batch.clj#L40 it would track the poll time for you and then bail after batch-timeout.

lucasbradstreet09:08:58

Maybe it should work that way, I generally prefer to it to just get on with sending work down in that case though.

lmergen09:08:20

oh, that makes sense

lucasbradstreet09:08:36

The current behaviour is probably best for throughput, but it does come at the cost of some cpu burn.

lmergen09:08:01

yep

lmergen09:08:09

okay, thanks for the advice, will implement it as such

lucasbradstreet09:08:34

No worries

2017-08-27

Channels