#onyx
2016-04-21
lucasbradstreet03:04:21

This will be useful: "Java 8u92 includes new VM switches: ExitOnOutOfMemoryError and CrashOnOutOfMemoryError" http://www.oracle.com/technetwork/java/javase/8u92-relnotes-2949471.html

lucasbradstreet03:04:24

We should set up an unhandled exception handler for at least our tests: https://stuartsierra.com/2015/05/27/clojure-uncaught-exceptions
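
A handler like the one suggested above can be installed with the standard JVM call `Thread.setDefaultUncaughtExceptionHandler`. The following is only a minimal Java sketch of that idea, not code from Onyx or from the linked post (which uses Clojure). The class name `UncaughtHandlerSketch` and its helpers are hypothetical and exist only for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper class, for illustration only.
final class UncaughtHandlerSketch {
    // Remembers the most recent uncaught exception so a test can inspect it.
    static final AtomicReference<Throwable> lastUncaught = new AtomicReference<>();

    // Installs a JVM-wide default handler. Without one, an exception thrown
    // on a background thread is easy to miss in test output.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, ex) -> {
            lastUncaught.set(ex);
            System.err.println("Uncaught exception on " + thread.getName() + ": " + ex);
        });
    }

    // Runs a task on a background thread and waits for that thread to finish.
    static void runAndJoin(Runnable task) {
        Thread worker = new Thread(task, "worker");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
    }
}
```

In a test suite, `install()` would typically be called once before any tests run, so that failures on worker threads are at least logged.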

acron08:04:03

Is the lein template for creating plugins (https://github.com/onyx-platform/onyx-plugin) still safe to use? I notice it's a couple of versions behind.

rasom14:04:18

Hi, I’m trying to write a test where I want a guarantee that the previous segment was processed before the next one arrives at the job. How can I achieve this?

gardnervickers14:04:17

You're going to need to use the windowing features if you want to get ordering/exactly-once semantics.

rasom14:04:41

The idea is that in the real world these events are not ordered, and I don’t care if they arrive simultaneously. I just want to test the situation when they are distributed in time.

rasom14:04:49

Do I still need windowing for this?

gardnervickers14:04:59

Hmm, I’m a bit confused by what you mean by “test situation when they are distributed in time”.

gardnervickers14:04:29

If you could elaborate on what you’re trying to do, I think I could offer better insight.

rasom14:04:13

OK. Imagine that some data arrives at the job and, after it is processed, it is stored in the database. Now if another segment arrives, it is processed too, but the data that was stored in the database previously can affect the results of this processing. If both segments are processed simultaneously, it might happen that they do not affect each other. So I want to test the situation where the second segment is processed after the first, that is, the case where data from the database affects the processing of the second segment.

rasom14:04:35

Is that clearer? :)

gardnervickers14:04:31

Yes, thanks. If you need ordering guarantees, then you will need to use Onyx’s windowing features.

michaeldrogalis14:04:17

@acron: I need to upgrade it and get it under our auto-release process. It's mostly okay, but make sure you're using the latest version.

michaeldrogalis14:04:29

Not much has changed with respect to the plugin interface in the last couple of releases.

michaeldrogalis14:04:20

@rasom: In general, Onyx doesn't provide any guarantees about the ordering of your data when it's in flight - otherwise performance would be crippled. You can use windowing, as @gardnervickers said, to maintain some state over time and get a view into that state - but it sounds like you'd be better off with unit or integration testing for what you're trying to do.

gardnervickers14:04:51

@rasom: I should have elaborated. Using windows will not provide ordering by itself, but it will allow you to order segments before flushing state to a database.

rasom14:04:55

@gardnervickers: OK, I see. Thanks.

rasom14:04:13

If I set :onyx/batch-size to 1, does that mean a segment will be processed as soon as it arrives at the task, without waiting for other segments?

acron15:04:23

@michaeldrogalis: I'm playing with onyx-redis and noticing that write-batch is called repeatedly. Is this normal for 'output' plugins?

acron15:04:05

It seems to receive nil a lot

acron15:04:35

or, an empty tree rather

michaeldrogalis15:04:44

@acron: The plugin is receiving no input, and the batch timeout is running out - so the lifecycle executes with an empty batch.

acron15:04:13

@michaeldrogalis: ok, that makes sense. So it's okay to just do nothing at that point?

acron16:04:54

Hmmm, I think I could use some more documentation about plugins.

acron16:04:14

For example, when does ack-segment get called? I'm never seeing it get called

lucasbradstreet16:04:47

ack-segment will get called when all of the segments generated from the initial root segment have been processed

acron16:04:45

root segment? root node?

acron16:04:35

And 'processed' by the plugin? Or the whole job?

lucasbradstreet16:04:10

Root segment being the segment read from the input plugin. By segments generated I mean any segment in the tree of segments resulting from that original segment

lucasbradstreet16:04:17

In the overall job

lucasbradstreet16:04:01

So say a segment is read at input task T1 and sent to T2, which creates two segments from it, and those are then sent to an output task, where they are written out. An ack is sent for each segment at each stage of the process, and when all of those acks have been received, the plugin will call ack-segment on the input task.
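
The T1 → T2 → output example above can be modeled with a simple counter per root segment. The following Java sketch is a deliberate simplification for illustration and makes no claim about how Onyx tracks acks internally. The class `AckTrackerSketch` and its method names are hypothetical. Each ack retires one segment and reports how many child segments it produced, and the root counts as fully acked (the point at which ack-segment would be called) once nothing is outstanding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "ack when the whole segment tree is processed".
final class AckTrackerSketch {
    // Number of segments still in flight for each root segment.
    private final Map<String, Integer> outstanding = new HashMap<>();
    // Roots whose entire tree has been acked, in completion order.
    final List<String> fullyAcked = new ArrayList<>();

    // Called when the input task reads a new root segment.
    void rootRead(String rootId) {
        outstanding.put(rootId, 1);
    }

    // Called when one segment in the tree has been processed by a task,
    // having emitted `childrenEmitted` new segments downstream.
    void ack(String rootId, int childrenEmitted) {
        Integer current = outstanding.get(rootId);
        if (current == null) {
            throw new IllegalArgumentException("unknown root: " + rootId);
        }
        int remaining = current - 1 + childrenEmitted;
        if (remaining == 0) {
            outstanding.remove(rootId);
            fullyAcked.add(rootId); // stands in for calling ack-segment
        } else {
            outstanding.put(rootId, remaining);
        }
    }

    boolean isPending(String rootId) {
        return outstanding.containsKey(rootId);
    }
}
```

For the example in the message: T1 acks with one child, T2 acks with two children, and the output task acks each of those two with zero children. Only after that last ack does the root move to `fullyAcked`.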

acron16:04:24

Sorry for all the questions :) @tcoupland and I are actually trying to shoehorn some loops into a workflow using external storage, so... this is really testing our understanding of Onyx and the plugins.

acron16:04:00

OK, that makes sense. So, regarding 'pending messages' (a pattern in a lot of the plugins), is the logic that at any point the plugin could be asked to 'retry' a segment and should retrieve it from the pending list?

acron16:04:36

The implication being that a segment was lost or failed further downstream?
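
The 'pending messages' reading proposed in the last two messages could be sketched as below. This is a minimal Java illustration that assumes that reading is correct. It is not taken from any Onyx plugin, and the class `PendingMessagesSketch` and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical input-side bookkeeping for segments that are still in flight.
final class PendingMessagesSketch {
    // Segments that were emitted but not yet fully acked, keyed by id.
    private final Map<String, String> pending = new HashMap<>();
    // Segments waiting to be emitted again after a retry request.
    private final Deque<String> retryQueue = new ArrayDeque<>();

    // Remember a segment when it is first read and sent downstream.
    void emit(String id, String segment) {
        pending.put(id, segment);
    }

    // The whole tree for this segment was processed, so forget it.
    void ackSegment(String id) {
        pending.remove(id);
    }

    // The segment was lost or failed downstream: move it from the pending
    // map to the retry queue. Unknown or already acked ids are ignored.
    void retrySegment(String id) {
        String segment = pending.remove(id);
        if (segment != null) {
            retryQueue.addLast(segment);
        }
    }

    // Next segment to re-emit, or null when nothing is waiting.
    String nextRetry() {
        return retryQueue.pollFirst();
    }

    int pendingCount() {
        return pending.size();
    }
}
```

A real plugin would put the retried segment back into the pending set under a fresh id when it is emitted again. That step is left out here to keep the sketch short.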