#onyx
2016-11-18
yonatanel11:11:38

Any news on the ignored :aggregation/init function? Or the extra event state parameters?

yonatanel11:11:55

BTW, is there anything in Onyx that could be separated into a "distribution" lib that anyone can use to implement new abstractions, for example if I want to support externally-initialized aggregates, implement an actor model, etc.?

lucasbradstreet11:11:44

I'm not sure how much more we could split out. It's a little tricky. The new ABS work may allow extra lifecycle hooks because we've turned the whole task into a state machine

yonatanel11:11:58

Thanks Lucas. What's a good place to learn about how things will work in the new ABS release?

lucasbradstreet11:11:59

We don't have much in the way of documentation, Onyx-wise, yet

asolovyov12:11:43

I'm trying to learn to read onyx-metrics and this looks a bit weird to me: https://monosnap.com/file/bhnkuKDHBLCH77AbF8KPkC8QrmSHvU I'm not sure why complete latency gets posted only sometimes, and why reading from Kafka always has higher throughput than sending emails, even though we aren't seeing any lag?

lucasbradstreet12:11:28

complete-latency will be posted whenever a message is acked. Are you seeing retries?

lucasbradstreet12:11:25

That could also explain why your send-email is lower than your read-email throughput. If messages are being lost, or similar, then you will see retries and a lower throughput at send-email

lucasbradstreet12:11:46

It also might be explained by having two peers on the send-email task, since you’re taking a mean of the throughput, assuming you’re not grouping by peer

lucasbradstreet12:11:12

i.e. you have two 2500 throughput readings per second which are being averaged to 2500 throughput overall

lucasbradstreet12:11:16

rather than being summed
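
A quick sketch of the arithmetic Lucas is describing (the numbers are illustrative):

;; two peers each report ~2500 segments/sec for the same task in the same second
(def per-peer-readings [2500 2500])

;; averaging the per-peer readings makes the task look like a single peer
(/ (reduce + per-peer-readings) (count per-peer-readings))  ;; => 2500

;; summing them gives the task-level throughput
(reduce + per-peer-readings)                                ;; => 5000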

asolovyov12:11:29

sorry for not answering right away, your first messages got me to check what's going on and we indeed had retries 🙂

asolovyov12:11:42

fixed that now and will see if chart changes

asolovyov12:11:09

oh, three peers on send-email and a single one on read-email

asolovyov12:11:58

I guess I'll have to change from mean to sum, but I'm not sure how to handle the situation when Grafana adds grouping by time... anyway, I'll see, thanks! 🙂

lucasbradstreet12:11:38

There should only be one throughput event per second, so it's safe to sum them if you only care about the aggregates

lucasbradstreet12:11:51

It's good to be able to split them so you know what each peer is doing though

asolovyov12:11:44

what do you think is a sensible setting for max-pending? 🙂

lucasbradstreet13:11:22

@asolovyov how long is a piece of string? 🙂 It really depends on what your tasks are doing. I’ve used a max-pending of 100000 for simple tasks that are processable in microseconds, whereas if you’re making synchronous calls that take 100ms, maybe 100 is more appropriate

lucasbradstreet13:11:35

Best I can tell you is to increase it until you get retries and then back off a bit for a safety factor
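
For reference, max-pending is set per input task in the catalog. A minimal sketch, assuming the Kafka input plugin from onyx-kafka (plugin-specific :kafka/* options omitted, and the values here are only illustrative):

{:onyx/name :read-email
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :onyx/max-pending 10000   ;; max un-acked segments allowed in flight from this input task
 :onyx/batch-size 20
 :onyx/doc "Reads email segments from a Kafka topic"}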

asolovyov13:11:15

I need to invent some way to experiment with tasks in production 🙂 it's really hard to emulate in development, and doing a release for every small change is painful 🙂

lucasbradstreet13:11:05

A test environment? The turnaround times can still be long, though.

asolovyov14:11:24

I got an error like this and I'm not sure how to find the actual problem, any ideas?

#error {
 :cause "No matching field found: getCause for class clojure.lang.PersistentArrayMap"
 :data {:original-exception :java.lang.IllegalArgumentException}
 :via
 [{:type clojure.lang.ExceptionInfo
   :message "No matching field found: getCause for class clojure.lang.PersistentArrayMap"
   :data {:original-exception :java.lang.IllegalArgumentException}
   :at [onyx.compression.nippy$fn__13700$fn__13701 invoke "nippy.clj" 33]}]
 :trace
 [[onyx.compression.nippy$fn__13700$fn__13701 invoke "nippy.clj" 33]
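
The "No matching field found: getCause for class clojure.lang.PersistentArrayMap" cause usually means a plain Clojure map ended up where a Throwable was expected, which then trips up Onyx's exception serialization. A minimal sketch of the usual remedy in task code, assuming valid? and transform stand in for your own functions:

(defn process-segment [segment]
  (if (valid? segment)
    (transform segment)
    ;; throw a real Throwable (ExceptionInfo) instead of handing back an error map
    (throw (ex-info "Bad segment" {:segment segment}))))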

asolovyov14:11:15

ah, found a better one in the logs

asolovyov15:11:44

is it possible to catch an exception in a flow condition and stop doing any processing on that segment? Onyx expects that post-transform will return something sensible, but I wonder if it's possible to avoid that?
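
For context, this is roughly the shape of exception-handling flow condition being discussed; a minimal sketch, assuming :my.app/error? and :my.app/to-error-segment are your own functions, with the exact key behaviour worth checking against the flow-conditions docs for your Onyx version:

[{:flow/from :process-email
  :flow/to :none                    ;; route the failing segment nowhere, i.e. skip it
  :flow/short-circuit? true         ;; required for exception flow conditions
  :flow/thrown-exception? true      ;; match on a thrown exception rather than a returned segment
  :flow/post-transform :my.app/to-error-segment
  :flow/predicate :my.app/error?}]

;; for exception conditions the exception object is passed in place of the new segment
(defn error? [event old-segment e all-new]
  (instance? Throwable e))

;; post-transform turns the exception into a segment-shaped value
;; (a natural place for the Sentry call)
(defn to-error-segment [event segment e]
  {:error (.getMessage e)})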

lucasbradstreet15:11:44

@asolovyov looks like our exception serialization is failing you

asolovyov15:11:59

because I've been returning stuff I shouldn't 🙂

asolovyov15:11:07

so I try to post something to Sentry

asolovyov15:11:15

but then I just want to skip this segment

lucasbradstreet15:11:11

flow conditions would normally be able to do it, but it might not work here because the exception handling is failing internally

lucasbradstreet15:11:25

You’re throwing an array map or something?

asolovyov15:11:56

just thought that it's probably my Sentry handler throwing something

lucasbradstreet15:11:58

Shouldn’t do that regardless

lucasbradstreet15:11:39

Which version of Onyx are you using?

michaeldrogalis16:11:44

@yonatanel We’re open to taking most PR’s into lib-onyx.

michaeldrogalis16:11:53

It’s the general experimental library.

michaeldrogalis16:11:42

I haven’t had time to look at :aggregation/init being ignored in local-rt. If the fix seems obvious to you, could you send in a PR with the fix? It should roughly match what’s in core, and IIRC that’s where you found the discrepancy?
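
For anyone following along, :aggregation/init is part of a custom window aggregation definition; a minimal sketch modeled on the built-in conj aggregation (signatures worth double-checking against the windowing docs for your version):

(defn sum-init [window] 0)

(defn sum-create-state-update [window state segment]
  (:n segment))

(defn sum-apply-state-update [window state delta]
  (+ state delta))

(def sum-aggregation
  {:aggregation/init sum-init
   :aggregation/create-state-update sum-create-state-update
   :aggregation/apply-state-update sum-apply-state-update})

;; referenced from a window entry via :window/aggregation, e.g.
;; {:window/id :sum-window ... :window/aggregation :my.app/sum-aggregation}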

hugesandwich16:11:13

It's been a while since I checked, but if I call kill-job, does the clean shutdown imply that the job will stop taking new segments, and that if a segment has not made its way to the output tasks, the job will continue to run until it drains the existing segments that need to make their way through the workflow?

michaeldrogalis16:11:47

It will not drain segments, no. Shutdown is immediate.

hugesandwich16:11:14

OK, that's what I thought and generally want. However, is there a way to drain segments?

hugesandwich16:11:40

I guess I can send a :done

hugesandwich16:11:32

OK, good, that's how I am doing it now. I hadn't touched Onyx since a few versions back, so I'm just verifying I'm remembering things right.
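
A minimal sketch of the two options, using a core.async input for the drain case (peer-config, job-id, and input-chan are placeholders):

(require '[onyx.api]
         '[clojure.core.async :as a])

;; immediate shutdown: in-flight segments are not drained
(onyx.api/kill-job peer-config job-id)

;; draining instead: write the :done sentinel to the input so the job
;; finishes whatever is in flight and then completes on its own
(a/>!! input-chan {:event "last real segment"})
(a/>!! input-chan :done)
(a/close! input-chan)
(onyx.api/await-job-completion peer-config job-id)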

hugesandwich16:11:10

If I end up with a few thousand jobs in my cluster that are mostly parked, is it better to shut some down explicitly after a long period of inactivity, or is the overhead of leaving them parked not really going to have any implications?

hugesandwich16:11:26

That is to say, those jobs can and will be run again, so there's a tiny justification for just keeping them there, but I could just as easily check if a corresponding job type is running and, if not, launch a new one

michaeldrogalis16:11:43

The peers will remain blocked on real JVM threads.

hugesandwich16:11:29

OK, I thought the other day someone said the peers will be reclaimed

hugesandwich16:11:16

So in that case, it would make sense to have some inactivity threshold and end the jobs either by killing them or by sending a completion.
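
A minimal sketch of that idea; last-activity-at is a hypothetical lookup against your own bookkeeping (e.g. a timestamp recorded whenever a job processes a segment), and Onyx itself is only used for the kill call:

(require '[onyx.api])

(defn reap-idle-jobs!
  "Kill jobs that have not processed anything for longer than inactivity-ms."
  [peer-config job-ids inactivity-ms]
  (let [now (System/currentTimeMillis)]
    (doseq [job-id job-ids
            :when (> (- now (last-activity-at job-id)) inactivity-ms)]
      (onyx.api/kill-job peer-config job-id))))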

hugesandwich16:11:34

OK, I've read through this and the source and I think I'll end up writing my own scheduler to tweak things to my needs.

hugesandwich16:11:21

Oh wow, that murdered my formatting, but whatever, you get the point. It's rendering the raw template markup there

michaeldrogalis22:11:28

@yonatanel Patch for local-rt is out on 0.9.11.2. Thanks!