#onyx
2016-07-11
acron10:07:10

@lucasbradstreet: I notice you've no license on https://github.com/lbradstreet/onyx-visualization; any plans on adding one? 🙂

lucasbradstreet10:07:23

oops. Will do. Thanks!

lucasbradstreet10:07:40

Are you thinking of using it for something?

lucasbradstreet10:07:22

Done. Made it MIT so people can easily use it for whatever they want

acron10:07:16

@lucasbradstreet: Thanks - yes, I intend to use/adapt it 🙂

lucasbradstreet10:07:32

Cool, for something user facing/dashboard/etc?

lucasbradstreet10:07:44

Feel free to steal it completely, or send us improvements

acron10:07:17

Thanks; I'd cooked up some graphviz style ones but then I recalled this

lucasbradstreet10:07:59

the css is still a little wonky with the task display

acron10:07:02

On a different subject entirely, what would cause this?

16-07-11 10:32:22 acron-ubuntu INFO [onyx.messaging.aeron.publication-pool:79] - Creating publication at: {:channel "", :stream-id 1, :acker-id 18557, :peer-task-id 20607}
16-07-11 10:32:22 acron-ubuntu WARN [onyx.messaging.aeron.publication-manager:47] - Writing nil publication manager, likely due to timeout on creation.

lucasbradstreet10:07:20

I’ve only ever seen that on resource starvation e.g. long GCs

acron10:07:24

I'm seeing this for each peer, by the looks of it

acron10:07:32

This is local testing

lucasbradstreet10:07:10

vs how many cores on your machine?

acron10:07:43

Way more peers than cores

lucasbradstreet10:07:57

Yeah, it could be easy to starve the peers in that case

lucasbradstreet10:07:11

you can try creating a bigger batch timeout so they’re not trying to do things all the time

acron10:07:24

Ok, will try, thanks

acron11:07:13

@lucasbradstreet: Is it fair to say that I could be encountering a job size that's simply too large to run on a single machine?

lucasbradstreet11:07:18

There are probably some other things you can do to tune it, but it sounds like your job would need to be tuned/restructured to run on one machine

lucasbradstreet11:07:19

I'd want to use flight recorder / Java mission control to figure out what's consuming all the resources

lucasbradstreet11:07:51

Enabling the G1GC and setting the aeron liveness timeout higher can help

acron11:07:13

Sounds like a level of tuning outside of my comfort zone 😕

lucasbradstreet11:07:21

Also if you're running it at the repl it'll be using bad JVM flags to go fast

acron11:07:51

Ok, I'll try jarring it

acron11:07:04

Seems fine at 15 peers but beyond that, no luck

acron11:07:14

I need a lot more than 15..

lucasbradstreet11:07:11

Too many tasks in your job?

acron12:07:53

Yeah, seems so...

acron12:07:36

It doesn't even attempt to start though, which is odd

lucasbradstreet12:07:50

Are there any you can collapse? Common candidates are where A -> B can be collapsed into a single task

lucasbradstreet12:07:09

Do you have more than the minimum required peers started?

lucasbradstreet12:07:21

e.g. one per task + min-peers (if more than one) for each task
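A rough sketch of that peer count, assuming each catalog entry declares :onyx/n-peers or :onyx/min-peers and that a task with neither needs one peer. `min-required-peers` and `sample-catalog` are hypothetical names for illustration, not part of Onyx:

```clojure
;; Sum the peers each task needs before the job can start.
;; Assumption: a task with neither key needs exactly one peer.
(defn min-required-peers
  [catalog]
  (reduce + (map (fn [task]
                   (or (:onyx/n-peers task)
                       (:onyx/min-peers task)
                       1))
                 catalog)))

;; Hypothetical catalog, trimmed to the keys that matter here.
(def sample-catalog
  [{:onyx/name :in  :onyx/n-peers 1}
   {:onyx/name :inc :onyx/min-peers 2}
   {:onyx/name :out}])

(min-required-peers sample-catalog) ; => 4
```

With n-peers 1 on every task, the result is simply the number of tasks, so a 20-task job needs at least 20 peers.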

acron12:07:47

I have n-peers 1 on everything

acron12:07:00

This is literally just to run in a clojure test

lucasbradstreet12:07:32

So, say 20 peers, and 20 tasks and the job won’t start?

acron12:07:08

Would it be :onyx.messaging.aeron/publication-creation-timeout to fix that timeout?

lucasbradstreet12:07:46

That might help, yeah.

acron12:07:33

peer conf?

acron12:07:08

🙂

lucasbradstreet12:07:39

Also JVM OPTS: -XX:+UseG1GC -server -Daeron.client.liveness.timeout=50000000000
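One way to apply those flags when launching through Leiningen is a `:jvm-opts` entry in project.clj. This is a sketch only: the project name and version are hypothetical, and the timeout value is copied from the message above (Aeron reads it in nanoseconds as far as I know, which would make it 50 seconds).

```clojure
;; project.clj (fragment). Project name and version are hypothetical.
(defproject my-onyx-job "0.1.0-SNAPSHOT"
  ;; ^:replace drops Leiningen's default JVM options instead of appending
  ;; to them, so the startup-speed flags used at the REPL are not applied.
  :jvm-opts ^:replace ["-server"
                       "-XX:+UseG1GC"
                       "-Daeron.client.liveness.timeout=50000000000"])
```

When running a jar directly, the same three flags would go on the `java` command line before `-jar`.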

lucasbradstreet12:07:54

Not sure why it even has to connect, since it should be short circuiting everything

acron12:07:51

I notice in my config :onyx.messaging/allow-short-circuit? false

lucasbradstreet12:07:59

Yea try changing that 🙂

acron12:07:59

Does that pertain?

acron12:07:20

I inherited this config so not sure why these settings are how they are

lucasbradstreet12:07:30

That’s mostly to test whether your configuration is right when you go multinode, but it’ll kill perf on a single host
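For a single-host test run, the two peer-config settings discussed above might look like the sketch below. `single-host-overrides` is a hypothetical name, and the timeout value is illustrative and assumed to be in milliseconds; check the Onyx peer-config documentation for the unit and default.

```clojure
;; Overrides for a single-host test run, to be merged into an existing
;; peer-config map, e.g. (merge peer-config single-host-overrides).
(def single-host-overrides
  {;; Let colocated virtual peers bypass networked messaging.
   :onyx.messaging/allow-short-circuit? true
   ;; Illustrative value, assumed to be milliseconds.
   :onyx.messaging.aeron/publication-creation-timeout 5000})
```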

luiseugenio12:07:35

Is there any way to figure out the :flow-to from the event parameter inside a predicate fn called by a flow condition?

lucasbradstreet12:07:19

@luiseugenio: I don’t think so. It’d be nice if the flow condition map was passed into the predicate, since that would help parameterise the flow conditions better. I feel like this should be possible, but it doesn’t look like it

acron13:07:12

@lucasbradstreet: aeron warnings gone, now just no action after Enough peers are active, starting the task..

acron13:07:55

CPU @ 100%

lucasbradstreet13:07:12

try making :onyx/max-pending really small on your input tasks

lucasbradstreet13:07:27

just to make sure it’s that you’re not pushing too many segments through and they’re being constantly retried
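As a sketch, the two throttling knobs mentioned in this conversation (a larger batch timeout and a small :onyx/max-pending) both live on the catalog entry. The entry below assumes a core.async input task; the task name and all numeric values are illustrative, not recommendations.

```clojure
;; Hypothetical input task entry for a local test run.
(def input-task
  {:onyx/name :in
   :onyx/plugin :onyx.plugin.core-async/input
   :onyx/type :input
   :onyx/medium :core.async
   :onyx/batch-size 10
   ;; Wait longer for a batch to fill, so starved peers poll less often.
   :onyx/batch-timeout 1000
   ;; Keep few segments in flight, so fewer can time out and be retried.
   :onyx/max-pending 100
   :onyx/doc "Reads segments from a core.async channel"})
```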

lucasbradstreet13:07:32

now would be a good time to set up metrics

acron13:07:14

Ok, thanks

acron13:07:28

Is there much in the way of debug logging?

acron13:07:46

I added :onyx.log/config {:level :debug}

lucasbradstreet13:07:50

a bit. you’d at least be able to see how big the batches are

acron13:07:50

But no change

lucasbradstreet13:07:41

I would set up onyx-metrics with the timbre sender. I would also try running jmc (Java Mission Control), attach it to the running process, and profile it a bit

lucasbradstreet13:07:45

to see where the time is going

lucasbradstreet13:07:51

prob do onyx-metrics first

acron13:07:25

I did try jmc but I've not used it before and the best I could get was that it spent most of its time in the 'clojure.lang' package..

acron13:07:04

Hmm, it's doing a couple of tasks now and then stopping

acron13:07:25

This is strange

acron13:07:42

If I reduce the number of tasks it works

acron13:07:16

I'm investigating something else

luiseugenio14:07:32

@lucasbradstreet: Is there any chance that including the flow condition map inside the event map could become a wishlist feature? 😊 I’m trying to use a rule engine to decide where to go in flow conditions, and if I could call only one predicate function (without having to pass some kind of hint of :flow/from and :flow/to) that would be fantastic.

michaeldrogalis14:07:18

@luiseugenio: It's in the Event map now. Under keys :onyx.core/task-information -> :flow-conditions.

luiseugenio15:07:35

@michaeldrogalis: Ah, not all the flow conditions, but the map that contains the specific :flow/from and :flow/to this predicate was called for. Suppose I have a workflow [:a :b] [:a :c] and the flow condition maps are: {:flow/from :a :flow/to [:b] :flow/predicate ::can-i-go? } and the same for :a -> :c. Then I could call a rule engine to decide where to go (knowing the :flow/from and :flow/to for each case the predicate is called). This could be a stupid idea, anyway, but I could change the behaviour based only on the segment data and where it is trying to go.

michaeldrogalis15:07:18

Ah, right. Sorry, I can't think of an easy way to get at that data yet.

lucasbradstreet15:07:22

I think it’d make sense to pass the specific flow condition in. We do a similar thing with windows and triggers

lucasbradstreet15:07:33

It’d be another breaking change though 😕

michaeldrogalis15:07:31

Probably a change for 0.10.0.

luiseugenio16:07:54

@michaeldrogalis and @lucasbradstreet: ok, no problem. I’ll do it another way (passing a parameter manually on each flow condition entry) and later, if this can be read from the event map, I’ll change the predicate function. Thanks.
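A minimal sketch of that workaround, assuming Onyx's parameterised flow predicates, where :flow/predicate is a vector of the predicate keyword followed by keys whose values are looked up in the same flow condition map and passed as extra arguments. The namespace, the :my/to-task key, and the `allowed?` stand-in for a rule engine are all hypothetical.

```clojure
(ns example.flow)

;; Hypothetical stand-in for a rule engine: the segment lists the tasks
;; it is allowed to be routed to.
(defn allowed? [segment to-task]
  (contains? (set (:allowed segment)) to-task))

;; Flow predicate: event map, old segment, new segment, all new segments,
;; then any parameters named in :flow/predicate.
(defn can-i-go? [event old-segment segment all-new-segments to-task]
  (allowed? segment to-task))

;; One entry per edge, each passing its own destination as a parameter.
(def flow-conditions
  [{:flow/from :a
    :flow/to [:b]
    :my/to-task :b
    :flow/predicate [::can-i-go? :my/to-task]}
   {:flow/from :a
    :flow/to [:c]
    :my/to-task :c
    :flow/predicate [::can-i-go? :my/to-task]}])
```

The same predicate function serves both edges; only the :my/to-task value differs, which is the hint luiseugenio describes passing manually.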