#onyx
2017-10-27
eriktjacobsen00:10:22

lucas / michael: Could either of you talk briefly about how updating code interacts with jobs, and when to use resume points? For instance, if I fix a bug in my business logic that doesn't change the Onyx job definition, is best practice just to kill all my peer Java processes, deploy a new jar, and turn the peers back on, keeping the same job running throughout? Is the only time I should use resume points when I actually need to change the structure of my Onyx job definition? Are there any timeouts or anything to be aware of for how long a job can stay in ZK without peers, or does it stay frozen indefinitely?

michaeldrogalis04:10:20

Will write up an answer in the AM - I'll get these all under the FAQ, they're good questions.

lmergen07:10:42

i'm having a bit of trouble getting all the onyx log messages

lmergen07:10:59

specifically, it appears that after a :submit-job, i'm not receiving anything else anymore

lmergen07:10:13

(at least, using subscribe-to-log)

lmergen07:10:25

i'm trying to figure out whether i'm at fault here

lmergen07:10:58

i know for a fact that a job is running, but my process that's monitoring the onyx log doesn't receive anything else anymore after submit-job

lmergen07:10:01

is that expected behavior ?

lucasbradstreet07:10:03

There won’t necessarily be any other messages unless something changes about the job

lucasbradstreet07:10:21

The submit job message is enough for all of the peers to know what to do and start the job

lmergen07:10:40

okay, so it's not possible to tell which jobs are actually running just by observing the log ?

lucasbradstreet07:10:00

You have to actually apply the log entries

lucasbradstreet07:10:10

Via the same log application method as the peers will use

lmergen07:10:03

aha, that sounds like the thing i was unaware of

lmergen07:10:33

what will applying the log do in this case ? trying to understand the flow here

lmergen07:10:44

will it then start communicating with the peers directly ?

lucasbradstreet07:10:53

just looking up a good example

lucasbradstreet07:10:54

old version since I got to the docs via google 😛
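The pattern being described might look like the following sketch, based on the log-subscription example in the Onyx docs. It assumes a reachable ZooKeeper and the same `peer-config` map your peers are started with:

```clojure
(require '[clojure.core.async :refer [chan <!!]]
         '[onyx.api]
         '[onyx.extensions :as extensions])

;; Subscribe to the log; the subscription carries the origin replica
;; to start playback from.
(def ch (chan 100))
(def subscription (onyx.api/subscribe-to-log peer-config ch))

;; Replay each log entry against the replica state machine. After each
;; step, new-replica is the materialized view of the cluster.
(loop [replica (:replica subscription)]
  (let [entry (<!! ch)
        new-replica (extensions/apply-log-entry entry replica)]
    ;; e.g. (:jobs new-replica) lists the jobs the cluster knows about
    (recur new-replica)))
```

This is a sketch, not a drop-in snippet: it loops forever and needs a running cluster to produce any entries.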

lmergen07:10:24

so, just for my understanding

lmergen07:10:04

applying the log will basically start communicating with the peers directly, to figure out the events that are not stored in the onyx log ?

lucasbradstreet07:10:05

basically, the current cluster state is a state machine which is built by applying the log entries in turn. This ensures that every node in your cluster has the same view of the cluster

lmergen07:10:06

is that correct ?

lucasbradstreet07:10:25

applying the log just applies the log command to the state machine

lucasbradstreet07:10:34

to materialize the view of the cluster

lmergen07:10:37

so applying the log makes progress in the state machine ?

lucasbradstreet07:10:59

if you don’t apply the log entries, you won’t know what the current cluster state is

lmergen07:10:01

that sentence doesn't make sense

lmergen07:10:32

i figured that this log application would occur behind the scenes

lucasbradstreet07:10:02

yeah, it does happen in the peer-group, but if you want to get a view of the cluster from outside this has to take place somewhere else.

lmergen07:10:13

yep, makes sense

lucasbradstreet07:10:18

since it’s easier for you to just do it yourself than reach into the peer group

lmergen07:10:34

thanks for your help!

lucasbradstreet07:10:31

no worries, good luck. Looking forward to seeing what you come up with. We have some similar code to do log -> job status in the DB. We tried to keep the flow one-way, otherwise it gets very tough to reason about.

lucasbradstreet07:10:45

essentially log playback -> DB for the job status

lmergen07:10:55

exactly what i'm working on right now

lucasbradstreet07:10:56

then if you want to change the job status, send a kill-job command to the cluster

lucasbradstreet07:10:10

and wait for that to go back through the playback flow to get the new status
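The one-way flow Lucas describes might be sketched like this, where `job-id` is a hypothetical id returned from an earlier `submit-job`:

```clojure
(require '[onyx.api])

;; Ask the cluster to kill the job. This only appends a kill-job entry
;; to the log; it does not mutate any local status directly.
(onyx.api/kill-job peer-config job-id)

;; The status change arrives later through log playback: once the
;; kill-job entry is applied, the job moves out of (:jobs replica)
;; and shows up in (:killed-jobs replica), and that is the point at
;; which you would update the status in your DB.
```

Keeping writes going only cluster-ward and status going only playback-ward is what keeps the flow one-way.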

lucasbradstreet07:10:34

Cool, hope it turns out well. Onyx needs something like this

lucasbradstreet07:10:52

hit me up if you have any more questions

lmergen07:10:02

sure thing

lmergen08:10:33

how does the replica figure out which jobs are allocated where ?

lmergen08:10:44

since the allocation is not explicitly part of the submit log entry, i assume that allocation is always deterministic and as such is correctly reproduced in all replicas. is this correct ?

lucasbradstreet08:10:33

Yes it’s completely deterministic.
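Because every replica deterministically converges to the same state, the allocations can be read straight off any materialized replica. A sketch, where `job-id` and `task-id` are hypothetical values:

```clojure
;; The replica's :allocations key maps job-id -> task-id -> peer ids.
(:allocations replica)
;; shape: {job-id {task-id [peer-id ...], ...}, ...}

;; Peers currently assigned to one task of one job:
(get-in replica [:allocations job-id task-id])
```

No extra coordination is needed to look this up; it falls out of replaying the same log entries every peer replays.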

lucasbradstreet08:10:08

The only exception /can/ be if you switch from one onyx version to another. When we migrate to new onyx versions we just figure out what the end submit/kill job results are and migrate the jobs that aren’t killed. That way we don’t need it to play back exactly the same

lmergen08:10:26

looks like it's that function that does all the allocation logic, right ?

lucasbradstreet08:10:18

Is this just for interest to understand how it’s all working?

lmergen08:10:33

well, more like i want to know what i'm doing 🙂

lucasbradstreet08:10:50

Ok cool :). Just making sure since I thought I might not understand your problem well enough

lmergen08:10:00

reproducing the replica state from so little information felt a bit like "magic"

lucasbradstreet08:10:07

Definitely good to understand what’s going on :)

lucasbradstreet08:10:12

Yeah it is a bit heh

lmergen08:10:41

well it makes sense, once you realise that it's all deterministic

lucasbradstreet08:10:42

It’s not so crazy once you understand it but it’s kinda novel for it to all work like that

lucasbradstreet08:10:26

We had to set a seed on the scheduler so it would make the same decisions everywhere

lucasbradstreet08:10:36

This is what we meant when we called Onyx masterless, although we do still lean on ZooKeeper for the rest, so it's not completely masterless

lucasbradstreet08:10:10

One nice thing with this is that you can find out the state of the cluster at any time, by looking at the replica state at that time

lucasbradstreet08:10:11

onyx-dashboard allows you to step through it, which can be useful when debugging

lmergen08:10:25

well, thanks for the info

lmergen08:10:29

good night

michaeldrogalis16:10:45

@eriktjacobsen I'll repaste your question down here:

lucas / michael: Could either of you talk briefly about how updating code interacts with jobs, and when to use resume points? For instance, if I fix a bug in my business logic that doesn't change the Onyx job definition, is best practice just to kill all my peer Java processes, deploy a new jar, and turn the peers back on, keeping the same job running throughout? Is the only time I should use resume points when I actually need to change the structure of my Onyx job definition? Are there any timeouts or anything to be aware of for how long a job can stay in ZK without peers, or does it stay frozen indefinitely?

michaeldrogalis16:10:23

The first thing to get a mental picture of is the division between behavior (your concrete Clojure code and functions that do real things) and wiring instructions (the Onyx job that stitches all the Clojure code together at runtime). When you stand up a peer, all you've done is plug that peer into the network to listen to the log for instructions, and possibly open a network connection to another peer when it's going to run a task for a job. The peer has the behavior bundled into its jar - it doesn't know anything about your specific Onyx jobs.

When you submit an Onyx job, you're putting those instructions about how to use your behavioral code into ZooKeeper, which goes into the log, which is eventually read by a peer. The submission of the job happens away from the cluster - these instructions are received at runtime. If you have a bug in your business logic that pertains to your code, you'll definitely need to reboot the peer with the new code. Whether you stop and restart your job is a completely different matter that depends on whether or not you need to replay all your data to correct the error. It also depends on whether it's okay for your domain to possibly have two peers on at the same time with the incorrect and the correct code.

Resume points are used for transferring state between jobs. So in the case that you do need to restart your job, but you want to keep your progress so far in regards to window contents, Kafka offsets, and so forth - you use resume points to literally resume previous progress.
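The resume-point case might be sketched as below, assuming Onyx 0.10+ resume points; `old-job-id` and `new-job` are hypothetical, and the exact arities here should be checked against your Onyx version's docs:

```clojure
(require '[onyx.api])

;; Look up the latest snapshot coordinates for the finished/killed job,
;; then build a resume point into the new job before submitting it, so
;; window contents, Kafka offsets, etc. carry over.
(let [coords (onyx.api/job-snapshot-coordinates peer-config old-job-id)
      job-with-resume (assoc new-job :resume-point
                             (onyx.api/build-resume-point new-job coords))]
  (onyx.api/submit-job peer-config job-with-resume))
```

If only the behavior jar changed and the job definition is identical, no resume point is needed: the running job keeps going once peers come back with the new code.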

michaeldrogalis16:10:22

There are no inherent limitations on how long a job can stay around before peers pick it up for work. It's simply a key in a map waiting for a peer to be scheduled on it. See the above conversation with @lmergen about how the log works.

michaeldrogalis16:10:53

I also gave a talk a few years ago on this called Inside Onyx that you can find on YouTube that explains the log. None of that has changed

lmergen16:10:23

it takes a bit of mental gymnastics to wrap your head around it

lmergen16:10:28

i think a large part of this design stems from the fact it’s masterless

michaeldrogalis16:10:20

It's a log architecture being used in a place where it's never been used before. Storm/Flink/Spark/Dataflow all use architectures that rely on a centralized coordinator. To the best of my knowledge Onyx is the first to do this.

Travis20:10:35

So the latest on the Media Driver saga on GKE. We did lots of debugging to determine if the peers could communicate over UDP, and that all seems to work fine. Our last attempt for the day was to set the media driver into dedicated thread mode. Currently the idle job has been running for 10 minutes, where before it wouldn't make it past 1 minute. Any thoughts on the media driver thread mode?

eriktjacobsen20:10:07

Thanks for the explanation michael, clears things up

michaeldrogalis20:10:40

If it's surviving, that's indicative of the peer being noisy with respect to resource usage. Dedicated thread mode for Aeron will do just that

lucasbradstreet21:10:42

@camechis if dedicated mode is making it better it makes me wonder if this job is complex/large and is opening a lot of channels. It would also partially explain why it’s working in single peer mode.

lucasbradstreet21:10:59

@camechis how many peers in this job?

Travis21:10:13

10 per peer

Travis21:10:24

only about 7 needed

lucasbradstreet21:10:35

Ok that isn’t the cause then

lucasbradstreet21:10:00

The media driver and the peers might just be fighting for resources then

Travis21:10:34

i have 5 physical nodes in my cluster with about 30 gigs available in mem and not a whole lot of CPU being used since everything is pretty idle

gardnervickers21:10:59

@lucasbradstreet, @camechis shared with me that they're individually limited through K8s resource constraints, more than our test env.

Travis21:10:04

4 cores each. I deployed 2 peers in this cluster

gardnervickers21:10:07

Yes, where we provide a single upper bound on both containers, they are providing individual bounds though.

lucasbradstreet21:10:32

That should be enough machine resources at least, though the limits might apply. I dunno.

gardnervickers21:10:06

Limits for CPU are not strict either, they're a scaling factor in relation to other containers running on the node.

lucasbradstreet21:10:08

I think you should get a flight recording of the peers and of Aeron

Travis21:10:24

any pointers on how that works in kubernetes?

gardnervickers21:10:35

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.port=1099
-Dcom.sun.management.jmxremote.rmi.port=1099
-Djava.rmi.server.hostname=127.0.0.1
-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder
Using those JVM opts, you can kubectl port-forward to 1099, then run Mission Control to get a flight recorder file.

lucasbradstreet21:10:46

I should say you should do this for testing/non-permanent prod purposes only, as Flight Recorder is commercial in JDK 8. It's opened up in OpenJDK/JDK 9, thankfully

Travis21:10:49

oh nice, doesn't sound too hard

lucasbradstreet21:10:01

It’s really great

eriktjacobsen21:10:14

I can't commit to anything for at least the next 2 months; however, I'm curious: are you guys actively working on spec integration? I see https://github.com/onyx-platform/onyx-spec but it looks focused on speccing Onyx itself rather than integrating it with jobs. It seems like an analogous data structure to the workflow that lists specs for messages passing between two components, something in the catalog to say what specs a task accepts and emits, and being able to put a spec directly into a flow condition instead of having to wrap it in an opaque predicate function would be really nice. It would also let you generate diagrams / documentation for your job that actually show the types of data flowing between each node. Thoughts?

michaeldrogalis21:10:30

@eriktjacobsen I had considered that before Spec came out. I think that would make a really nice supporting library - it adds an additional checker to your runtime