#onyx
2016-03-01
lsnape 20:03:56

I’m doing a knowledge-share session tomorrow with some of the engineering team. I think I can comfortably give an intro to Onyx: the concepts and a couple of demos. I’ve got a couple of questions that it would be good to get answers to:

michaeldrogalis 20:03:00

@lsnape: Sure, fire away. Will have to answer asynchronously.

lsnape 21:03:13

For one of my projects I used a fixed window and a trigger to write entries to DynamoDB. I found that this was the true output of my job, so to speak, and that the core.async plugin output was really just for logging. I guess the question is: in a production job, how do windows and triggers fit in with the input -> functions -> output workflow?
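For reference, a setup along these lines might look roughly like the sketch below. This is only an illustration against the 0.9-era windowing API; the task names and the ::write-to-dynamo! sync function are placeholders, and exact trigger key names vary between releases.

```clojure
;; Sketch only: a fixed window attached to a task, plus a timer trigger whose
;; sync fn performs the "true output" (writing to DynamoDB) as a side effect.
;; Task names and ::write-to-dynamo! are hypothetical.
(def windows
  [{:window/id          :collect-entries
    :window/task        :process-entries
    :window/type        :fixed
    :window/aggregation :onyx.windowing.aggregation/conj
    :window/window-key  :event-time
    :window/range       [5 :minutes]}])

(def triggers
  [{:trigger/window-id  :collect-entries
    :trigger/refinement :onyx.refinements/accumulating
    :trigger/on         :onyx.triggers/timer
    :trigger/period     [5 :minutes]
    ;; Fires periodically and hands the window contents to the sync fn, which
    ;; writes them to DynamoDB; the core.async output task can then be kept
    ;; purely as a logging/debugging sink.
    :trigger/sync       ::write-to-dynamo!}])
```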

michaeldrogalis 21:03:04

You mean like having an aggregate value going into another workflow altogether, right? Or even a downstream task?

lsnape 21:03:31

yes precisely

lsnape 21:03:20

would that be another job, or is there a way to feed the aggregate values back into the workflow of the same job?

michaeldrogalis 21:03:24

We have an open ticket for chaining aggregate values together: https://github.com/onyx-platform/onyx/issues/323 It will be worked on in the next few weeks; it's high priority. For now, we recommend using triggers to write to external storage (say, Kafka) and having another job read from that storage. You can also use one job with two disjoint DAGs in the workflow if you'd like.
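A workflow with two disjoint DAGs is just a vector of [from to] edges that happens to form two unconnected graphs. A rough sketch, with all task names made up for illustration:

```clojure
;; Hypothetical single job whose workflow contains two disjoint DAGs: the
;; first branch windows/aggregates and syncs results to Kafka via a trigger,
;; the second reads those aggregates back and continues processing.
(def workflow
  [;; DAG 1: raw input -> aggregation task (trigger syncs to Kafka)
   [:raw-in :aggregate]
   [:aggregate :agg-out]
   ;; DAG 2: consume the synced aggregates downstream
   [:agg-kafka-in :enrich]
   [:enrich :final-out]])
```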

lsnape 21:03:01

Ah, that’s cool 🙂

lsnape 21:03:39

What about the situation where the workflow output task isn’t really required: my job only needs to compute aggregates. Is this something that’s cropped up before?

lsnape 21:03:11

(the job I made is a toy example really and I don’t know if this is a problem in practice)

michaeldrogalis 21:03:03

It's in the docs somewhere, but that's a nice, concise example.

lsnape 21:03:51

gotcha, thanks

lsnape 21:03:37

The only other question I had was about deployment. I haven’t used Mesos or Marathon, so please forgive my ignorance. If I want to deploy a new version of Onyx to the cluster, how do I ensure that the current jobs have all finished before killing off the instances?

lsnape 21:03:51

And I guess some understanding of how new nodes are introduced into the cluster without disturbing existing ones, if that makes sense.

gardnervickers 22:03:36

@lsnape: It is possible to deploy your upgraded jars with a new tenancy-id.

gardnervickers 22:03:12

That way there’s no interaction between the old and new versions: you can make sure everything is up and running before killing your old jobs on the previous tenancy-id and switching them over to the new one.

gardnervickers 22:03:56

We have been using git SHAs for tenancy-ids lately, and it’s worked out quite well.
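In practice that can be as simple as threading the build's git SHA into the peer (and job-submission) configuration as :onyx/tenancy-id. A rough sketch; the GIT_SHA env var is a made-up convention and the other values are placeholders:

```clojure
;; Sketch: isolate an upgraded deployment under its own tenancy-id.
;; Peers and submitted jobs only coordinate with others that share the same
;; :onyx/tenancy-id, so keying it off the git SHA keeps the old and new
;; versions from interacting. GIT_SHA is a hypothetical env var set at
;; build/deploy time.
(def peer-config
  {:zookeeper/address         "zk.example.com:2181"
   :onyx/tenancy-id           (or (System/getenv "GIT_SHA") "dev")
   :onyx.peer/job-scheduler   :onyx.job-scheduler/greedy
   :onyx.messaging/impl       :aeron
   :onyx.messaging/peer-port  40200
   :onyx.messaging/bind-addr  "localhost"})
```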

gardnervickers 22:03:59

It is possible to use the Onyx dashboard to kill jobs at the moment, but right now we’re working on better ways to manage what jobs are running across the entire Onyx deployment (multiple tenancy-ids).

lsnape 22:03:41

@gardnervickers: that makes sense, thanks

gardnervickers 22:03:08

@lsnape: Are you planning on using Mesos/Marathon for deployment?

lsnape 22:03:22

I guess terminating streaming jobs would require some coordination

gardnervickers 22:03:49

@lsnape: We mostly handle that by checkpointing what’s already been read. It works today; the API is just cumbersome.

lsnape 22:03:12

No, I’m just preparing a session on Onyx for some colleagues of mine. I just wanted to know a bit more about the deployment process, but what you’ve said is sufficient.

gardnervickers 22:03:09

I’m working on this today, actually, getting our benchmarks running on Kubernetes, so feel free to ask any questions you might have. We will end up hosting the benchmark containers on the Docker Registry too.

lsnape 22:03:04

Cool. Good luck! It’s quite late here in the UK, so I’m off to catch some zzz 😴

gardnervickers 22:03:13

Catch ya later!