onyx 2016-06-09 | Slack Archive

jeroenvandijk07:06:21

@drewverlee: Maybe compatible was too strong, but Hadoop (and others?) use S3 as if it is hdfs (see https://wiki.apache.org/hadoop/AmazonS3)

jeroenvandijk07:06:27

if you choose the right scheme, you can read the files in the same way as normal files on S3 (always forget which one)

jeroenvandijk07:06:34

@lucasbradstreet: Now I have thought about it longer. S3 cannot fully replace hdfs for jobs with large files as you probably still need a local hdfs for temporary storage (S3 adds some latency). Or you can use the normal local files system, but that doesn’t give the same fault tolerance I guess

lucasbradstreet07:06:15

That makes sense. Using the local filesystem will be troublesome for multi-node use too

aspra09:06:11

What is advisable for the :kafka/fetch-size option of the onyx-kafka plugin? If our average messages are of size X is it advisable that we set it as close to that size as possible or a really large value will also do?

lucasbradstreet09:06:53

I haven't done any experiments to test that, but my initial guess would be to make it big enough to fetch an entire batch of messages in one request i.e. onyx/batch-size * average size of message + some headroom

lucasbradstreet09:06:30

That's a good starting point but you may want to tune it at some point if Kafka seems to be the bottleneck

aspra09:06:52

cool, thx!

asolovyov10:06:40

it it possible for a task to return a vector of segments so that next task will be called with each segment separately?

asolovyov10:06:49

or something like that 🙂

lucasbradstreet10:06:35

If you return a vector of segments then it’s an implicit flatmap, if that’s what you’re asking?

asolovyov10:06:09

hah

asolovyov10:06:10

I guess

asolovyov10:06:24

or... what's flatmap? :))

lucasbradstreet10:06:21

mapcat 🙂

lucasbradstreet10:06:52

But anyway, to answer your question, it’ll auto-unroll a vector of segments so that the next task will apply onyx/fn to the individual segments

lucasbradstreet10:06:10

i.e. it should do what you ask. I don’t love the discoverability of that feature because it’s a bit magic

asolovyov10:06:40

well.. that's exactly what I expected :-))

lucasbradstreet10:06:14

:thumbsup:

manderson14:06:56

Will the webinar be recorded and available after the event? I have some co-workers wanting to watch it, but they can't make the noon time.

lucasbradstreet14:06:28

Indeed it will, based on my understanding of YouTube live 🙂

manderson14:06:06

cool! thanks 👍

michaeldrogalis15:06:21

Starting in ~55 minutes. Link: https://www.youtube.com/watch?v=5eEKZa2DSJI

nha15:06:58

:drool:

dajjedzenie16:06:19

what happened 😞

michaeldrogalis16:06:29

Hey folks. Just need a minute. Hangouts on Air is.. Unintuitive 🙂

acron16:06:38

michaeldrogalis16:06:58

Apparently you can only hit the broadcast button once. 😕 Spinning up a new link now.

michaeldrogalis16:06:25

http://youtu.be/5hFV_W9-10s

michaeldrogalis16:06:43

That was amazingly hard.

manderson16:06:29

So, start time 12:30 now? I'm assuming still 1 hour? Just trying to plan 🙂

michaeldrogalis16:06:53

Starting asap

manderson16:06:59

👍

acron16:06:50

And we're off!

michaeldrogalis16:06:51

Everything look okay now?

acron16:06:54

Yep

manderson16:06:57

great!

nha16:06:25

Yes 😄

michaeldrogalis16:06:45

Whew, cool.

michaeldrogalis16:06:57

Is the font size okay now?

acron16:06:47

:thumbsup:

aengelberg16:06:57

Are these "incrementally build up tasks" tools only in the new onyx template, or new features to onyx in general?

otfrom16:06:35

I was having trouble before as the quality was on auto and this made things unreadable. Selecting 720p has fixed that

lucasbradstreet16:06:59

@aengelberg: it’s more of a design pattern. We have a couple of helpful functions like “add-task” in onyx core

lucasbradstreet16:06:09

@otfrom: I had that problem too

aengelberg16:06:33

gotcha. just curious how this relates to the onyx template if at all.

lucasbradstreet16:06:19

onyx-template’s example job is built around this pattern, so it kinda made sense to start there

aengelberg16:06:46

cool

richiardiandrea16:06:12

oh man, I am late 😄

michaeldrogalis16:06:37

@richiardiandrea: Only missed a few minutes. We floundered for about 10 minutes trying to get the stream started.

richiardiandrea16:06:02

great I am in

michaeldrogalis16:06:20

https://github.com/onyx-platform/onyx-twitter-sample

andrewhr16:06:51

exposing personal keys are a kind of rite of initiation for Onyx presentations, isn't @michaeldrogalis? 😉

lucasbradstreet16:06:10

Ahaha. Gardner made it read only first 😄

michaeldrogalis16:06:26

@andrewhr: Hahaha.

michaeldrogalis16:06:38

Man I forgot about that.

michaeldrogalis16:06:54

Any questions so far that I can forward to Gardner?

michaeldrogalis16:06:02

We'll take questions again at the end, too.

acron16:06:05

Q: You spoke about tying this into Kubernetes. Is that still something on your radar?

manderson16:06:33

Q: add on to ^^^ mesos/marathon on radar?

gardnervickers16:06:54

@acron: Yes!

acron16:06:11

@manderson: can help with mesos/marathon if you ever need, that's how we're deploying Onyx currently

gardnervickers16:06:11

The way the job submission is structured is to take advantage of kubernete’s “Job” api

gardnervickers16:06:38

They allow you to run on-off containers that are guaranteed to run on the cluster at some point.

gardnervickers16:06:22

And after job submission, we block on job completion so you can use kubectl get jobs to see what you have currently running

michaeldrogalis16:06:24

We'll probably do a few more of these now that we got the hang of running a live stream. We're really looking to build up community knowledge sharing here since our time is being continually limited by commercial support. We definitely don't want to be the bottleneck of getting around problems or giving design advice.

manderson16:06:29

@acron: cool! would love to chat about that at some point as that's what we're using as well

acron16:06:56

@gardnervickers: that sounds great

gardnervickers16:06:23

Mesos is not as familiar to me. I’m hoping as progress on localkube progresses folks start adopting that over docker-compose.

richiardiandrea16:06:36

it's great to have these kind of things for people to understand the moving parts

lucasbradstreet16:06:41

We’ve kept the peer / job / deployment structure pretty agnostic to the tools, so there’s no reason why mesos wouldn’t work, though it might not be a priority for us. We’d love to have tutorials for both though.

gardnervickers16:06:22

The biggest hurdle is one I have not seen solved from either platform, is how to get zookeeper ensembles running in a fault tolerant manner

michaeldrogalis16:06:42

To reiterate, feel free to send a pull request adding arbitrary features to https://github.com/onyx-platform/onyx-twitter-sample. A big-ish project that's community built would be great.

gardnervickers16:06:57

I believe Kubernetes has a solution for this in their next release, PetSets, but I’m not sure when that’s scheduled

gardnervickers16:06:33

One thing I’d love to see is creating a task bundle for writing Spectre queries

gardnervickers16:06:18

But there’s a lot of potential around debugging too. It would be great to have a “task-bundle-modifier” that would start a web server and show you what’s happening to segments running through your task, or have nice visualizations for whats inside your windows.

manderson16:06:58

i've got to run to another meeting. thanks @gardnervickers for the presentation! Good stuff!

andrewhr17:06:43

Q: the migration-at-task-definition is safe because you assume the job responsible to migrate is the only one writing to that table, right?

gardnervickers17:06:55

@andrewhr: Multiple migrations are idempotent with Joplin

gardnervickers17:06:44

Although I could be wrong on what goes on with creating tables in SQL, I would assume the table is locked before it’s fully initialized.

gardnervickers17:06:22

https://github.com/onyx-platform/lib-onyx/blob/master/src/lib_onyx/joplin.clj#L64

gardnervickers17:06:38

sorry wrong link

gardnervickers17:06:39

https://github.com/onyx-platform/lib-onyx/blob/master/src/lib_onyx/joplin.clj#L30-L31

gardnervickers17:06:08

If joplin crashes or reports that the DB is not migrated, the task will not start.

gardnervickers17:06:19

it will actually retry

andrewhr17:06:33

cool!

andrewhr17:06:59

Q: very noob question: how do cluster a upgrade works, let's say a new migration or task change? Shuts everything down?

gardnervickers17:06:09

@andrewhr: You don’t have to, just redeploy a new container image with a different ONYX_ID set

gardnervickers17:06:31

It’ll be totally seperate from your running cluster, then you can start transitioning jobs

gardnervickers17:06:30

Depending on whats changed, you could also do a rolling restart under the same ONYX_ID, causing the running tasks to update.

andrewhr17:06:13

@gardnervickers: right... given the pattern you've demonstrated, I imagine that I will end up with two "sets" of running containers (despite sharing the image): the cluster peers and the job submissions themselves. Following your strategy, I could spin a new set of peers with a different ONYX_ID, and then start to move those submissions to the new cluster until everything is updated. Something like that?

andrewhr17:06:09

Sorry if I'm getting too picky, just trying to get a mental image of the whole machinery 😅

michaeldrogalis17:06:07

Thanks for following along, ya'll. We'll have more to show soon!

andrewhr17:06:20

yeah, congratulations for the whole team. Looking forward for more! 👏

gardnervickers18:06:45

@andrewhr: Yea exactly!

2016-06-09

Channels