onyx 2016-02-17 | Slack Archive

@michaeldrogalis: great, I'm looking forward to the talk

micahalles: awesome, I'd like to watch it

Thanks

@robert-stuttaford, or anyone: Do you have any ClojureScript talent on your team? I have an interesting, small-ish scoped project idea that would be beneficial to Onyx users trying to visualize the throughput of their cluster.

lsnape17:02:01

@michaeldrogalis: I’ve written a fair few reagent webapps. Would be happy to contribute, but my domain knowledge of Onyx is limited

lsnape17:02:39

though interested to learn more, of course

robert-stuttaford17:02:02

we do. what’s your idea?

michaeldrogalis17:02:31

@lsnape @robert-stuttaford It should be pretty straightforward. I think the trickiest part is making the UI look appealing. So basically, Onyx has a strictly ordered log of all coordination messages in the cluster. By subscribing to the log (the API lets you do this through a core.async channel), you can learn things about the cluster at any particular point in time. For example, you can learn that at time T, Job :j1 has Peers X, Y, and Z allocated to its tasks. Being able to visualize that would be huuugely useful for people who are trying to tune their clusters for performance.

robert-stuttaford17:02:25

-drool-

michaeldrogalis17:02:35

There are two things that we can do that almost no other project can, by nature of it having a log. 1 - we can play the log forwards and backwards and show how peers move between tasks as you add or remove jobs. That'd be neat so you can intuit what the scheduler is doing through visualization. 2 - here's the real kicker.. We can call into a metrics server and display the throughput for each peer at any time T.

michaeldrogalis17:02:34

You can think of having a forwards/backwards timeline somewhere in a web UI. As you go forwards in time, you get to see the concrete implications on performance as your cluster topology changes. I think this would be amazing to have.
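
A rough sketch of the forwards/backwards replay idea, in Python for brevity. Everything in it is an assumption made for illustration: the entry shape (`op`, `peer`, `job`, `task`) and the helper names are hypothetical, and the real Onyx log format and scheduler are more involved. The point is only that the allocation at time T is a pure function of the first T log entries, so stepping the timeline in either direction is just a replay to a different index.

```python
from collections import defaultdict

# Hypothetical, simplified log entries; NOT the real Onyx log format.
LOG = [
    {"op": "allocate", "peer": "X", "job": "j1", "task": "in"},
    {"op": "allocate", "peer": "Y", "job": "j1", "task": "inc"},
    {"op": "allocate", "peer": "Z", "job": "j1", "task": "out"},
    {"op": "allocate", "peer": "Y", "job": "j2", "task": "in"},  # Y moves to j2
    {"op": "release", "peer": "Z"},                               # Z leaves
]


def apply_entry(state, entry):
    """Return a new peer -> (job, task) mapping with one entry applied."""
    new_state = dict(state)  # copy, so earlier snapshots stay valid
    if entry["op"] == "allocate":
        new_state[entry["peer"]] = (entry["job"], entry["task"])
    elif entry["op"] == "release":
        new_state.pop(entry["peer"], None)
    return new_state


def state_at(log, t):
    """Replay the first t entries to get the allocation at time t."""
    state = {}
    for entry in log[:t]:
        state = apply_entry(state, entry)
    return state


def peers_by_job(state):
    """Group peers by the job they are allocated to, e.g. for drawing."""
    grouped = defaultdict(list)
    for peer, (job, _task) in sorted(state.items()):
        grouped[job].append(peer)
    return dict(grouped)


# Step the timeline forwards, then one step back.
print(peers_by_job(state_at(LOG, 3)))
print(peers_by_job(state_at(LOG, 4)))
print(peers_by_job(state_at(LOG, 3)))
```

A UI slider would only need to call `state_at` with the slider position. Per-peer throughput from a metrics server could then be looked up for the same T and drawn next to each peer.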

robert-stuttaford17:02:58

i think you should draw some diagrams of how this data might be represented

robert-stuttaford17:02:21

if you can produce that, and cogent apis for a client to call, i’m sure the bit in the middle can be made quite easily!

michaeldrogalis17:02:25

I can do that. The inspiration that we had was from BtrPlace: http://www.btrplace.org/play/ On "Use Case", select Vertical Elasticity, then "Solve", then go to the bottom and click the Play Button.

michaeldrogalis17:02:56

You can see animations of how the VMs move between servers. If we could do that, plus display metrics per machine.. Holy moly

robert-stuttaford17:02:01

i see it

robert-stuttaford17:02:06

holy moly indeed

robert-stuttaford17:02:35

in our case it’d be tasks between peers, right

robert-stuttaford17:02:42

and peers on processes

michaeldrogalis17:02:50

There's so much potential in Onyx because of how it's designed. The bottleneck is creativity and developer time 😛

michaeldrogalis17:02:53

Yep

robert-stuttaford17:02:59

processes (~= servers) > peers > tasks

robert-stuttaford17:02:39

would such a UI be an extension of onyx-dashboard?

michaeldrogalis17:02:14

I think a standalone tool might be nice. But we could merge the two later

michaeldrogalis17:02:30

I also don't want to bog anyone down with more code than they need to handle

robert-stuttaford17:02:08

would the backend process have to be host to peers as well?

robert-stuttaford17:02:20

or can it read all it needs from the log independent of peers

michaeldrogalis17:02:42

The latter, yes. All it needs to do is talk to ZooKeeper. Very easy

michaeldrogalis17:02:32

Getting throughput from a metrics server would just be the beginning. You could imagine that you could talk to Yeller's API and inline exceptions, too. @tcrayford

michaeldrogalis17:02:19

Oh man, we could even overlay a heat map sort of deal and show where the bottleneck in the job is in terms of latency.

robert-stuttaford17:02:28

shut up shut up!

robert-stuttaford17:02:44

i wants it

michaeldrogalis17:02:01

I will spec it out, and we will build it ^_^ Still interested, @lsnape?

lsnape17:02:21

Absolutely. I’ve got quite a bit of free time atm so happy to get involved :thumbsup:

michaeldrogalis17:02:22

Excellent. I'll put together some materials tonight. We can collaborate right in here.

robert-stuttaford17:02:05

free time -drool-

robert-stuttaford17:02:12

i wants it

lsnape17:02:02

Any preference on cljs setup? We’ve been using Reagent and Petrol internally and that works really well. Would like to have a play with Om Next too

michaeldrogalis17:02:55

I don't have a preference - whatever you all would like to use.

lsnape17:02:39

@robert-stuttaford: yeah part drool but MixRadio shut down on Monday so I still also need to find a job

robert-stuttaford17:02:54

terribly sorry to hear that!

lsnape17:02:35

it’s a real bummer. Incredible team, working environment. Investment deal collapsed at the last minute

robert-stuttaford17:02:05

i really enjoyed the AWS tooling talk one of your colleagues (or maybe you) gave recently

lsnape17:02:22

but on the plus side, lots of time to spend on open source 😄

michaeldrogalis17:02:37

@lsnape: Ugh, I'm sorry to hear that too 😞

lsnape17:02:19

Neil probably. Deployment tooling is ace

lsnape17:02:38

Anyway, this sounds exciting. Let me know when you’ve got it spec’d out a bit more and i’ll give you a hand

michaeldrogalis17:02:59

Wonderful, will do. For minimum pain, I should be able to make a Docker container for ZooKeeper with a preloaded Onyx log. We could do the same for a metrics server.

gardnervickers17:02:31

This sounds amazing

robert-stuttaford18:02:19

is it ready yet? can i spin it up? :)))

michaeldrogalis18:02:32

We can probably have something working in a week.

michaeldrogalis18:02:20

Does DataDog let you retrieve graphs as SVG or images through its API?

michaeldrogalis18:02:50

Having day dreams about being able to display arbitrary performance graphs on a machine-by-machine basis with this tool. Someone snap me out of it. 😛

bridget18:02:17

I'm not going to stop you

michaeldrogalis18:02:48

*Snaps fingers* CLOJURE STACK TRACES! alter-var-root!! Dynamic class loaders!

robert-stuttaford18:02:32

onyx jobs not starting with no exceptions or log messages telling you why! 😁

michaeldrogalis18:02:55

Hahah. Add mo' peers 😉

michaeldrogalis18:02:10

But yes, I can see how that'd be frustrating as a new user. Hence, this tool.

robert-stuttaford18:02:11

so looking forward to more The Learn with Lucas tomorrow

michaeldrogalis18:02:28

He'll set you up. :thumbsup:

michaeldrogalis18:02:35

We have a patch out to attach arbitrary metadata to jobs that get logged with normal messages now, btw. It's pretty sweet - you can name your jobs and what-not.

robert-stuttaford19:02:28

oh, that’s handy

michaeldrogalis20:02:51

... Oh boy, backpressure visualization. That's another awesome thing that we can do, since backpressure mode on/off is recorded in the log. @lucasbradstreet
