#onyx
2016-02-18
michaeldrogalis03:02:38

@lsnape @robert-stuttaford & anyone else that wants to help out with this project: voila! https://github.com/MichaelDrogalis/onyx-log-subscriber-demo#onyx-log-subscriber-demo This is a self-contained project that shows how you can subscribe to the Onyx log for incremental changes of the cluster state over time. It includes a Docker container preloaded with some activity on an Onyx cluster I spun up earlier - meaning you won't need to run Onyx at all to try this. Once the container is up, just change the IP and port to match your Docker container, and away it goes.
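The core of the demo is the log subscription loop. A minimal sketch, assuming the public `onyx.api`/`onyx.extensions` entry points of that era; `peer-config` is a placeholder for your own configuration (ZooKeeper address, `:onyx/id`, etc.):

```clojure
(require '[clojure.core.async :as a]
         '[onyx.api]
         '[onyx.extensions :as extensions])

(let [ch (a/chan 100)
      ;; subscribe-to-log returns the origin replica; subsequent
      ;; log entries arrive on the channel.
      replica (onyx.api/subscribe-to-log peer-config ch)]
  (loop [replica replica]
    (let [entry (a/<!! ch)
          ;; Play each entry against the replica to step the
          ;; cluster state forward incrementally.
          new-replica (extensions/apply-log-entry entry replica)]
      (println (:allocations new-replica))
      (recur new-replica))))
```

Each iteration yields a complete snapshot of the cluster state at that point in the log, which is what a step-through visualization would render.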

michaeldrogalis03:02:45

You can poke around at the replica more to get an idea of what sort of information you have access to. Other interesting things in there are the IP addresses and ports of every peer in the cluster. Most relevant for what we're doing right now is :allocations, though. That key maps :job-id -> :task-id -> [:peer-ids]
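As a rough sketch of that nesting (plain keywords stand in for the UUIDs you'd actually see in a real replica; the task names are hypothetical):

```clojure
;; Hypothetical slice of a replica value, showing the shape of
;; :allocations: job-id -> task-id -> vector of peer-ids.
{:allocations
 {:job-1 {:read-segments    [:peer-a :peer-b]
          :process-segments [:peer-c]
          :write-segments   [:peer-d]}}}
```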

michaeldrogalis03:02:26

If we can get a visualization that lets you step through the log over time, showing the movement of peers to tasks over time, I think that's already a big win.

michaeldrogalis03:02:14

I need to make a few changes to Onyx itself to support undo, that is, moving backwards through the log over time. It's actually easy to implement; I just need some spare time to do it. Still, being able to incrementally move forward through time and see what's going on is awesome. 😄

michaeldrogalis03:02:33

Signing off for tonight; I'll be around tomorrow in the AM. Exciting!!

robert-stuttaford05:02:25

@michaeldrogalis: i think you should capture what you’re hoping to see in the project readme, so that we can tweet the project link 🙂

nha08:02:24

Just reading that now, that seems awesome!

nha08:02:56

@lsnape: I'm thinking about quitting to find time to do some Clojure / Open-Source / personal projects / maybe freelance

lsnape12:02:39

@michaeldrogalis: Nice! I’m going to be busy today but hopefully tomorrow I can spend a few hours on this. I will need to take some time to reinforce my understanding of the information contained in the logs.

lucasbradstreet12:02:18

@lsnape: please feel free to hit me up if you have any questions and @michaeldrogalis isn’t around

lsnape12:02:06

@lucasbradstreet: will do 👍

lsnape12:02:31

@nha: the idea of working on open source full-time is very attractive if you’re able to support yourself financially, or are confident that you can easily get work if needed. I guess that entirely depends on your circumstances 🙂

nha12:02:14

@lsnape: agreed. I think I can support myself financially (through freelance) but that remains to be seen

nha12:02:43

worst case, I'll have the work as a reference for a future employer

lsnape12:02:18

indeed, and that shouldn’t be underestimated

nha13:02:14

also, the fun part is using/learning Clojure in the process 😛

lucasbradstreet13:02:52

I've been working on Onyx full time for almost a year with very little income to try to bootstrap it

lucasbradstreet13:02:21

The picture is looking rosier now thankfully

lucasbradstreet13:02:55

The technical experience that I'll be able to show to an employer, as a backup plan, was definitely part of my thinking

robert-stuttaford13:02:01

will submitting a job via onyx.api/submit-job succeed even if there are no active peers/peer-groups?

robert-stuttaford13:02:26

that is, does it matter if i warm peers up first and then submit, or can i do it in any order?

robert-stuttaford13:02:03

1. take down peers. 2. warm up peers. a. kill jobs. b. start jobs. could 1,2 happen after a,b?

lucasbradstreet13:02:54

You can do it in any order

lucasbradstreet13:02:49

Though you probably don't want to have both onyx-ids running at the same time

robert-stuttaford13:02:52

if i do 1,2,a,b, does a need to wait for 1 to finish?

robert-stuttaford13:02:30

that is, does the job-stop/start entry point need to wait for the peer-stop/start process to finish - would killing jobs while killing peers muddy the waters

lucasbradstreet13:02:12

Right, so it's preferable you kill the job first for clean shutdown, then take down the peers, then bring up the peers on the new onyx-id
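That preferred order can be sketched with the public `onyx.api` functions; names like `old-peer-config`, `old-job-id`, and `n-peers` are placeholders for your own values, and this assumes you hold references to the old peers and peer group:

```clojure
(require '[onyx.api])

;; 1. Kill the running job on the old tenancy for a clean shutdown.
(onyx.api/kill-job old-peer-config old-job-id)

;; 2. Take down the old peers and their peer group.
(onyx.api/shutdown-peers old-peers)
(onyx.api/shutdown-peer-group old-peer-group)

;; 3. Bring peers up under the new onyx/id and submit the job there.
;; The new peers only see jobs scoped to the new id.
(def peer-group (onyx.api/start-peer-group new-peer-config))
(def peers (onyx.api/start-peers n-peers peer-group))
(onyx.api/submit-job new-peer-config job)
```

Because submission is scoped to an onyx/id, step 3's `submit-job` could also run earlier, as discussed below, without the old peers ever picking it up.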

lucasbradstreet13:02:37

The submit-job to a new onyx-id can happen at any time

robert-stuttaford13:02:11

cool. i got it, thanks

lucasbradstreet13:02:18

I'd probably do the submit job first

lucasbradstreet13:02:29

Just to get it out of the way

lucasbradstreet13:02:07

But if you do that you need to make sure the peers are really down, or the job is killed before the new peers come up

robert-stuttaford13:02:27

cool. going to try to make this work. thanks!

lucasbradstreet13:02:00

You could even do b, a, 1, 2

lucasbradstreet13:02:35

That would be my preference

robert-stuttaford13:02:36

interesting. wouldn’t that attempt to use any idle peers?

robert-stuttaford13:02:53

given that we’re likely to have surplus

lucasbradstreet13:02:08

Sorry, we're talking about running on a new onyx-id, right?

lucasbradstreet13:02:34

The submit job will be scoped to the new onyx-id, so there won't be any peers to run on the job until 2

robert-stuttaford13:02:44

right, so they’d be compartmentalised, got it

lucasbradstreet13:02:04

Yeah, we're about to rename onyx/id as onyx/tenancy-id

lucasbradstreet13:02:42

One pattern that could be used is submit, stop peers, start peers, check metrics. This would allow you to roll back to the previous jar / onyx-id and have the jobs keep running in case of issues.

lucasbradstreet13:02:53

I'd have to think about it and try it out to make sure it's a good idea

lucasbradstreet13:02:09

It might be better to just use the same procedure to rollback

robert-stuttaford13:02:13

interesting. we’re not there, yet. but that sounds like a good place to get to

robert-stuttaford13:02:30

we’re using aws codedeploy which has rollback capability

robert-stuttaford13:02:49

if the validate step fails, it rolls back. validate is a .sh on-server, so it could do this

lucasbradstreet13:02:55

The kill job isn't strictly necessary since there won't be any peers running, but it's still mostly a good idea

robert-stuttaford13:02:05

yeah. cleaning up is good

michaeldrogalis15:02:11

@robert-stuttaford: Yep, I'll write something up in the README later today. I'd prefer not to tweet about it though. I made that mistake with the CheatSheet project. Lots of people said they wanted to help, but no one ended up taking responsibility. We can try to involve a wider audience once there is progress and someone is clearly in charge. 🙂

robert-stuttaford15:02:30

that’s a wise plan

lucasbradstreet15:02:36

Agreed. We definitely need help to get it done, since we're pretty busy with core work

robert-stuttaford15:02:00

dude, i’ve got too many of those already

robert-stuttaford15:02:13

running a 13 person team is hard work!

robert-stuttaford15:02:26

no matter how awesome the language and tools

michaeldrogalis15:02:39

People are hard simple_smile