#onyx
2015-10-06
michaeldrogalis15:10:35

@chrisn: Hey, just out of curiosity - any progress on the big deployment?

chrisn16:10:58

Yes, we never went with more than 11 machines (which was still 11*40 cores). We ran into a bunch of docker issues with big swarms; machines wouldn't join, docker pull was inconsistent (possibly a docker repo v1 issue) etc.

chrisn16:10:55

My ambitions are to make this a lot more turnkey so the command is more like 'run this dataset on 40 machines and put the results into files of N records per file here'.

chrisn16:10:07

Of course this is not an easy path.

chrisn16:10:21

Especially with commands being inconsistent.

michaeldrogalis16:10:48

@chrisn: 11 is still really good. I haven't had the easiest time managing big clusters either.

chrisn16:10:56

One other issue we have been having is keeping track of queue position and general progress (something I know the dashboard would help with), but I would like to have REPL access to ask questions like 'how many records has the input task processed'. I know this information is in the log, and if we used Kafka it would do some of this for us, but the issue with Kafka is that you need something to submit the work to it, which isn't ideal when you know your entire workload (in a seq of some sort) ahead of time.

chrisn16:10:10

The seq plugin really is ideal for this type of work.

michaeldrogalis16:10:40

@chrisn: Should be able to use a simple lifecycle hook to count processed records.

chrisn16:10:42

Then I could get estimated time of completion and some idea of the speedup each machine gives us.

chrisn16:10:50

Good point, yes I can see that.
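
[Editor's note] The arithmetic behind the estimate discussed above can be sketched as follows, assuming a processed-record count is already available (for example from a counting hook) and that throughput stays roughly constant. This is only an illustration in Python, not Onyx code; the function names and parameters are hypothetical.

```python
def estimate_remaining_seconds(processed, total, elapsed_seconds):
    """Linear estimate of time left, assuming constant throughput.

    All parameter names are hypothetical: `processed` is the number of
    records handled so far, `total` is the known size of the workload,
    and `elapsed_seconds` is the wall-clock time since the job started.
    Returns None when there is not enough data to estimate a rate.
    """
    if processed <= 0 or elapsed_seconds <= 0:
        return None
    rate = processed / elapsed_seconds      # records per second so far
    remaining = max(total - processed, 0)   # never report negative work
    return remaining / rate


def speedup(baseline_rate, cluster_rate):
    """Ratio of cluster throughput to a single-machine baseline."""
    return cluster_rate / baseline_rate


if __name__ == "__main__":
    eta = estimate_remaining_seconds(processed=250, total=1000, elapsed_seconds=50.0)
    print(f"estimated seconds remaining: {eta}")
    print(f"speedup over baseline: {speedup(5.0, 40.0)}")
```

Comparing the measured rate at different cluster sizes against a one-machine baseline gives a rough idea of the speedup each added machine contributes.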

michaeldrogalis16:10:06

I read in an interesting paper earlier that Google essentially automated progress metrics with DataFlow internally.

michaeldrogalis16:10:23

They have callbacks that estimate how long things have been running, and how much is left to go. Kinda cool

michaeldrogalis16:10:43

So is this out there running in prod, or internally?

chrisn16:10:02

The results could be considered in prod but the entire project isn't in production yet.

chrisn16:10:28

This is image processing so we are using the results on a website but it isn't a live cluster. This is a production setting, however, not a research project.

chrisn16:10:11

We would be honored :)

noisesmith21:10:42

Little Bird is very close to having a GA release using Onyx as well

noisesmith21:10:49

not live yet though, it's in beta

michaeldrogalis22:10:02

@noisesmith: So cool! Thanks for the updates. We really like hearing about where this is running.

michaeldrogalis22:10:10

Also, refresh my memory: GA stands for?

noisesmith22:10:54

sorry, startup speak

noisesmith22:10:57

general availability

noisesmith22:10:09

as in, any customer can sign up and use it, rather than select preview

michaeldrogalis22:10:30

Oh excellent. Let me know if/when you want your logo on the README.

michaeldrogalis22:10:07

Must run - back later!