#onyx
2017-10-18
brianh00:10:42

So, thought I'd let you know that I seem to have Minio working for checkpoint storage. The streaming job has been running for 2 hours now and everything memory-related is looking good. Gonna let it run overnight and see how things look in the AM.

Travis00:10:12

Awesome, glad to hear that works

brianh00:10:46

me too! thanks for the suggestion!

jholmberg01:10:24

That's really cool @brianh. How did you deploy Minio? As a Kubernetes Service?

lucasbradstreet02:10:54

@brianh great to hear! Be mindful of how much disk space you give it, and what your barrier period is, as it’s probably a ticking time bomb with respect to disk space, depending on how big your windows are. Sounds like it did the trick though :)

lucasbradstreet02:10:13

@jholmberg @camechis I’ll have a response for you shortly. My main initial recommendation is to scrape/monitor the JMX metrics. You can do this with onyx-peer-http-query or a JMX metrics agent.

lucasbradstreet02:10:59

I would have a look at checkpoint_store_latency_50thPercentile and checkpoint_store_latency_Max, checkpoint_written_bytes_Value, and checkpoint_size_Value

jholmberg02:10:37

Thanks @lucasbradstreet. We'll hook up onyx into our Kubernetes cluster and run some data through it to see how it does. We've got prometheus and grafana. Do you have any dashboards that work with that by chance? If not, we can just look at it in prometheus

lucasbradstreet02:10:27

@jholmberg I do but I’ll need to rip a couple things out. Shouldn’t be a problem though

jholmberg02:10:49

sweet, no trouble.

lucasbradstreet02:10:56

I’ll also push our Prometheus alerts somewhere

lucasbradstreet04:10:45

@jholmberg our dashboard is pretty project specific, includes a lot of tags that won’t apply to your use case

lucasbradstreet04:10:33

@jholmberg I’ve pushed up our prometheus alerts here https://github.com/onyx-platform/onyx-monitoring. They should be considered a starting point, but many of the parameters we’ve chosen relate to things like barrier period, the sort of fns that run, etc, and may need tuning.

jholmberg13:10:04

Awesome, this should be really helpful, thanks!

Travis13:10:26

Thanks @lucasbradstreet, this should give us a really good starting point

Travis15:10:55

Hey guys, any suggestions on how to submit/manage jobs in K8s?

michaeldrogalis16:10:32

Haven't tried it yet, but this looks great - https://github.com/bamarco/onyx-sim

gardnervickers17:10:24

@camechis I think a job manager that watched some job configmaps for changes, killing and re-submitting the onyx job would be really neat.

Travis17:10:35

yeah, that would be nice.

lucasbradstreet17:10:12

Yeah, we leave it up to users at the moment, as needs can differ, but it’d be good to have something as a starting point, especially with respect to migration.

Travis18:10:52

One question I had in relation to submitting a job. After the job is submitted, I see the process that submitted the job stays alive waiting to receive exceptions. Is it necessary to leave that process open, or can it just submit and be done? Or is there a good use case for not just exiting?

lucasbradstreet18:10:53

@camechis are you referring to feedback-exception!? If so, that’s mostly a test helper.

Travis18:10:14

Yeah that's the one

michaeldrogalis19:10:24

It can submit and immediately shut down after that call.

Travis19:10:56

That's what I thought