2017-10-18
Channels
- # aws (1)
- # aws-lambda (1)
- # beginners (48)
- # boot (15)
- # cider (3)
- # cljs-dev (4)
- # cljsrn (4)
- # clojure (241)
- # clojure-chicago (1)
- # clojure-dusseldorf (12)
- # clojure-greece (41)
- # clojure-italy (3)
- # clojure-russia (16)
- # clojure-spec (7)
- # clojure-uk (34)
- # clojurescript (88)
- # community-development (9)
- # cursive (8)
- # data-science (55)
- # datomic (40)
- # devops (1)
- # emacs (20)
- # fulcro (19)
- # graphql (3)
- # hoplon (46)
- # luminus (11)
- # lumo (4)
- # off-topic (27)
- # onyx (26)
- # other-languages (25)
- # pedestal (2)
- # powderkeg (6)
- # re-frame (11)
- # reagent (4)
- # ring-swagger (17)
- # rum (4)
- # shadow-cljs (103)
- # spacemacs (14)
- # specter (6)
- # unrepl (21)
- # yada (1)
So, thought I'd let you know that I seem to have Minio working for checkpoint storage. The streaming job has been running for 2 hours now and everything memory-related is looking good. Gonna let it run overnight and see how things look in the AM.
@brianh great to hear! Be mindful of how much disk space you give it, and what your barrier period is, as it’s probably a ticking time bomb with respect to disk space, depending on how big your windows are. Sounds like it did the trick though :)
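For context, a minimal sketch of the checkpoint-storage portion of a peer-config pointed at an S3-compatible Minio endpoint; the key names are from memory and may differ between Onyx versions, and the tenancy, ZooKeeper address, bucket, and endpoint values are placeholders:

```clojure
;; Sketch only: checkpoint storage pointed at an S3-compatible Minio endpoint.
;; Key names are from memory and may vary between Onyx versions; values are placeholders.
(def peer-config
  {:onyx/tenancy-id "my-tenancy"                       ; placeholder
   :zookeeper/address "zk:2181"                        ; placeholder
   ;; send ABS checkpoints to S3-compatible storage rather than ZooKeeper
   :onyx.peer/storage :s3
   :onyx.peer/storage.s3.bucket "onyx-checkpoints"     ; placeholder bucket
   :onyx.peer/storage.s3.endpoint "http://minio:9000"  ; Minio endpoint (placeholder)
   ;; shorter barrier periods mean more frequent checkpoints, i.e. more
   ;; checkpoint data written per window -- the disk-space concern above
   :onyx.peer/coordinator-barrier-period-ms 10000})
```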
@jholmberg @camechis I’ll have a response for you shortly. My main initial recommendation is to scrape/monitor the JMX metrics. You can do this with onyx-peer-http-query or a JMX metrics agent.
I would have a look at checkpoint_store_latency_50thPercentile, checkpoint_store_latency_Max, checkpoint_written_bytes_Value, and checkpoint_size_Value.
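A sketch of how those metrics might be exposed via onyx-peer-http-query; the query-server keys are from memory and may differ by version, and `base-peer-config` stands in for an existing peer-config:

```clojure
;; Sketch only: turn on the onyx-peer-http-query server so the JMX metrics
;; can be scraped over HTTP. Key names are from memory and may differ by version.
(def peer-config
  (merge base-peer-config                 ; assumed: your existing peer-config
         {:onyx.query/server? true
          :onyx.query.server/ip "0.0.0.0"
          :onyx.query.server/port 8080}))
;; Then point Prometheus (or curl) at http://<peer-host>:8080 and watch the
;; checkpoint_store_latency_* / checkpoint_written_bytes_Value / checkpoint_size_Value values.
```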
Thanks @lucasbradstreet. We’ll hook Onyx up in our Kubernetes cluster and run some data through it to see how it does. We’ve got Prometheus and Grafana. Do you have any dashboards that work with that, by chance? If not, we can just look at it in Prometheus.
@jholmberg I do, but I’ll need to rip a couple of things out. Shouldn’t be a problem though.
I’ll also push our Prometheus alerts somewhere
@jholmberg our dashboard is pretty project-specific; it includes a lot of tags that won’t apply to your use case.
@jholmberg I’ve pushed up our Prometheus alerts here: https://github.com/onyx-platform/onyx-monitoring. They should be considered a starting point; many of the parameters we’ve chosen relate to things like the barrier period, the sort of fns that run, etc., and may need tuning.
Thanks @lucasbradstreet, this should give us a really good starting point.
Haven't tried it yet, but this looks great - https://github.com/bamarco/onyx-sim
@camechis I think a job manager that watched some job ConfigMaps for changes, killing and re-submitting the Onyx job, would be really neat.
Yeah, we leave it up to users at the moment, as needs can differ, but it’d be good to have something as a starting point, especially with respect to migration.
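A rough sketch of that job-manager idea, using the onyx.api/submit-job and onyx.api/kill-job calls; the `read-job-config` function is hypothetical (in Kubernetes it might read a ConfigMap file mounted into the pod), and state migration is left out:

```clojure
(require '[onyx.api :as onyx])

;; `read-job-config` is hypothetical: it should return the current job map,
;; e.g. read from a ConfigMap file mounted into the pod.
(defn run-job-manager
  [peer-config read-job-config poll-ms]
  (loop [current-job nil
         job-id nil]
    (let [latest (read-job-config)]
      (if (= latest current-job)
        (do (Thread/sleep poll-ms)
            (recur current-job job-id))
        (do (when job-id
              ;; kill the previously submitted job before re-submitting
              (onyx/kill-job peer-config job-id))
            (let [{:keys [job-id]} (onyx/submit-job peer-config latest)]
              (Thread/sleep poll-ms)
              (recur latest job-id)))))))
```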
One question I had in relation to submitting a job: after the job is submitted, I see that the process which submitted the job stays alive waiting to receive exceptions. Is it necessary to leave that process open, or can it just submit and be done? Or is there a good use case for not just exiting?
@camechis are you referring to feedback-exception!? If so, that’s mostly a test helper.
It can submit and immediately shut down after that call.
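A minimal sketch of the submit-and-exit pattern, assuming the peer-config and job map are built elsewhere; the blocking behaviour belongs to the feedback-exception! test helper, which a production submitter can simply skip:

```clojure
(require '[onyx.api :as onyx])

(defn submit-and-exit!
  "Submit the job and return straight away; the peers run the job, so the
   submitting process does not need to stay alive."
  [peer-config job]
  (let [{:keys [job-id]} (onyx/submit-job peer-config job)]
    (println "Submitted job" job-id)
    job-id))
;; In tests, the blocking feedback-exception! helper (which waits and rethrows
;; job exceptions) is what keeps the process alive; a production submitter can skip it.
```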