hey @lucasbradstreet i think i may have done something untoward in our Timbre setup
@robert-stuttaford: ah, may have overwritten the existing configuration rather than merging?
hi @lucasbradstreet, if we see this message a lot: https://github.com/onyx-platform/onyx-metrics/blob/master/src/onyx/lifecycle/metrics/riemann.clj#L30, and also see lots of these:
15-Nov-24 06:23:27 production-HighstormStack-77PF3UB8TROI-i-bba8320b WARN [onyx.lifecycle.metrics.riemann] -
                                 java.lang.Thread.run              Thread.java:  745
   java.util.concurrent.ThreadPoolExecutor$Worker.run  ThreadPoolExecutor.java:  617
    java.util.concurrent.ThreadPoolExecutor.runWorker  ThreadPoolExecutor.java: 1142
                  java.util.concurrent.FutureTask.run          FutureTask.java:  266
                                                  ...
                  clojure.core/binding-conveyor-fn/fn                 core.clj: 1916
     onyx.lifecycle.metrics.riemann/riemann-sender/fn              riemann.clj:   21
  onyx.lifecycle.metrics.riemann/riemann-sender/fn/fn              riemann.clj:   24
                            riemann.client/send-event               client.clj:   72
     com.aphyr.riemann.client.RiemannClient.sendEvent       RiemannClient.java:  115
   com.aphyr.riemann.client.RiemannClient.sendMessage       RiemannClient.java:  110
    com.aphyr.riemann.client.TcpTransport.sendMessage        TcpTransport.java:  259
    com.aphyr.riemann.client.TcpTransport.sendMessage        TcpTransport.java:  289
java.io.IOException: no channels available
(on the logging, it’s quite possible. i reverted to the previous setup for now)
we had a comedy of errors last night. ZK failed because it couldn’t write anything to disk. it was because we filled the disk up with a LOT of these riemann errors
of course, the other problem is that we didn’t actually assign all the ssd hard drive space to the OS, which caused us to hit this problem very quickly
is this an actual error, or something we can ignore? we shouldn’t get these warnings at all, ideally?
Ideally you shouldn't get them at all, though I'm sure you'll occasionally hit a retry which is why we resend.
ok. is it perhaps possible to squash some of that logging in the next release of -metrics, please?
It's a little tricky unless we log a periodic message with retry stats instead
It shouldn't be a load issue because benches show Riemann can handle 100K/sec and we're only pushing out 8ish per second
if we have a latency spike on a task with zero throughput, what could that mean?
Or the throughput is being rounded off? Would it show 10 instead of XK in that case?
ok. will stop pestering you until we have done our homework on our code. probably chat later on
I asked you a while back about the number of peers you use per task. The reason is that we weren’t segmenting metrics by the peer id as well as the task name. This meant that you could have two peers outputting the same statistics, which undercounts throughput
For example, if you have two peers on the same task, :in, and both peers output a throughput of 10, then the task will report 10 rather than 20
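The undercount above can be sketched in a few lines of Clojure. This is an illustration, not onyx-metrics source: the reading maps and key names are invented to show why keying by task name alone drops one peer's numbers, while keying by [task peer] keeps them separate so they can be summed per task.

```clojure
;; Illustrative sketch (not onyx-metrics source): two peers on the same
;; task each report a throughput of 10.
(def readings
  [{:task :in :peer "p1" :throughput 10}
   {:task :in :peer "p2" :throughput 10}])

;; Keyed by task only: the second peer's reading overwrites the first.
(def by-task
  (reduce (fn [m {:keys [task throughput]}]
            (assoc m task throughput))
          {} readings))

;; Keyed by [task peer]: readings stay distinct and can be summed per task.
(def by-task-peer
  (reduce (fn [m {:keys [task peer throughput]}]
            (assoc m [task peer] throughput))
          {} readings))

(def task-total
  (reduce-kv (fn [total [task _] v]
               (if (= task :in) (+ total v) total))
             0 by-task-peer))

(:in by-task) ;; => 10, undercounted
task-total    ;; => 20, correct
```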
@robert-stuttaford: we’ll be releasing onyx-0.8.1 soon, which might be a good point for you to upgrade everything
how soon? we’re on a hangout talking about upgrading literally right now haha
ok. we’re going to be busy with code for a couple days anyway. i’m happy to do it when you have a release ready
the shaded area is the transactor cloudwatch metric for transaction count over time
if we see a significant delay, or significant increase in that delay, we know that our max-pending is too low
Yeah, I think that's a good idea. The input task is the most important to keep an eye on
@lucasbradstreet: on the riemann issue, would increasing :metrics/buffer-capacity from 10k help?
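For context, `:metrics/buffer-capacity` is set on the metrics lifecycle entry. A hypothetical sketch of raising it from the 10k default mentioned above — the surrounding keys follow the usual onyx-metrics lifecycle shape but should be checked against your actual setup:

```clojure
;; hypothetical lifecycle entry; :metrics/buffer-capacity is the key in
;; question, the other keys depend on your onyx-metrics configuration
{:lifecycle/task :all
 :lifecycle/calls :onyx.lifecycle.metrics.metrics/calls
 :metrics/buffer-capacity 20000 ;; raised from the 10k default
 :lifecycle/doc "Sends task metrics to Riemann"}
```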
@lucasbradstreet: looking ahead to 0.8.1. the splitting of metrics; does this mean we’d have totally independent metrics per peer, allowing us to graph them separately?
Ah, I coalesced them into a single info message per second with a timeout count
yes, you’d have totally independent metrics per peer, and then you can aggregate them by task if you like
upgrading looks straightforward for us. just need to fix the ports for aeron, and update our datadog dash once we have metrics
I released the alpha (and a few other alphas to fix up issues in our automated plugin release process)