#onyx
2015-11-28
robert-stuttaford06:11:13

@lucasbradstreet: hi 🙂 should the riemann logstorm be gone from 0.8.2?

robert-stuttaford06:11:39

locally, i’ve got it configured to connect but have no riemann running, and i’m getting lots of those errors still

lucasbradstreet06:11:15

Hmm. It should've been but maybe I did it wrong. I'm only doing it for the timed out messages - maybe the unable to connect ones are still filling up the logs

robert-stuttaford06:11:25

ah, ok, that’s likely

lucasbradstreet06:11:20

I'll have a look at it when I get home in 30. I may push up a new version

robert-stuttaford06:11:23

i think it might be worth doing the same thing - it wrote 42mb in under a minute

robert-stuttaford06:11:47

also, the CPU was poked

robert-stuttaford06:11:56

busy testing our stuff for 0.8.2 now

robert-stuttaford06:11:20

testing without trying to connect to riemann now

lucasbradstreet06:11:18

That one is going to be trickier because I definitely want to log exceptions. Guess I'll have to rate limit it
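
One way to keep logging exceptions without flooding the log is to emit at most one message per interval and count what was dropped. The following is a minimal Python sketch of that idea, not the onyx-metrics code; the class name `RateLimitedLogger`, the `min_interval` parameter, and the injectable `clock` are all hypothetical.

```python
import logging
import time


class RateLimitedLogger:
    """Write at most one error per `min_interval` seconds; count the rest."""

    def __init__(self, logger, min_interval=1.0, clock=time.monotonic):
        self.logger = logger
        self.min_interval = min_interval  # assumed interval, in seconds
        self.clock = clock                # injectable so it can be tested
        self.last_emit = None             # time of the last message written
        self.suppressed = 0               # messages dropped since then

    def error(self, msg):
        """Log `msg` if the interval has elapsed; return True if written."""
        now = self.clock()
        if self.last_emit is not None and now - self.last_emit < self.min_interval:
            self.suppressed += 1
            return False
        if self.suppressed:
            # Report how many messages were dropped so the exception
            # volume is still visible in the log.
            msg = f"{msg} ({self.suppressed} similar messages suppressed)"
        self.logger.error(msg)
        self.last_emit = now
        self.suppressed = 0
        return True
```

The suppressed count is folded into the next message that does get written, so the log still shows that failures are ongoing and roughly how frequent they are.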

robert-stuttaford06:11:50

yup. without riemann config switched on, no CPU craziness

robert-stuttaford06:11:12

in other news, 0.8.2 is all good!

lucasbradstreet07:11:51

Hmm, when my test can’t connect to riemann it just hangs at creating the connection. Does yours take a while and then time out?

lucasbradstreet07:11:18

Never mind, the config was bad and we don’t validate it well 😛

lucasbradstreet08:11:14

@robert-stuttaford: I’ve released metrics 0.8.2.2, which backs off exponentially on connection failures, up to a maximum of 1s

lucasbradstreet08:11:23

so you’ll get a log entry every second which I think is semi acceptable
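
The capped exponential backoff described here can be sketched roughly as below. Only the 1s maximum comes from the conversation; the function name `backoff_delays`, the initial delay, and the doubling factor are assumptions for illustration, and this is Python rather than the library's own code.

```python
from itertools import islice


def backoff_delays(initial=0.25, factor=2.0, max_delay=1.0):
    """Yield retry delays in seconds, growing by `factor` up to `max_delay`."""
    delay = initial
    while True:
        yield min(delay, max_delay)
        # Clamp here too so the value never grows without bound.
        delay = min(delay * factor, max_delay)


# First five delays with the assumed defaults.
first_five = list(islice(backoff_delays(), 5))
```

Once the cap is reached, every further retry waits the full second, which matches the "one log entry per second" behaviour in the steady failure state.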

robert-stuttaford14:11:57

ok, great, thank you Lucas!

lucasbradstreet15:11:25

Ooh, that’s pretty cool looking. It has to be adaptive though

lucasbradstreet15:11:15

Well, we’re only throttling the log messages / backing off on writes on failures

michaeldrogalis15:11:26

Ohh - I see what you mean.

lucasbradstreet17:11:31

@robert-stuttaford: complete latencies aren’t showing up for me in riemann/grafana. I think it’s due to the period in the names: e.g. 99.9th_percentile_complete_latency

lucasbradstreet17:11:50

@robert-stuttaford: do they work for you with datadog? I might change the service name to use 99_9th or something similar
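
The proposed rename amounts to replacing the period in the service name. A minimal Python sketch, assuming the period really is the cause (unconfirmed in this conversation; some backends such as Graphite do treat dots as hierarchy separators). The function name `sanitize_service_name` is hypothetical.

```python
def sanitize_service_name(name):
    """Replace periods so the name is not split into path segments."""
    return name.replace(".", "_")


example = sanitize_service_name("99.9th_percentile_complete_latency")
```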

robert-stuttaford19:11:52

yep we see them in datadog

lucasbradstreet19:11:55

Hrm. I wonder if it’s just a grafana thing, or whether making the service [task-name]_peer-id will affect things for you

lucasbradstreet19:11:15

If it’s just Grafana then I may just leave it the way it is