This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2015-11-16
Channels
- # admin-announcements (9)
- # beginners (112)
- # boot (223)
- # cbus (10)
- # cider (19)
- # clara (2)
- # cljs-dev (81)
- # cljsjs (3)
- # cljsrn (45)
- # clojure (239)
- # clojure-conj (12)
- # clojure-poland (2)
- # clojure-russia (56)
- # clojure-taiwan (1)
- # clojurescript (57)
- # cursive (28)
- # datomic (5)
- # events (14)
- # immutant (1)
- # jobs (1)
- # ldnclj (8)
- # off-topic (28)
- # om (80)
- # onyx (121)
- # re-frame (10)
- # sneer-br (1)
- # spacemacs (40)
- # yada (44)
I accidentally deleted the whole directory. After redeploying everything via fabric: no errors, all good.
Not the way I wanted to solve the problem, but everything works now. I think maybe I have corrupted data in ZK.
@robert-stuttaford: what version of onyx-metrics are you using? I think you’re probably using a version that has a particularly bad memory leak in it. It might take you a while to hit it because the effect is moderately small, but you will definitely hit it.
lucasbradstreet: [org.onyxplatform/onyx-metrics "0.7.10" :exclusions [org.onyxplatform/onyx]]
I’m about to. Main problem is that you’re on 0.7 so I might need to back port it for now
I’ll push out a new 0.7 release for you. I wouldn’t recommend you upgrade to 0.8 quite yet, though things are looking pretty good there
We’re pretty sure 0.8 is good, but we’re just going to do a full performance test first
I was tracking completions on non input tasks, and the timestamps that I was putting in maps never got cleared (because they never got completed, seeing as they were not input tasks)
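That kind of leak can be sketched in a few lines (hypothetical names, not the actual onyx-metrics code): start timestamps accumulate in a map keyed by message id, and the only code path that removes them is the completion path, which non-input tasks never hit.

```clojure
;; Hypothetical sketch of the leak pattern described above (these names
;; are made up for illustration; this is not the onyx-metrics source).
(def start-times (atom {}))

(defn on-segment-start
  "Record when a segment started processing."
  [msg-id]
  (swap! start-times assoc msg-id (System/currentTimeMillis)))

(defn on-segment-complete
  "Drop the start timestamp. Only ever invoked for input tasks, so on
  non-input tasks the entries added above are never removed and the map
  grows without bound."
  [msg-id]
  (swap! start-times dissoc msg-id))
```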
It actually exhibited itself as Aeron publications ending up closed. So I was looking in the wrong place for ages 😞
workload dependent, we’ve had to restart every couple of days or so. just haven’t had the headspace to profile it ourselves. given that we’re shipping a shit-ton of metrics to riemann, my guess is this is why
yes, i’m pretty sure we suffered it. our input task is given every datomic transaction we have.
15-Nov-16 12:37:46 wh01.c.tunlld-01.internal INFO [onyx.peer.task-lifecycle] - [2158b39b-c4c5-45ff-a79e-52f81fa83e46] Peer chose not to start the task yet. Backing off and retrying...
15-Nov-16 12:37:46 wh01.c.tunlld-01.internal INFO [onyx.peer.task-lifecycle] - [a817f591-2d84-4fba-a015-0837cedc1622] Peer chose not to start the task yet. Backing off and retrying...
15-Nov-16 12:37:46 wh01.c.tunlld-01.internal INFO [onyx.peer.task-lifecycle] - [c76b12c6-a192-4b34-8a09-d75b313bf9e0] Peer chose not to start the task yet. Backing off and retrying...
15-Nov-16 12:37:46 wh01.c.tunlld-01.internal INFO [onyx.peer.task-lifecycle] - [66f084f5-f530-437e-a7f1-aec697838fd8] Peer chose not to start the task yet. Backing off and retrying...
15-Nov-16 12:37:46 wh01.c.tunlld-01.internal INFO [onyx.peer.task-lifecycle] - [4f818531-1ea0-45e6-9c37-cecdbb272441] Peer chose not to start the task yet. Backing off and retrying...
It’ll look like:
15-Nov-16 18:14:09 lbpro INFO [onyx.lifecycle.metrics.timbre] - Metrics: {:job-id #uuid "5df1edbf-fe57-4905-96ae-0be6e24b0923", :task-id #uuid "97faf3bb-0718-463e-8a4b-e5428f5b6ffc", :task-name :inc1, :peer-id #uuid "496df455-f926-4d00-89de-31051c239d97", :service "[:inc1] 1s_retry-segment-rate", :window "1s", :metric :retry-rate, :value 0.0, :tags ["retry_segment_rate_1s" "onyx" ":inc1" "your-workflow-name"]}
Sorry, I can’t help you much more until you can show whether the :metric :retry-rate entries always have :value 0.0
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message:
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message:
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message: hi
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message: 7
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message: test
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message: hi
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message: new
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message: 7
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message: test
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message: life
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message: new
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.desy] - received message: life
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg:
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg: hi
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg: 7
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg: test
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg: new
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg: life
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg:
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg: hi
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg: 7
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg: test
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg: new
15-Nov-16 12:37:51 wh02.c.tunlld-01.internal INFO [hyper.onyx.functions.sample-functions] - seg: life
there are no ':metric :retry-rate' related log entries. Is this a Kafka plugin-specific setup error?
Just set up onyx-metrics with timbre logging so that you can look at your logs and see whether any retries happen
You can use the dashboard if you want but for now just using the timbre logging and inspecting your logs will be enough
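For reference, a lifecycle entry for timbre logging looks roughly like this (key names follow the 0.8-era onyx-metrics README; they have changed between versions, so treat this as a sketch and check the README for the version you are actually running):

```clojure
;; Sketch of an onyx-metrics lifecycle entry that routes metrics to the
;; Timbre logger instead of an external sink. Added to the :lifecycles
;; vector of the job. Key names should be verified against the
;; onyx-metrics README for your version.
{:lifecycle/task :all
 :lifecycle/calls :onyx.lifecycle.metrics.metrics/calls
 :metrics/buffer-capacity 10000
 :metrics/workflow-name "your-workflow-name"
 :metrics/sender-fn :onyx.lifecycle.metrics.timbre/timbre-sender
 :lifecycle/doc "Logs each task's throughput/retry metrics via Timbre"}
```

The "your-workflow-name" value is what shows up in the :tags vector of the logged metrics maps.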
Could not find artifact org.onyxplatform:onyx-metrics:jar:0.8.0.1 in central (https://repo1.maven.org/maven2/)
Could not find artifact org.onyxplatform:onyx-metrics:jar:0.8.0.1 in clojars (https://clojars.org/repo/)
Metrics: {:job-id #uuid "1f423693-ccd1-4f4e-85c8-baa277a7ff84", :task-id #uuid "a802c18e-7a94-4fba-81c6-f110e4551501", :task-name :my-dent, :peer-id #uuid "c4bc13d3-bbe1-43cb-9d42-165dd9bddf83", :service "[:my-dent] 1s_retry-segment-rate", :window "1s", :metric :retry-rate, :value 0.0, :tags ["retry_segment_rate_1s" "onyx" ":my-dent" "your-workflow-name"]}
So my theory was that your messages weren't being acked and were being retried
Yeah I believe it. You really need to look at the point where there's some throughput though
Yeah. So initially throughput will be positive. At some point later, for the second call, throughput will be positive again. I want to know whether retries were positive just prior, too
Ok, so a few things make sense to me. First possibility: you're reading messages and they're not getting acked. If this is happening then you'll get replays after :onyx/pending-timeout, which defaults to 60s
Second alternative is you're submitting jobs multiple times, potentially using the same group-id (which is what the plugin commits its checkpoint against)
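Both of those knobs live on the input task's catalog entry. A sketch (task name, topic, and group id are made up for illustration; check the onyx-kafka docs for the exact keys your plugin version expects):

```clojure
;; Hypothetical Kafka input-task catalog entry. :onyx/pending-timeout is
;; how long (ms) an unacked segment waits before being retried, and
;; :kafka/group-id is what the plugin keys its offset checkpoint on, so
;; reusing it across job submissions reuses the same checkpoint.
{:onyx/name :read-messages
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :kafka/topic "my-topic"
 :kafka/group-id "my-consumer-group"
 :onyx/pending-timeout 60000
 :onyx/batch-size 100}
```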
The other thing I can think of is that you have duplicate messages on the input medium (i.e. you're accidentally writing your input messages to your Kafka topic twice)