2017-01-20
Channels
- # arachne (11)
- # aws (2)
- # beginners (33)
- # boot (167)
- # cider (71)
- # clara (2)
- # cljs-dev (28)
- # cljsrn (3)
- # clojars (1)
- # clojure (83)
- # clojure-austin (21)
- # clojure-dev (24)
- # clojure-russia (19)
- # clojure-spec (33)
- # clojure-uk (108)
- # clojurescript (114)
- # component (1)
- # core-async (1)
- # cursive (7)
- # datomic (13)
- # editors (1)
- # emacs (15)
- # hoplon (10)
- # lein-figwheel (4)
- # leiningen (3)
- # mount (2)
- # om (134)
- # om-next (4)
- # onyx (42)
- # pedestal (41)
- # quil (2)
- # re-frame (29)
- # reagent (4)
- # remote-jobs (6)
- # ring-swagger (5)
- # untangled (9)
I think onyx-dashboard needs a version indication somewhere, given that upgrading onyx requires upgrading onyx-dashboard 🙂
@asolovyov yeah, it definitely does need that
this problem is really weird:
:read-mk-campaign logging segment: {:offset 3030
:read-mk-campaign logging segment: {:offset 3031
:read-mk-campaign logging segment: {:offset 3032
:read-mk-campaign logging segment: {:offset 3033
:read-mk-campaign logging segment: {:offset 3034
:read-mk-campaign logging segment: {:offset 3035
:read-mk-campaign logging segment: {:offset 3028
:read-mk-campaign logging segment: {:offset 3037
:read-mk-campaign logging segment: {:offset 3033
:read-mk-campaign logging segment: {:offset 3032
:read-mk-campaign logging segment: {:offset 3031
:read-mk-campaign logging segment: {:offset 3030
:read-mk-campaign logging segment: {:offset 3029
:read-mk-campaign logging segment: {:offset 3036
:read-mk-campaign logging segment: {:offset 3035
:read-mk-campaign logging segment: {:offset 3034
and it seems it goes through the whole task, actually writes feedback to kafka, and then starts again
@lucasbradstreet should it write something in the logs on a retry? because it seems there are some retries; it's just that the retry chart sits a bit lower than the others and is harder to notice 🙂
Ah, no, it will not write anything to the timbre logs
It’s usually one of two things. Either the segment is causing a task to crash (and thus reboot), or those segments are taking longer than onyx/pending-timeout to process
I think pending-timeout is 60 seconds by default, which is pretty long for most use cases, but maybe this is a longer one
reducing onyx/max-pending can help a lot too
try reducing it to something really low, see if things process successfully (possibly slowly), and then start to increase it again
when you have a big max-pending, things will queue up, so you can end up with some segments exceeding pending-timeout even though they shouldn't take all that long to process
Sure, try raising pending-timeout. Also check your logs to see if you’re seeing any exceptions
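(For reference: a minimal sketch of what tuning those two knobs on the input task's catalog entry might look like in an Onyx catalog of that era. The task name mirrors the logs above, but the plugin keyword and all values here are illustrative assumptions, not taken from this conversation.)
{:onyx/name :read-mk-campaign
 :onyx/plugin :onyx.plugin.kafka/read-messages ;; assumed onyx-kafka reader
 :onyx/type :input
 :onyx/medium :kafka
 :onyx/batch-size 20
 ;; ms a segment may stay un-acked before it is retried; raise this above
 ;; the slowest expected end-to-end processing time for the job
 :onyx/pending-timeout 240000
 ;; max segments in flight from this input task; lower it first when debugging
 :onyx/max-pending 100}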
java.nio.file.NoSuchFileException: /dev/shm/aeron-rc11400
file: "/dev/shm/aeron-rc11400"
clojure.lang.ExceptionInfo: Error in component :messaging-group in system onyx.system.OnyxPeerGroup calling #'com.stuartsierra.component/stop
component: #<Aeron Peer Group>
function: #'com.stuartsierra.component/stop
reason: :com.stuartsierra.component/component-function-threw-exception
system: #<Onyx Peer Group>
system-key: :messaging-group
shutdown-peer-group is called by mount when it stops the application as it's being closed
any idea what that means? (I get that the file doesn't exist) but why does it only happen some of the time (i.e. when not stopping from a REPL)?
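(For context, with mount the peer group is usually wrapped in a defstate whose :stop calls onyx.api/shutdown-peer-group. A minimal sketch, assuming the namespace and config values, neither of which comes from this conversation:)
(ns raker.system ;; hypothetical namespace
  (:require [mount.core :refer [defstate]]
            [onyx.api]))

(def peer-config ;; stand-in values; the real config lives elsewhere
  {:zookeeper/address "127.0.0.1:2188"
   :onyx/id "dev-tenancy"
   :onyx.peer/job-scheduler :onyx.job-scheduler/greedy
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})

(defstate peer-group
  :start (onyx.api/start-peer-group peer-config)
  ;; mount calls this on (mount/stop); it is the stop that throws when the
  ;; aeron directory under /dev/shm has already been removed
  :stop (onyx.api/shutdown-peer-group peer-group))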
@lucasbradstreet is there a way to determine why it's retrying? 🙂 because when I run it locally, I see that the time difference between reads from kafka is a minute
like this:
17-01-20 10:00:49 alcor.mk.corp DEBUG [raker.common.logging:16] - :read-mk-campaign logging segment: {:offset 624,
17-01-20 10:00:49 alcor.mk.corp DEBUG [raker.common.logging:16] - :read-mk-campaign logging segment: {:offset 625,
17-01-20 10:01:49 alcor.mk.corp DEBUG [raker.common.logging:16] - :read-mk-campaign logging segment: {:offset 625,
17-01-20 10:01:49 alcor.mk.corp DEBUG [raker.common.logging:16] - :read-mk-campaign logging segment: {:offset 624,
@lucasbradstreet thanks a lot, in the end it was the pending timeout - it's 60 seconds by default, and it turns out my task takes 3 to 4 minutes :))
@asolovyov: great. Glad to hear it's something standard. You might find complete latency a useful metric to track in addition to retries. Complete latency will tell you how long a message took to ack end to end.
I'm tracking it; it's just that when a retry happens, the complete latency is not sent to riemann
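(For reference, complete latency, retries and throughput were typically emitted by adding the onyx-metrics lifecycle to the job. The sketch below is from memory of the 0.9-era plugin; the :metrics/* and :riemann/* key names and the sender keyword are assumptions to verify against the onyx-metrics README.)
{:lifecycle/task :all
 :lifecycle/calls :onyx.lifecycle.metrics.metrics/calls
 :metrics/buffer-capacity 10000
 :metrics/workflow-name "raker" ;; assumed name, mirroring the log namespace
 :metrics/sender-fn :onyx.lifecycle.metrics.riemann/riemann-sender
 :riemann/address "localhost" ;; illustrative endpoint
 :riemann/port 5555
 :lifecycle/doc "Sends throughput, retry and complete-latency metrics to riemann"}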
Please review and merge the new dashboard UI PR: https://github.com/onyx-platform/onyx-dashboard/pull/79
Thanks @mariusz_jachimowicz. I'll try to take a look at it this weekend. Much appreciated.