I have a question related to Java threads
We are using the again library https://github.com/liwp/again/tree/master for retries
This library internally uses Thread/sleep for its delays https://github.com/liwp/again/blob/master/src/again/core.clj#L90-L92
The problem we are facing is that when Thread/sleep is invoked, the thread does not wake up even after it has slept for the full delay time
I took thread dumps over a span of 2 minutes
around 5 thread dumps
In all of them the same thread is always sleeping
What is the best solution in this case?
that is retrying
sleep -> try and fail -> sleep in a loop
depending on how fast the failures happen, how long the sleeps are, etc., it is unlikely that a thread dump taken at any given moment will catch the thread when it is not sleeping
for example with this program
(defn work []
  (Thread/sleep 10))

(defn backoff-and-retry []
  (Thread/sleep 10000))

(loop []
  (if (zero? (rand-int 5))
    (do
      (work)
      (recur))
    (do
      (backoff-and-retry)
      (recur))))
The thread dump will pretty much always contain the backoff-and-retry function, and not the work function, because of the ratio of the amount of time it spends executing each
my concern is that even after a couple of hours it is still sleeping
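To put a rough number on that ratio for the example loop above, here is a minimal sketch. The function name `fraction-in-backoff` is made up for illustration, and it assumes each iteration picks `work` with a fixed probability, as `(zero? (rand-int 5))` does.

```clojure
;; Expected share of wall-clock time spent in the backoff sleep.
;; p-work     - probability that an iteration runs (work)
;; work-ms    - how long (work) sleeps
;; backoff-ms - how long (backoff-and-retry) sleeps
(defn fraction-in-backoff [p-work work-ms backoff-ms]
  (let [work-time    (* p-work work-ms)
        backoff-time (* (- 1 p-work) backoff-ms)]
    (double (/ backoff-time (+ work-time backoff-time)))))

;; Values from the example: work 1 time in 5, 10 ms vs 10000 ms.
(fraction-in-backoff 1/5 10 10000)
```

With those values the result is 8000/8002, so a thread dump lands in the backoff sleep well over 99% of the time even though the loop is making progress.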
what makes you think it is the same sleep?
this is the retry strategy
{:initial-retry-count 2
 :initial-delay-ms 10
 :exponential-backoff-multiplier 2.0
 :max-delay-ms 1000}
ok, but is that the only retry? what is the behavior of the code that is driving this code?
the operation is an HSET update in Redis
sure, but this code with the retry around it doesn't exist in isolation right?
something else is calling it
like as a result of webrequests?
how often are web requests coming in?
are those being served on a threadpool that is re-using the same thread? is the hset always failing quickly?
requests are very few in the STG environment
like 1k req/hr
that would be enough
then it is idle for the remaining 23 hours of the day
there could be some bug in the again library, where it sleeps too long I guess
but Thread/sleep itself is pretty solid, it is used day in and day out by basically every non-trivial jvm program in existence
"initial-retry-count" as a string doesn't seem to appear in the library, maybe check to make sure your configuration is correct
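One way to sanity-check the configuration is to expand it into the actual delays it implies. This is only a sketch in plain Clojure: the helper name `delays` is hypothetical, and it assumes the map means "retry 2 times, start at 10 ms, double each time, cap at 1000 ms". The again README describes a retry strategy as a sequence of delays in milliseconds, so a sequence like this is the shape the library works with.

```clojure
;; Expand the config map into the delays (in ms) it would produce,
;; under the assumed meaning of each key described above.
(defn delays
  [{:keys [initial-retry-count initial-delay-ms
           exponential-backoff-multiplier max-delay-ms]}]
  (->> (iterate #(* % exponential-backoff-multiplier) initial-delay-ms)
       (map #(long (min % max-delay-ms)))   ; cap each delay
       (take initial-retry-count)))         ; one delay per retry

(delays {:initial-retry-count 2
         :initial-delay-ms 10
         :exponential-backoff-multiplier 2.0
         :max-delay-ms 1000})
;; => (10 20)
```

If that reading of the map is right, one operation sleeps for about 30 ms in total, so a single retry cycle cannot account for a thread that looks asleep for hours.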
okay
I think it is the same sleep because some of the messages from the Kinesis stream are not processed
and all the threads are idle in top -H -p $pid command
that would also be true if it was looping around a sleep
an infinite try, fail, retry loop
yeah
one more question
what if I used async/timeout instead of sleep?
(defn- sleep [delay]
  (clojure.core.async/timeout delay))
timeout itself doesn't do anything
it returns a channel that will be closed after some delay
ohh okay
and you can block on the channel waiting for it to close
https://github.com/clojure/core.async/blob/master/src/main/clojure/clojure/core/async/impl/timers.clj#L43 is the loop that services timeout channels, it is built on top of a delayqueue
the reason core.async has timeout channels is because 1. Thread/sleep blocks a thread, so you shouldn't use it in go blocks and 2. you can use timeout channels in things like alts which you cannot do with Thread/sleep
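To make the points about timeout channels concrete, here is a minimal sketch using `timeout`, `<!!` and `alts!!` from clojure.core.async. The names `sleep` and `take-with-timeout` are illustrative only, not part of again or core.async.

```clojure
(require '[clojure.core.async :as async])

;; Blocking on the timeout channel is what actually waits.
;; Outside a go block this parks the calling thread just like
;; Thread/sleep does, so it would not change the behavior above.
(defn sleep [delay-ms]
  (async/<!! (async/timeout delay-ms)))

;; What timeout channels add over Thread/sleep: they can be combined
;; with other channels, e.g. to give up on a take after a deadline.
(defn take-with-timeout [ch timeout-ms]
  (let [t        (async/timeout timeout-ms)
        [v port] (async/alts!! [ch t])]
    (if (= port t)
      ::timed-out   ; the timeout channel closed first
      v)))          ; a value (or nil, if ch was closed) arrived first
```

Swapping Thread/sleep for the blocking `sleep` here would leave a thread dump looking much the same, since the thread still spends its time waiting.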