jvm

RAJKUMAR 2023-09-01T22:05:38.166189Z

I have a question related to Java threads

RAJKUMAR 2023-09-01T22:07:49.428769Z

We are using the again library https://github.com/liwp/again/tree/master for retry purposes

RAJKUMAR 2023-09-01T22:08:13.280649Z

This library internally uses Thread/sleep for delay purposes https://github.com/liwp/again/blob/master/src/again/core.clj#L90-L92

RAJKUMAR 2023-09-01T22:13:50.543389Z

The problem I am facing is that when Thread/sleep is invoked, the thread is not woken up even after it has slept for the full delay time

RAJKUMAR 2023-09-01T22:15:04.196009Z

I took thread dumps over a span of 2 minutes

RAJKUMAR 2023-09-01T22:15:18.836849Z

around 5 thread dumps

RAJKUMAR 2023-09-01T22:15:35.283099Z

In all of those, the same thread is always sleeping

RAJKUMAR 2023-09-01T22:16:07.050519Z

what is the best solution in this case?

2023-09-01T22:16:34.248299Z

that is retrying

2023-09-01T22:16:43.204299Z

sleep -> try and fail -> sleep in a loop

2023-09-01T22:18:09.108559Z

depending on how fast the failures happen, how long the sleeps are, etc., it is unlikely that any thread dump you take will catch the thread doing anything other than sleeping

2023-09-01T22:40:27.429419Z

for example with this program

(defn work []
  (Thread/sleep 10))

(defn backoff-and-retry []
  (Thread/sleep 10000))

(loop []
  (if (zero? (rand-int 5))
    (do
      (work)
      (recur))
    (do
      (backoff-and-retry)
      (recur))))
The thread dump will pretty much always contain the backoff-and-retry function, and not the work function, because of the ratio of the amount of time it spends executing each
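
A rough sketch of that ratio: assuming each loop iteration picks work with probability 1/5 and the backoff otherwise (as rand-int 5 does in the program above), the expected share of wall-clock time spent in each branch can be computed directly. The name time-share is made up for illustration.

```clojure
(defn time-share
  "Expected fraction of wall-clock time spent in each branch, given a
  map of branch -> [probability duration-ms]."
  [branches]
  (let [;; expected milliseconds contributed by each branch per iteration
        weighted (into {} (map (fn [[k [p ms]]] [k (* p ms)])) branches)
        total    (reduce + (vals weighted))]
    (into {} (map (fn [[k w]] [k (double (/ w total))])) weighted)))

(def shares
  (time-share {:work              [1/5 10]
               :backoff-and-retry [4/5 10000]}))
```

With these numbers the backoff accounts for 8000 of every 8002 expected milliseconds, so a thread dump taken at a random moment lands in work only about once in 4000 tries.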

RAJKUMAR 2023-09-01T22:42:58.715209Z

my concern is that even after a couple of hours it is still sleeping

2023-09-01T22:43:14.682339Z

what makes you think it is the same sleep?

RAJKUMAR 2023-09-01T22:44:37.198209Z

this is the retry-strategy

RAJKUMAR 2023-09-01T22:44:40.042859Z

{:initial-retry-count            2
 :initial-delay-ms               10
 :exponential-backoff-multiplier 2.0
 :max-delay-ms                   1000}
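
As a sketch of what that map would mean if it is read as capped exponential backoff (an assumption: the key names come from the map above, backoff-delays is a hypothetical helper and not part of the again library, and :initial-retry-count is ignored because its meaning is unclear), the delay sequence can be generated with plain Clojure:

```clojure
(defn backoff-delays
  "Lazy, infinite sequence of delays in ms: starts at initial-delay-ms,
  multiplies by the multiplier each step, and never exceeds max-delay-ms."
  [{:keys [initial-delay-ms exponential-backoff-multiplier max-delay-ms]}]
  (->> (iterate #(* % exponential-backoff-multiplier) initial-delay-ms)
       (map #(long (min % max-delay-ms)))))

(def strategy
  {:initial-retry-count            2
   :initial-delay-ms               10
   :exponential-backoff-multiplier 2.0
   :max-delay-ms                   1000})

(def first-ten (vec (take 10 (backoff-delays strategy))))
;; first-ten is [10 20 40 80 160 320 640 1000 1000 1000]
```

Under that reading no single sleep lasts longer than one second, so a thread that is seen sleeping for hours would be going through many consecutive sleeps rather than one.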

2023-09-01T22:45:23.463449Z

ok, but is that the only retry? what is the behavior of the code that is driving this code?

RAJKUMAR 2023-09-01T22:45:44.576769Z

the operation is an HSET update in Redis

2023-09-01T22:46:08.031489Z

sure, but this code with the retry around it doesn't exist in isolation right?

2023-09-01T22:46:14.117549Z

something else is calling it

2023-09-01T22:46:28.082099Z

like as a result of web requests?

2023-09-01T22:46:39.482299Z

how often are web requests coming in?

2023-09-01T22:47:03.779849Z

are those being served on a threadpool that is re-using the same thread? is the hset always failing quickly?

RAJKUMAR 2023-09-01T22:47:04.332969Z

requests are very few in the STG environment

RAJKUMAR 2023-09-01T22:47:31.545169Z

like 1k req/hr

2023-09-01T22:47:40.512479Z

that would be enough

RAJKUMAR 2023-09-01T22:47:54.280079Z

then it is idle for the remaining 23 hours of the day

2023-09-01T22:49:39.467459Z

there could be some bug in the again library, where it sleeps too long I guess

2023-09-01T22:50:54.561819Z

but Thread/sleep itself is pretty solid, it is used day in and day out by basically every non-trivial jvm program in existence

2023-09-01T22:52:50.814099Z

"initial-retry-count" as a string doesn't seem to appear in the library, maybe check to make sure your configuration is correct

RAJKUMAR 2023-09-01T22:53:56.819709Z

okay

RAJKUMAR 2023-09-01T22:54:34.185039Z

I think it is the same sleep because some of the messages from the Kinesis stream are not processed

RAJKUMAR 2023-09-01T22:54:53.481579Z

and all the threads show as idle in the output of the top -H -p $pid command

2023-09-01T22:54:57.526599Z

that would also be true if it was looping around a sleep

2023-09-01T22:55:15.026139Z

an infinite try, fail, retry loop

RAJKUMAR 2023-09-01T22:55:20.225569Z

yeah

RAJKUMAR 2023-09-01T22:55:56.536949Z

one more question

RAJKUMAR 2023-09-01T22:56:17.436039Z

what if I used async/timeout instead of sleep

RAJKUMAR 2023-09-01T22:56:28.114689Z

(defn- sleep [delay]
  (clojure.core.async/timeout delay))

2023-09-01T22:56:46.306559Z

timeout itself doesn't do anything

2023-09-01T22:56:55.862999Z

it returns a channel that will be closed after some delay

RAJKUMAR 2023-09-01T22:57:12.911229Z

ohh okay

2023-09-01T22:57:16.633069Z

and you can block on the channel waiting for it to close
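
A minimal sketch of what blocking on the channel looks like, assuming core.async is on the classpath. It keeps the shape of the sleep function proposed above, but takes from the timeout channel with <!!, which blocks the calling thread until the channel closes:

```clojure
(require '[clojure.core.async :as async])

(defn- sleep
  "Blocks the calling thread for roughly delay-ms by waiting for a
  timeout channel to close. Not meant to be called inside a go block."
  [delay-ms]
  (async/<!! (async/timeout delay-ms)))

;; Measure how long one call actually blocks, in milliseconds.
(def elapsed-ms
  (let [start (System/nanoTime)]
    (sleep 50)
    (/ (- (System/nanoTime) start) 1e6)))
```

Used this way it still occupies the thread for the whole delay, much like Thread/sleep, so on its own it would not be expected to change what the thread dumps show.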

2023-09-01T22:59:02.388109Z

https://github.com/clojure/core.async/blob/master/src/main/clojure/clojure/core/async/impl/timers.clj#L43 is the loop that services timeout channels, it is built on top of a DelayQueue

2023-09-01T23:02:04.039539Z

core.async has timeout channels because 1. Thread/sleep blocks a thread, so you shouldn't use it in go blocks, and 2. you can use timeout channels in things like alts, which you cannot do with Thread/sleep
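
Both points can be sketched in a few lines, assuming core.async is on the classpath. The names take-or-timeout, slow, quick-result and patient-result are made up for illustration:

```clojure
(require '[clojure.core.async :as async])

(defn take-or-timeout
  "Waits for a value on ch for at most timeout-ms. Returns the value,
  or :timed-out if the timeout channel closes first."
  [ch timeout-ms]
  (let [t        (async/timeout timeout-ms)
        [v port] (async/alts!! [ch t])]
    (if (= port t) :timed-out v)))

;; Inside a go block, <! on a timeout channel parks the block
;; instead of blocking a thread the way Thread/sleep would.
(def slow (async/go (async/<! (async/timeout 200)) :done))

;; Gives up after 20 ms, well before slow delivers its value.
(def quick-result (take-or-timeout slow 20))

;; Waits up to 2 s, which is long enough for slow to finish.
(def patient-result (take-or-timeout slow 2000))
```

Here quick-result ends up as :timed-out and patient-result as :done, because the first call races the go block against a much shorter timeout.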