#clojure
2024-06-26
Omar07:06:45

Anyone have any experience eliminating downtime restarting a newly deployed uberjar? Typically I have a systemd process that will restart my webapps when I deploy a new backend, and can take a few seconds to restart. My first thought is that this could be done by starting the newly deployed backend with the old one still running, then doing a "handover" before killing off the old process, possibly handing over any transient state that might be useful.

magnars07:06:35

Yes, this is called blue-green deployment. Place a load balancer like haproxy in front, and you can direct traffic to the app server that is ready. The trick with transient state handoffs is to not have transient state.

igrishaev07:06:18

This is also a known practice in Kubernetes. You deploy a new container, and when it's up and the healthcheck endpoint signals it's alright, the system switches traffic from the old container to the new one, and the old one gets shut down.

narendraj907:06:07

For a standalone project that deploys containers, have a look at this project: https://kamal-deploy.org/ It's popular in the Ruby on Rails community but can be used to deploy any Docker-based webapp.

👀 1
vemv08:06:37

If you don't have a container-based setup, something like AWS Elastic Beanstalk can give you blue-green deployments. It kinda gives me good memories 😄

didibus17:06:16

I think for a simple single-node server, you could put a reverse proxy in front, start the newly deployed app on a different port, and then swap the proxy over to the new port. Then shut down the first server. You'd have to write some scripts for it, but I think that can work.
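For reference, a minimal Clojure sketch of the "wait for the new instance, then swap" step described above. It assumes the new app exposes a /health endpoint, and that the actual proxy port swap and old-process shutdown are scripted separately; all names here are illustrative, not part of any existing setup:

(import '(java.net URI)
        '(java.net.http HttpClient HttpRequest HttpResponse$BodyHandlers))

(defn healthy?
  "True when the app listening on `port` answers 200 on /health (assumed endpoint)."
  [port]
  (try
    (= 200 (.statusCode
            (.send (HttpClient/newHttpClient)
                   (.build (HttpRequest/newBuilder
                            (URI. (str "http://localhost:" port "/health"))))
                   (HttpResponse$BodyHandlers/discarding))))
    (catch Exception _ false)))

(defn wait-until-healthy!
  "Polls until the new instance on `port` reports healthy, then returns true;
  throws on timeout. The proxy swap and old-process shutdown would happen
  after this returns."
  [port timeout-ms]
  (let [deadline (+ (System/currentTimeMillis) timeout-ms)]
    (loop []
      (cond
        (healthy? port) true
        (> (System/currentTimeMillis) deadline)
        (throw (ex-info "new instance never became healthy" {:port port}))
        :else (do (Thread/sleep 250) (recur))))))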

Omar15:07:35

Ended up using this without adding anything else and it's pretty effective.

;; Assumes clojure.string :as str, clojure.java.shell :as shell, and a logging
;; namespace aliased as log (e.g. clojure.tools.logging) are required.
(defn rebind-port!
  "Attempts to kill a current port-binding process, then repeatedly executes
  nullary bind-f (which must return logical true on successful binding).
  Returns `bind-f`'s result on success, or throws.

  *nix only. This idea courtesy of Feng Shen, Ref. ."
  [port bind-f & [{:keys [nmax-attempts throttle-ms]
                   :or   {nmax-attempts 50
                          throttle-ms   150}}]]
  (let [binder-pid (str/trim (:out (shell/sh "lsof" "-t" "-sTCP:LISTEN"
                                             (str "-i:" port))))]
    (when-not (str/blank? binder-pid)
      (log/infof "Attempting to kill process %s to free port %s"
                 binder-pid port)
      (let [kill-resp (shell/sh "kill" binder-pid)]
        (when-not (zero? (:exit kill-resp))
          (throw
           (Exception. (format "Failed to kill process %s to free port %s: %s"
                               binder-pid port (:err kill-resp)))))))
    (loop [nattempt 1]
      (when (> nattempt nmax-attempts)
        (throw
         (Exception. (format "Failed to bind to port %s within %s attempt(s)"
                             port nmax-attempts))))

      (if-let [result (try (bind-f) (catch java.net.BindException _))]
        (do (log/debugf "Bound to port %s after %s attempt(s)" port nattempt)
            result)
        (do (Thread/sleep throttle-ms)
            (recur (inc nattempt)))))))
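A hedged usage sketch: bind-f is whatever starts the server and returns it. The example below assumes http-kit (not part of the snippet above) and that the server-start fn lets java.net.BindException escape while the old process still holds the port; all names are illustrative.

(require '[org.httpkit.server :as http])

(defn handler [_req]
  {:status 200 :headers {"Content-Type" "text/plain"} :body "ok"})

;; run-server returns a stop fn (truthy), satisfying rebind-port!'s contract.
(defonce stop-server!
  (rebind-port! 8080
                #(http/run-server handler {:port 8080})
                {:nmax-attempts 50 :throttle-ms 150}))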

Mark Wardle11:07:49

I use Caddy for this when I only have a simple server and want zero downtime. Very easy to use.

andrea.crotti12:06:25

is it possible to detect if something is called inside a with-open macro somehow? It's a macro so probably not dynamically, but maybe I can do static analysis instead

tatut12:06:48

with clj-kondo analysis data?

tatut12:06:17

as it expands to a try/finally, I don’t think there is a way to detect a specific finally handler being present

andrea.crotti12:06:00

yeah, probably clj-kondo or similar could do the trick though

andrea.crotti16:06:47

analysis> (p/parse-string-all
   "(postgres/connection)")
{:children
 ({:tag :list,
   :format-string "(%s)",
   :wrap-length 2,
   :seq-fn
   #function[clj-kondo.impl.rewrite-clj.node.seq/list-node/fn--122690],
   :children
   ({:value postgres/connection,
     :string-value "postgres/connection"})})}
analysis> (p/parse-string-all
   "(with-open [conn (postgres/connection)] (println 1))")
{:children
 ({:tag :list,
   :format-string "(%s)",
   :wrap-length 2,
   :seq-fn
   #function[clj-kondo.impl.rewrite-clj.node.seq/list-node/fn--122690],
   :children
   ({:value with-open, :string-value "with-open"}
    {:tag :vector,
     :format-string "[%s]",
     :wrap-length 2,
     :seq-fn #function[clojure.core/vec],
     :children
     ({:value conn, :string-value "conn"}
      {:tag :list,
       :format-string "(%s)",
       :wrap-length 2,
       :seq-fn
       #function[clj-kondo.impl.rewrite-clj.node.seq/list-node/fn--122690],
       :children
       ({:value postgres/connection,
         :string-value "postgres/connection"})})}
    {:tag :list,
     :format-string "(%s)",
     :wrap-length 2,
     :seq-fn
     #function[clj-kondo.impl.rewrite-clj.node.seq/list-node/fn--122690],
     :children
     ({:value println, :string-value "println"}
      {:value 1, :string-value "1"})})})}

andrea.crotti16:06:33

so that works, I just need to somehow check from that nested data structure that when we have a postgres/connection in one of the values, we also have the with-open just above

andrea.crotti16:06:17

only problem maybe would be if postgres is required under a different alias, and clj-kondo could not possibly know it's the same thing, I suppose

andrea.crotti16:06:28

unless I also analyze the requires maybe
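A rough sketch of that check, sidestepping clj-kondo and just walking plain read forms, so it shares the aliasing caveat above and assumes the source reads cleanly with read-string; postgres/connection is hard-coded here purely for illustration:

(defn unmanaged-connection-calls
  "Returns the (postgres/connection ...) forms in code-str that have no
  enclosing with-open."
  [code-str]
  (letfn [(walk [form inside-with-open?]
            (if (sequential? form)
              (let [inside? (or inside-with-open?
                                (= 'with-open (first form)))
                    nested  (mapcat #(walk % inside?) form)]
                (if (and (= 'postgres/connection (first form))
                         (not inside-with-open?))
                  (cons form nested)
                  nested))
              ()))]
    (mapcat #(walk % false)
            ;; wrap so read-string sees every top-level form
            (read-string (str "[" code-str "]")))))

;; (unmanaged-connection-calls "(postgres/connection)")
;; => ((postgres/connection))
;; (unmanaged-connection-calls "(with-open [conn (postgres/connection)] (println 1))")
;; => ()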

shaunlebron17:06:55

High-performance Clojure question: we have a https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/Executors.html#newFixedThreadPool(int) which we use to execute multiple tasks in response to events as they come in. The speed at which we can respond to an event (using our thread pool) actually slows down if incoming events are less frequent. I’m wondering why this is happening, because the type of thread pool we’re using doesn't kill threads but always keeps them alive for reuse (according to the docs). Do we have to keep threads “hot” somehow in between events? We actually ran a test where a thread always does dummy work, and our response latency improved. What is going on?

shaunlebron17:06:32

(This is preventing us from sharding our work, because sharding decreases the frequency of events on each machine, thereby slowing down our response time as described above.)

phronmophobic17:06:36

You can use the threadFactory arity to double check that threads aren't being created/destroyed for some reason, https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/Executors.html#newFixedThreadPool(int,java.util.concurrent.ThreadFactory). Are you testing response time by sending an http request? Can you test response time by just submitting tasks in process? My hunch is that there's some warmup associated with the server receiving requests rather than processing them.

1
phronmophobic17:06:47

The other thing I would try is to use https://github.com/clojure-goes-fast/clj-async-profiler on hot/cold events and compare their flamegraphs.

phronmophobic17:06:58

here's the boilerplate for thread factories:

(reify
  java.util.concurrent.ThreadFactory
  (newThread [this r]
    (let [thread (.newThread (Executors/defaultThreadFactory)
                             r)]
      ;; log something?
      thread)))

gratitude-thank-you 1
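And a hedged sketch of wiring that factory into the fixed pool so thread creation gets logged; the pool size and names are assumptions:

(import '(java.util.concurrent Executors ThreadFactory))

(def pool
  (Executors/newFixedThreadPool
   6                                      ;; assumed pool size
   (reify ThreadFactory
     (newThread [_ r]
       (let [t (.newThread (Executors/defaultThreadFactory) r)]
         ;; with a fixed pool this should print at most 6 times;
         ;; more prints would mean threads are dying and being replaced
         (println "created thread:" (.getName t))
         t)))))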
shaunlebron17:06:36

we measure the response time after the event is received and submitted for processing, and yes, our tests submit tasks in process

👍 1
phronmophobic17:06:04

Have you measured the latency between submitting the work and starting the work or are you just measuring the response time from initial submission to completion?

👀 2
shaunlebron17:06:17

generally we didn’t find flamegraphs helpful for debugging thread performance

shaunlebron17:06:33

yeah, I suppose it would also be good to know whether the total time spent inside the threads is also increasing, to see if it's just hot-path related and not thread overhead

phronmophobic17:06:46

yea, there may be some warmup associated with the particular tasks being enqueued

shaunlebron17:06:51

thanks I will try these

valerauko18:06:08

flamegraphs would be so much more useful for async/multithreaded code if the CPU executing the code were obvious on the graph

phronmophobic18:06:52

The profiler does let you split up stacktraces by thread. It's not too hard to create a flamegraph from the raw output. It's probably also possible to use transforms to get the built-in flamegraph generator to split by thread.

☝️ 1
valerauko18:06:24

there's your Library Idea of the Day. you get a github star from me if you do it 💪

vemv18:06:47

YourKit has a nice Threads view: it shows you a timeline for each thread, and its state and stacktrace at a given moment in time

oyakushev21:06:18

Plain old VisualVM also has a thread view, you can try that

oyakushev21:06:47

What exactly do you mean by "slowing down"? Is it 100us extra per request? 500us? Milliseconds?

shaunlebron21:06:00

It slowed down from 6ms to 11ms

oyakushev21:06:32

> The profiler does let you split up stacktraces by thread.
Yes, by passing :threads true. But I agree, flamegraphs are not the most convenient way to investigate these kinds of problems
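For reference, a minimal sketch of that, assuming clj-async-profiler is on the classpath and with run-workload standing in for the real event-handling code:

(require '[clj-async-profiler.core :as prof])

(defn run-workload []               ;; placeholder for the real work
  (dotimes [i 10000] (reduce + (range i))))

;; Collect per-thread stacks so the flamegraph can be split by thread:
(prof/profile {:threads true}
  (run-workload))

;; Results land in /tmp/clj-async-profiler/results/ by default and can be
;; browsed with (prof/serve-ui 8080).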

oyakushev21:06:41

Like others have said, I'd start by checking whether the threads are actually reused

oyakushev21:06:46

I've been using custom threadpools most of the time, so I'm not 100% sure about the behavior of the default JDK ones.

oyakushev21:06:35

> If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.

oyakushev21:06:52

Maybe something kills your pooled threads so they are recreated

oyakushev21:06:39

Just doing jstack <PID> a few times and checking thread names should confirm/reject the hypothesis. If the numbers in the thread names grow, then this is it

👀 1
shaunlebron22:06:56

Confirming that our thread pool wasn’t creating multiple threads, using the stubbed thread factory

oyakushev22:06:49

Which N are we talking about, just to be clear?

oyakushev22:06:48

> It slowed down from 6ms to 11ms
Is this a max or median or mean?

oyakushev22:06:39

I'd say that 5 ms here or there is pretty much noise if it's one-off spikes, not an average rise over a large sample.

oyakushev22:06:12

There are so many places where this 5ms could be squandered

oyakushev22:06:55

Another suggestion is employing https://github.com/clj-commons/dirigiste for thread pools because of its extensive observability. But you'd have to manually extract that info and forward it to your monitoring solution.

shaunlebron22:06:08

Processing 208 events on 6 threads:

1. With no pause between events:
PROCESSING TIMES (ms):
min: 0.87 | median: 4.39, mean: 4.51, sd: 0.94 | max: 10.10

2. With 25ms pause between events:
PROCESSING TIMES (ms):
min: 4.00 | median: 12.32, mean: 12.43, sd: 2.95 | max: 34.72

shaunlebron22:06:33

the test above shows that pausing 25ms between events is somehow slowing the threadpool work by ~8ms

1
oyakushev22:06:32

Covering the thread pool with metrics would be my next step to get to the bottom of this. Cumbersome, but it can shed some more light on the issue.

1
shaunlebron20:06:23

I was able to reduce the slowing-down effect by just running a dummy function that increments an atom on the main thread. Did we discover a low-level effect due to CPU context-switching or energy-saving?

phronmophobic20:06:12

If the slowdown effect is 4x, it seems like you should be able to measure if the extra time is spent:
• enqueuing the data
• waiting to start the task
• running the task
• waiting to return the task result
• a general slowdown across all steps
• something else
Each of these might have different causes and remedies.

phronmophobic20:06:16

It's also not clear what kind of environment you're testing. Does the slowdown manifest in dev, staging, CI, production? Is it in a docker container, bare metal, cloud VM?

shaunlebron21:06:49

mac m2 in production, but reproduced locally on a mac m1

phronmophobic21:06:53

oh interesting.

shaunlebron21:06:18

I’m hoping that the specifics of where the slowdown is occurring would be irrelevant if the fix is to prevent the CPU from marking the JVM process as idle (assuming some context-switching or energy-saving mode is the root of the problem)

phronmophobic21:06:45

maybe, although it could be a good indicator if that is actually the problem. seems not so hard to measure the slowdown (sketched below):
• submit a job and pass (System/nanoTime)
• the job returns [start-time (System/nanoTime)]
• the consumer compares start and end times with (System/nanoTime)

👍 2
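Something like this, a sketch under the assumption that the real work is a Clojure fn; all names here are made up:

(import '(java.util.concurrent Callable Executors ExecutorService))

(defn measure-one-event
  "Submits a single task and reports queue wait, run time, and result-handoff
  time in microseconds."
  [^ExecutorService pool work-fn]
  (let [submitted (System/nanoTime)
        fut       (.submit pool ^Callable
                           (fn []
                             (let [started (System/nanoTime)]
                               (work-fn)                  ;; the real work
                               [started (System/nanoTime)])))
        [started finished] @fut
        read-at   (System/nanoTime)]
    {:queue-wait-us (/ (- started submitted) 1e3)
     :run-us        (/ (- finished started) 1e3)
     :handoff-us    (/ (- read-at finished) 1e3)}))

;; e.g. (measure-one-event (Executors/newFixedThreadPool 6) #(Thread/sleep 5))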
shaunlebron19:06:00

For the record, this is working really well to fix our issue:

(import '(java.util.concurrent Executors ScheduledExecutorService TimeUnit))

(defn run-test-report-with-cpu-ping! []
  (let [pool ^ScheduledExecutorService (Executors/newScheduledThreadPool 1)
        ping-fn (fn [])] ;; no-op: ping does nothing, it just keeps the process busy
    ;; schedule the no-op every 10 microseconds for the duration of the test
    (.scheduleAtFixedRate pool ping-fn 0 10 TimeUnit/MICROSECONDS)
    (run-test-report!)
    (.shutdown pool)))

shaunlebron21:06:30

so, scheduling a no-op function to run every 10 microseconds fixes our issue

jumar05:06:49

I think it would be really beneficial if you could just perform some basic measurements as @U7RJTCH6J suggested. Otherwise you're just running in circles and we aren't gaining any real understanding of the problem

shaunlebron21:07:12

Across all the recorded slowdowns, I found that the time spent waiting for threads to execute remains a constant fraction of the total time in each case. The only other time is spent in our complex event handler, which in total experiences a slowdown in proportion to that of the thread scheduler. This seems to imply that the whole process is slowing down. I've spread metrics across parts of the system before and came up empty.

We also ran the same tests on Linux on Intel, and 100ms pauses between events only created a 1ms slowdown, instead of the 10ms+ slowdowns we were seeing on Apple silicon. We also tried increasing the latency and throughput tiers of our JVM process with taskpolicy, which had no effect. And we tried shutting down the efficiency cores to force macOS to use performance cores only (in case our process was being relegated during the pauses somehow), but the command for doing this, cpuctl offline, seems to only work on Intel chips, not Apple silicon.

In conclusion, something mysterious is happening on Apple silicon when our process idles for longer than 150 microseconds (probably related to the energy-saving policy of macOS on Apple silicon), so scheduling a no-op function to run that frequently seems to prevent it.

shaunlebron16:07:10

Also, this slowdown was not related to threading, since we disabled it and saw the same effects.