aleph

2025-02-13T16:03:02.551839Z

Is this the right place to ask questions about Dirigiste as well? 🙂 I was surprised to find out that Dirigiste seems to silently drop pool acquisition requests when the controller’s maxTotalObjects is reached.

dergutemoritz 2025-02-14T09:32:47.743559Z

Oof nice find.

2025-02-14T09:33:26.888129Z

Unfortunately, I found it the hard way, chasing Heisenbugs where our app would sometimes hang 😄

2025-02-14T09:35:52.424169Z

We use Dirigiste for a pool of git cat-file --batch-command processes to read the Git database of several repositories, keyed by the directory. When we reached max-total, any read requests to “new” Git repos never resolved. As a workaround, we have now increased max-total to a much larger number.

dergutemoritz 2025-02-14T09:35:53.470359Z

I can imagine that - had similarly painful encounters with core.async in the past 😅

2025-02-14T09:36:05.604119Z

Yeah, those are fun, too 🙂

2025-02-14T09:36:15.524989Z

Hours of debugging, then a one-line “fix” 🙂

dergutemoritz 2025-02-14T09:36:17.996259Z

concurrency is where the bugs hide

🐞 1
dergutemoritz 2025-02-14T09:38:03.016099Z

FWIW there's also #manifold where this particular bug is probably more apropos - maybe you want to crosspost it there

2025-02-14T09:38:47.201319Z

Thanks, will do! I guessed #aleph because Dirigiste uses io.aleph.* Java packages

dergutemoritz 2025-02-14T09:47:06.039229Z

Yeah this seems to be a bit of a historic relic of how the codebases evolved 😅

👍 1
2025-02-14T09:55:30.622349Z

For reference: #manifold https://clojurians.slack.com/archives/C02H9GF74CF/p1739526647128589

👍 1
oyakushev 2025-02-13T16:25:18.530879Z

Silently though?

oyakushev 2025-02-13T16:28:26.484179Z

Wait, I'm confused. The controller has to do with how many created objects a pool can retain, it shouldn't have to do with acquisition queue limits.

2025-02-13T16:29:59.437759Z

Well, the controllers have maxObjectsPerKey and maxTotalObjects.

2025-02-13T16:33:33.556189Z

I’m using a small wrapper around dirigiste’s Java API to return Promesa promises (= CompletableFuture). Here is an example:

(let [pool (pool/pool (pool/generator (fn [k]
                                        (let [id (random-uuid)]
                                          (tap> [:created k id])
                                          id))
                                      (fn [k id]
                                        (tap> [:destroyed k id])))
                      (pool/utilization-controller 0.9 1 2)
                      :control-period-ms 100)
      x (pool/acquire pool :x)
      y (pool/acquire pool :y)
      z (pool/acquire pool :z)]
  (pool/release pool :x (deref x 100 nil))
  (pool/release pool :y (deref y 100 nil))
  (deref z 2000 nil)) ;; => nil

The pool’s controller is configured with a maximum of 1 object per key and 2 objects in total. The sample period is left at its default of 25 ms; I lowered the control period to 100 ms. I can acquire x and y, while z remains unfulfilled, waiting for the IPool$AcquireCallback to be invoked. Then I release x and y, expecting that since the pool can now shrink to 0, my acquisition of z will eventually succeed. But it doesn’t: z remains unfulfilled forever.

2025-02-13T16:35:37.748349Z

Silently: I expected at least a RejectedExecutionException if the acquire fails. Yet it just succeeds, without ever invoking the callback.

oyakushev 2025-02-13T16:38:05.944809Z

Remind me, what's pool/ ?

oyakushev 2025-02-13T16:38:36.742079Z

Is it an Aleph or Manifold namespace?

2025-02-13T16:53:57.917189Z

No, it is the small wrapper library I wrote, similar to aleph.flow I guess.

2025-02-13T16:54:41.883309Z

pool/acquire is just this:

(defn acquire
  "Acquires an object from the pool for key `k`, returning a promise containing the object.  The
   promise is rejected with a `java.util.concurrent.RejectedExecutionException` if there are too
   many pending acquires."
  [^IPool pool k]
  (let [p (p/deferred)]
    (try
      (.acquire pool k
                (reify IPool$AcquireCallback
                  (handleObject [_ obj]
                    (when-not (p/resolve! p obj)
                      ;; Cancelled or already completed.
                      (.release pool k obj)))))
      (catch Throwable e
        (p/reject! p e)))
    p))

2025-02-13T16:55:32.857159Z

The example above could be written with aleph.flow or using the Java API directly, but this is the code I had at hand.

oyakushev 2025-02-13T17:25:20.925099Z

So, it is not quietly discarded. The acquisition request for :z is sitting in the queue after the :x and :y objects are destroyed.

oyakushev 2025-02-13T17:26:04.964879Z

The problem seems to be that the controller miscalculates the utilization of the :z key's queue, so it never decides to grow the pool to fulfill that request.

oyakushev 2025-02-13T17:31:46.673179Z

This seems to work:

(let [pool @(def -pool
              (io.aleph.dirigiste.Pools/utilizationPool
               (reify io.aleph.dirigiste.IPool$Generator
                 (generate [_ k]
                   (let [id (rand-int 10000000)]
                     (println :created k id)
                     id))
                 (destroy [_ k id]
                   (println :destroyed k id)))
               ;; target utilization, max objects per key, max total objects
               0.9 1 2))
      x (acquire pool :x)
      y (acquire pool :y)
      z (acquire pool :z)]
  (println "released :x" (.release pool :x (deref x 100 nil)))
  (println "released :y" (.release pool :y (deref y 100 nil)))
  ;; Give the controller time to destroy the released :x and :y objects.
  (Thread/sleep 5000)
  (let [z2 (acquire pool :z) ;; To poke the pool
        og-z-val (deref z 2000 nil)]
    (println "The original :z take" og-z-val)
    (.release pool :z og-z-val)
    (println "Got another from :z" @z2)))

oyakushev 2025-02-13T17:32:19.554249Z

So, looks like you can't really rely on the controller to backfill your pending takes. This is probably a bug

oyakushev 2025-02-13T17:40:16.867649Z

Speaking from experience, Aleph's fully asynchronous model never worked well for me. Eventually, I gave up on the pool being asynchronous. I only acquire objects (connections) synchronously – if there is an object in the pool, I take it; if there is space in the pool, I create a new object for myself; otherwise, throw. I also don't try to frugally limit the connection pool sizes, too many connections was never an issue for me (~16k connections per target host is fine).
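
For illustration, the synchronous strategy described here (take an idle object, else create one if there is room, else throw) could be sketched roughly as follows. This is not Dirigiste or Aleph code: make-pool, acquire! and release! are hypothetical names, the pool is not keyed, and it uses only clojure.core (swap-vals! needs Clojure 1.9 or later).

```clojure
;; Minimal sketch of a synchronous, bounded pool. All names are hypothetical.

(defn make-pool
  "Returns a pool that creates objects with the zero-argument fn `create`
   and never holds more than `max-size` objects in total."
  [create max-size]
  {:create   create
   :max-size max-size
   ;; :idle  = objects waiting to be reused, :total = objects created so far
   :state    (atom {:idle [] :total 0})})

(defn acquire!
  "Returns an idle object if there is one, creates a new one if there is
   space, and throws otherwise. Never waits."
  [{:keys [create max-size state]}]
  (let [[old _] (swap-vals! state
                            (fn [{:keys [idle total] :as s}]
                              (cond
                                (seq idle)         (update s :idle pop)
                                (< total max-size) (update s :total inc)
                                :else              s)))]
    ;; `old` is the state the successful swap was based on, so the same
    ;; conditions tell us which branch was taken.
    (cond
      (seq (:idle old))
      (peek (:idle old))

      (< (:total old) max-size)
      (try
        (create)
        (catch Throwable t
          ;; Give the reserved slot back if creation fails.
          (swap! state update :total dec)
          (throw t)))

      :else
      (throw (ex-info "Pool exhausted" {:max-size max-size})))))

(defn release!
  "Puts `obj` back into the pool so a later acquire! can reuse it."
  [{:keys [state]} obj]
  (swap! state update :idle conj obj)
  nil)

(comment
  (def pool (make-pool #(java.util.UUID/randomUUID) 2))
  (def a (acquire! pool))
  (def b (acquire! pool))
  (acquire! pool)      ; throws ExceptionInfo "Pool exhausted"
  (release! pool a)
  (acquire! pool))     ; returns the object that was just released
```

The point of the design is that an exhausted pool surfaces immediately as an exception at the call site instead of as a pending callback.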

2025-02-13T18:18:21.578199Z

Your example works, but it is of course not what I need. You deliberately wait after releasing until the controller has had a chance to make space. You can't do that in practice. Our problem was that we had hard-to-debug deadlocks because we expected acquires to either succeed eventually or fail perceivably.
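
One way to make a stuck acquire "fail perceivably" is to put a deadline on the returned promise. The sketch below assumes the promise is a plain java.util.concurrent.CompletableFuture (as stated for the wrapper earlier in the thread) and a JDK of version 9 or later for orTimeout. with-timeout and timed-out? are hypothetical helper names. This does not fix the controller behaviour; it only turns an endless wait into an error.

```clojure
(import '(java.util.concurrent CompletableFuture ExecutionException
                               TimeUnit TimeoutException))

(defn with-timeout
  "Completes `fut` exceptionally with a TimeoutException if it is still
   pending after `timeout-ms` milliseconds. Returns the same future."
  ^CompletableFuture [^CompletableFuture fut timeout-ms]
  (.orTimeout fut (long timeout-ms) TimeUnit/MILLISECONDS))

(defn timed-out?
  "Blocks on `fut` and returns true if it failed with a TimeoutException,
   false if it completed normally."
  [^CompletableFuture fut]
  (try
    (deref fut)
    false
    (catch ExecutionException e
      (instance? TimeoutException (.getCause e)))))

(comment
  ;; A future that nobody ever completes stands in for the pending :z acquire.
  (timed-out? (with-timeout (CompletableFuture.) 100)))
```

With the acquire wrapper shown earlier, an object delivered after the deadline would presumably be handed straight back to the pool, since resolving an already completed promise fails and the callback then calls .release.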

oyakushev 2025-02-13T18:22:58.400409Z

I agree this is at the very least confusing; PR is welcome.

👍 1
2025-02-14T07:52:25.443929Z

I can’t promise that I can provide that PR, will need to dig deeper into how the pool is implemented 🙂