Is this the right place to ask questions about Dirigiste as well? 🙂
I was surprised to find out that Dirigiste seems to silently drop pool acquisition requests when the controller’s maxTotalObjects is reached.
Oof nice find.
Unfortunately I found it the hard way, chasing Heisenbugs where our app would sometimes hang 😄
We use Dirigiste for a pool of git cat-file --batch-command processes to read the Git database of several repositories, keyed by the directory. When we reached max-total, any read requests to “new” Git repos never resolved. As a workaround, we have now increased max-total to a much larger number.
I can imagine that - had similarly painful encounters with core.async in the past 😅
Yeah, those are fun, too 🙂
Hours of debugging, then a one-line “fix” 🙂
concurrency is where the bugs hide
FWIW there's also #manifold where this particular bug is probably more apropos - maybe you want to crosspost it there
Thanks, will do! I guessed #aleph because Dirigiste uses io.aleph.* Java packages
Yeah this seems to be a bit of a historic relic of how the codebases evolved 😅
For reference: #manifold https://clojurians.slack.com/archives/C02H9GF74CF/p1739526647128589
Silently though?
Wait, I'm confused. The controller has to do with how many created objects a pool can retain, it shouldn't have to do with acquisition queue limits.
Well, the controllers have maxObjectsPerKey and maxTotalObjects
I’m using a small wrapper around dirigiste’s Java API to return Promesa promises (= CompletableFuture). Here is an example:
(let [pool (pool/pool (pool/generator (fn [k]
                                        (let [id (random-uuid)]
                                          (tap> [:created k id])
                                          id))
                                      (fn [k id]
                                        (tap> [:destroyed k id])))
                      (pool/utilization-controller 0.9 1 2)
                      :control-period-ms 100)
      x (pool/acquire pool :x)
      y (pool/acquire pool :y)
      z (pool/acquire pool :z)]
  (pool/release pool :x (deref x 100 nil))
  (pool/release pool :y (deref y 100 nil))
  (deref z 2000 nil)) ;; => nil
The pool’s controller is configured with 1 max per-key and 2 max total. The default sample rate is 25ms, I lowered the control period to 100ms.
I can acquire x and y. z remains unfulfilled, waiting for the IPool$AcquireCallback to be invoked.
Then I release x and y and expected that since the pool can now shrink to 0, my acquisition of z will eventually succeed.
But it doesn’t. z will remain unfulfilled forever.
Silently: I expected at least a RejectedExecutionException if the acquire fails. Yet the acquire call just returns normally, without ever invoking the callback.
Remind me, what's pool/ ?
Is it an Aleph or Manifold namespace?
No, it is the small wrapper library I wrote, similar to aleph.flow I guess.
pool/acquire is just this:
(defn acquire
  "Acquires an object from the pool for key `k`, returning a promise containing the object. May
  throw a `java.util.concurrent.RejectedExecutionException` if there are too many pending acquires."
  [^IPool pool k]
  (let [p (p/deferred)]
    (try
      (.acquire pool k
                (reify IPool$AcquireCallback
                  (handleObject [_ obj]
                    (when-not (p/resolve! p obj)
                      ;; Cancelled or already completed.
                      (.release pool k obj)))))
      (catch Throwable e
        (p/reject! p e)))
    p))
The example above could be written with aleph.flow or using the Java API directly, but this is the code I had at hand.
So, it is not quietly discarded. The acquisition request for :z is sitting in the queue after the :x and :y objects are destroyed.
The problem seems to be that the controller miscalculates the utilization of the :z key/queue, so it never decides to grow the pool to fulfill that request.
This seems to work:
(let [pool @(def -pool (io.aleph.dirigiste.Pools/utilizationPool
                        (reify io.aleph.dirigiste.IPool$Generator
                          (generate [_ k]
                            (let [id (rand-int 10000000)]
                              (println :created k id)
                              id))
                          (destroy [_ k id]
                            (println :destroyed k id)))
                        0.9 1 2))
      x (acquire pool :x)
      y (acquire pool :y)
      z (acquire pool :z)]
  (println "released :x" (.release pool :x (deref x 100 nil)))
  (println "released :y" (.release pool :y (deref y 100 nil)))
  (Thread/sleep 5000)
  (let [z2 (acquire pool :z) ;; To poke the pool
        og-z-val (deref z 2000 nil)]
    (println "The original :z take" og-z-val)
    (.release pool :z og-z-val)
    (println "Got another from :z" @z2)))
So, looks like you can't really rely on the controller to backfill your pending takes. This is probably a bug
Speaking from experience, Aleph's fully asynchronous model never worked well for me. Eventually, I gave up on the pool being asynchronous. I only acquire objects (connections) synchronously – if there is an object in the pool, I take it; if there is space in the pool, I create a new object for myself; otherwise, throw. I also don't try to frugally limit the connection pool sizes, too many connections was never an issue for me (~16k connections per target host is fine).
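For illustration, the synchronous strategy described above (take an idle object, else create one if there is room, else throw) might look roughly like the sketch below. This is not Dirigiste or Aleph code: `make-pool`, `acquire!` and `release!` are hypothetical names, the pool is unkeyed for brevity, and the state lives in a plain atom.

```clojure
(defn make-pool
  "Hypothetical synchronous pool. `create` is a zero-arg fn that builds a new
  object; `max-size` caps idle plus checked-out objects."
  [create max-size]
  {:create   create
   :max-size max-size
   :state    (atom {:idle [] :in-use 0})})

(defn acquire!
  "Returns an idle object if there is one, creates a new one if there is
  room, and otherwise throws immediately instead of queueing."
  [{:keys [create max-size state]}]
  (let [[old _] (swap-vals! state
                            (fn [{:keys [idle in-use] :as s}]
                              (cond
                                (seq idle)          (-> s (update :idle pop) (update :in-use inc))
                                (< in-use max-size) (update s :in-use inc)
                                :else               s)))
        {:keys [idle in-use]} old]
    ;; Decide from the state we swapped away from which branch was taken.
    (cond
      (seq idle)          (peek idle)
      (< in-use max-size) (try
                            (create) ; side effect kept outside swap-vals!
                            (catch Throwable t
                              ;; Give the reserved slot back if creation fails.
                              (swap! state update :in-use dec)
                              (throw t)))
      :else               (throw (ex-info "Pool exhausted" {:max-size max-size})))))

(defn release!
  "Returns `obj` to the idle list."
  [{:keys [state]} obj]
  (swap! state #(-> % (update :idle conj obj) (update :in-use dec)))
  nil)
```

With `(make-pool create 2)`, a third `acquire!` without a prior `release!` throws right away, so the caller sees the exhaustion instead of waiting on a callback that may never fire.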
Your example works, but is of course not what I need. You deliberately wait after releasing until the controller has had a chance to make space. You can't do that in practice. Our problem was that we had hard-to-debug deadlocks because we expected acquires to eventually succeed, or fail perceivably.
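One way to at least make a stuck acquire fail perceivably is to put a deadline on the returned promise. The sketch below assumes, as stated earlier in the thread, that the promise is a plain `CompletableFuture`, and it needs Java 9+ for `orTimeout`. `fail-after` is a hypothetical helper, and a future that nobody completes stands in for the pending acquire.

```clojure
(import '(java.util.concurrent CompletableFuture ExecutionException
                               TimeUnit TimeoutException))

(defn fail-after
  "Completes `fut` exceptionally with a TimeoutException unless it has
  completed by itself within `ms` milliseconds. Returns `fut`."
  [^CompletableFuture fut ms]
  (.orTimeout fut (long ms) TimeUnit/MILLISECONDS))

;; Stand-in for an acquire whose callback is never invoked.
(def stuck (fail-after (CompletableFuture.) 200))

;; Blocking on the future now surfaces the timeout instead of hanging.
(def outcome
  (try
    (.get ^CompletableFuture stuck)
    (catch ExecutionException e
      (class (.getCause e)))))
```

This only bounds the wait on the caller's side; it does not remove the request from the pool's queue. With the `acquire` wrapper shown earlier, a callback that fires after the timeout would find the promise already completed and release the object back to the pool.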
I agree this is at the very least confusing; PR is welcome.
I can’t promise that I can provide that PR, will need to dig deeper into how the pool is implemented 🙂