datalevin

Hukka 2026-03-19T12:01:35.976489Z

I'm doing something that makes Datalevin spawn threads without bound until it runs out of memory. No idea what yet: this happens in the Jetty-based backend after it has been up a while, and nothing I do in normal use makes the thread count go up. Just a heads-up, in case there are hints on what to look for.

"clojure-agent-send-off-pool-8787" #8827 [8827] prio=5 os_prio=0 cpu=0.30ms elapsed=80073.79s tid=0x00007f0b42f7a4a0 nid=8827 waiting on condition  [0x00007f0852afe000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@21.0.10/Native Method)
        - parking to wait for  <0x000000008695a9f8> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(java.base@21.0.10/Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@21.0.10/Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@21.0.10/Unknown Source)
        at java.util.concurrent.CountDownLatch.await(java.base@21.0.10/Unknown Source)
        at clojure.core$promise$reify__8621.deref(core.clj:7257)
        at clojure.core$deref.invokeStatic(core.clj:2337)
        at clojure.core$deref.invoke(core.clj:2323)
        at datalevin.async$handle_result.invokeStatic(async.clj:41)
        at datalevin.async$handle_result.invoke(async.clj:39)
        at datalevin.async.AsyncExecutor$fn__9887.invoke(async.clj:131)
        at clojure.core$binding_conveyor_fn$fn__5842.invoke(core.clj:2047)
        at clojure.lang.AFn.call(AFn.java:18)
        at java.util.concurrent.FutureTask.run(java.base@21.0.10/Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@21.0.10/Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@21.0.10/Unknown Source)
        at java.lang.Thread.runWith(java.base@21.0.10/Unknown Source)
        at java.lang.Thread.run(java.base@21.0.10/Unknown Source)

2026-03-19T12:06:00.929989Z

is the code public?

Hukka 2026-03-19T12:11:26.792439Z

I'm afraid not. I suppose the first step would be to update Datalevin, as async.clj has changed a lot since 0.9.22.

Hukka 2026-03-19T12:13:38.839109Z

It's odd that the thread dump shows that the threads have been spawning about every 4–6 seconds, but I don't see it at all in somewhat freshly started systems

Huahai 2026-03-19T14:52:52.267679Z

It's an old version of Datalevin, where every async request created a future that parked on a promise. If you are creating requests too fast, it can OOM. That has been changed in 0.10.x.
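
To illustrate that failure mode, here is a minimal Java sketch. It is not Datalevin's code. Clojure's `future` runs on the agent send-off pool, which is an unbounded cached thread pool, so each task that parks on an undelivered promise holds its own thread. The class and method names are hypothetical, and a `CountDownLatch` stands in for the promise.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ParkedFutures {
    // Submits `requests` tasks that each block on a latch (standing in for an
    // undelivered promise) and returns the pool size seen while all are parked.
    static int threadsHeldBy(int requests) {
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        CountDownLatch neverDelivered = new CountDownLatch(1);
        CountDownLatch started = new CountDownLatch(requests);
        try {
            for (int i = 0; i < requests; i++) {
                pool.submit(() -> {
                    started.countDown();
                    try {
                        neverDelivered.await(); // parks, like the deref in the dump
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await(10, TimeUnit.SECONDS); // wait until every task is running
            return pool.getPoolSize();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            neverDelivered.countDown(); // release the parked threads
            pool.shutdown();
        }
    }
}
```

With a cached pool, every request that cannot find an idle worker gets a new thread, so the thread count tracks the number of parked requests rather than any fixed limit.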

Huahai 2026-03-19T15:15:56.710709Z

Even with the current version, there's still a chance of OOM if you enqueue requests too fast without draining the results. To avoid this kind of problem, I can add a semaphore, so that requests block when there are too many. However, that is just a different failure mode on the user's side. The root cause needs to be fixed, which is that requests are coming in too fast without the results being drained. You need to deref the futures.
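
A rough sketch of what such a cap could look like, in plain Java rather than Datalevin's implementation (all names are hypothetical). A semaphore bounds the number of in-flight requests, and a permit is released only when a request completes, so a producer that outruns the worker blocks instead of piling up work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class BoundedSubmitter {
    private final Semaphore permits;
    private final ExecutorService executor;

    BoundedSubmitter(int maxInFlight, ExecutorService executor) {
        this.permits = new Semaphore(maxInFlight);
        this.executor = executor;
    }

    // Blocks the caller while maxInFlight requests are already pending.
    <T> CompletableFuture<T> submit(Supplier<T> request) {
        permits.acquireUninterruptibly();
        try {
            return CompletableFuture.supplyAsync(request, executor)
                    // Runs on success and on failure, so a permit is never lost.
                    .whenComplete((result, error) -> permits.release());
        } catch (RuntimeException e) {
            permits.release(); // the executor rejected the task
            throw e;
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

The caller still has to read the returned futures (for example with `join()`). The cap only turns unbounded growth into blocking; it does not remove the need to drain results.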

Huahai 2026-03-19T15:16:58.125099Z

If the requests are coming from the outside world, you will need ways to mitigate DoS attacks. Otherwise you will have failures one way or another.

Hukka 2026-03-19T18:27:42.526019Z

The threads are accumulating over a really long time, and the timing seems to match the health check that runs every 5 seconds. But the health check is TCP only, and doesn't even make a proper HTTP request. None of the endpoints leak threads when I test them, and I don't see how the results could go unread. I'll see if this keeps happening with the latest version, and then add logging everywhere I touch Datalevin so I can see which part is leaking.
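
One low-effort way to narrow this down, sketched in plain Java with hypothetical names: count the live threads whose name starts with the prefix seen in the dump (`clojure-agent-send-off-pool-`) before and after each Datalevin call, and log the difference. `Thread.getAllStackTraces()` is a standard JDK call; the helper around it is only an illustration.

```java
final class ThreadCounter {
    // Counts live threads whose name starts with the given prefix,
    // e.g. "clojure-agent-send-off-pool-".
    static long countByPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }
}
```

Logging this count periodically alongside request logs should show whether the growth lines up with the 5-second health check or with some other caller.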

Huahai 2026-03-19T19:25:52.840429Z

The master branch now has a semaphore cap on the async backlog.