datalevin

spnw 2025-04-28T15:03:23.173439Z

Possible noob question: I'm building a simple CRUD web app with Datalevin and http-kit. I'm the only user, running one query or transaction at a time, but I invariably get the MDB_READERS_FULL error after enough use. I am using the same DB connection for everything. I don't have a simple test case, but in the event I'm not just being silly, I will investigate further.

flakstad 2025-04-29T08:38:19.472149Z

I have experienced this exact problem, using http-kit, and was about to create a similar thread. Curious why the entity object is not GC’ed. Please share if you learn more @spnw, I will investigate further myself in the next few days. Is a solution to not use entity?

spnw 2025-04-29T16:46:15.625109Z

Actually I was wrong in thinking entity was the problem, because using transact! causes the same result.

spnw 2025-04-29T16:48:32.745209Z

I'd like to investigate this further, but as a stopgap I recommend using a different server (e.g. Jetty).

Huahai 2025-04-29T16:55:41.451459Z

probably due to http-kit's websocket support, it's keeping the ring-handler from being GC'ed?

spnw 2025-04-29T17:00:58.522619Z

Excuse the ignorant question, but are we always reliant on the GC to avoid running out of readers? E.g. there's no way to explicitly free them?

Huahai 2025-04-29T17:08:06.415799Z

We could add a function to do that, if that's something people want.

spnw 2025-04-29T17:14:17.109039Z

Perhaps it's only necessary in pathological cases, I don't know. I'm always a bit leery of relying on GC for resource management though.

Huahai 2025-04-29T17:14:42.518389Z

Readers are kept open so they can be renewed and reused, for performance. So the system does not close readers, except that dead readers are cleaned up every 5 minutes (configurable), when the DB file is force-synced. However, if there are still references to them, these readers are not dead.

spnw 2025-04-29T17:18:46.108429Z

I see. I suppose that doesn't change the situation (the readers don't need to be freed, they just need to not be tied up unnecessarily)

Huahai 2025-04-29T17:25:17.523789Z

Can you try master branch to see if the transact! problem is going away?

spnw 2025-04-29T18:10:28.176599Z

Unfortunately still happens on master. Same test as before, with this handler:

(fn [_]
  (d/transact! conn [{:db/id 1 :my/attr 42}])
  {:status 200
   :body "test\n"})

Huahai 2025-04-29T19:46:35.070709Z

It turns out that http-kit by default just starts a new (virtual?) thread for each request, instead of using a thread pool. The fix is to set a :worker-pool when you start the server and supply a thread pool of your own. Anything will work, e.g. (Executors/newCachedThreadPool). Even though it's an unbounded pool, the caching will keep it to a small number of threads if your tasks finish quickly enough.

Huahai 2025-04-29T19:47:27.836959Z

So if you want to keep using http-kit, setting :worker-pool is the way to go.

Huahai 2025-04-29T19:49:26.211339Z

Notice that (Executors/newVirtualThreadPerTaskExecutor) will not work, for that's the default http-kit behavior
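
A rough sketch of why the choice of executor matters, using only the JDK (no Datalevin or http-kit involved). It assumes, as this thread suggests, that a reader stays tied to the thread that used it. The helper names (distinct-threads, run-on-new-thread, compare-dispatch) are made up for illustration, and plain platform threads stand in for virtual threads so it also runs on older JDKs. It counts how many distinct threads end up handling n sequential tasks:

```clojure
(import '(java.util.concurrent ConcurrentHashMap Executors ExecutorService))

(defn distinct-threads
  "Runs n tasks one after another through run-task (a fn that executes a
  Runnable on some thread and waits for it to finish) and returns how many
  distinct threads did the work."
  [run-task n]
  (let [^java.util.Set seen (ConcurrentHashMap/newKeySet)]
    (dotimes [_ n]
      (run-task (fn [] (.add seen (Thread/currentThread)))))
    (.size seen)))

(defn run-on-new-thread
  "Stand-in for thread-per-request dispatch: a fresh thread for every task."
  [^Runnable task]
  (doto (Thread. task) (.start) (.join)))

(defn compare-dispatch
  "Returns the number of distinct threads used by each dispatch style."
  [n]
  (let [^ExecutorService pool (Executors/newCachedThreadPool)]
    (try
      {:thread-per-task (distinct-threads run-on-new-thread n)
       ;; Idle workers in the cached pool are reused by later tasks.
       :cached-pool     (distinct-threads
                          (fn [^Runnable task] (.get (.submit pool task)))
                          n)}
      (finally (.shutdown pool)))))
```

Calling (compare-dispatch 100) should report 100 distinct threads for :thread-per-task and only a handful for :cached-pool. If each distinct thread holds on to its own reader, the first style would eventually exhaust any :max-readers value, which matches what was observed here.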

Huahai 2025-04-29T19:51:29.744749Z

hope this helps

Huahai 2025-04-29T20:00:43.601449Z

Maybe I can add an option :reuse-reader? false to Datalevin, for cases where one wants to use an unlimited number of threads for reads, e.g. when using (Executors/newVirtualThreadPerTaskExecutor), so that the reader is closed after each use. It will have some performance implications.

Huahai 2025-04-29T20:03:07.246529Z

Let me figure out something. Meanwhile, setting a :worker-pool in http-kit is the way to go.
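
For anyone landing here later, a minimal sketch of that workaround. It assumes http-kit's run-server accepts an ExecutorService under :worker-pool, as described above. The handler is just a placeholder, and port 8080 is the one used by the minimal repro in this thread:

```clojure
(require '[org.httpkit.server :as http])
(import '(java.util.concurrent Executors))

;; Placeholder handler; a real app would call d/q or d/transact! here.
(defn handler [_request]
  {:status 200
   :body   "test\n"})

;; Supplying our own pool means requests run on a small set of reused
;; threads rather than on a new thread per request.
(defonce stop-server
  (http/run-server handler
                   {:port        8080
                    :worker-pool (Executors/newCachedThreadPool)}))
```

run-server returns a function that stops the server, so (stop-server) shuts it down. Whether the supplied pool also needs to be shut down separately is not something this thread settles.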

spnw 2025-04-29T21:04:40.305319Z

Excellent. Thank you very much for your work on this!

Huahai 2025-05-08T00:55:01.166709Z

https://github.com/juji-io/datalevin/issues/326

Huahai 2025-04-28T15:06:52.784779Z

Set a bigger :max-readers

Huahai 2025-04-28T15:08:44.270279Z

Which version are you using?

spnw 2025-04-28T15:11:01.832139Z

I'm using 0.9.22.

Huahai 2025-04-28T15:12:21.639489Z

Set a larger :max-readers then

spnw 2025-04-28T15:50:58.360689Z

Bumped to 512 and I'm getting the same issue. My app rarely does more than one read at a time, so it's all very odd.

Huahai 2025-04-28T15:51:31.871859Z

How about bump more?

Huahai 2025-04-28T15:52:56.089579Z

You can also check how many threads your app is using

Huahai 2025-04-28T15:53:14.671539Z

My guess is it's quite a lot

spnw 2025-04-28T16:03:36.046049Z

36 threads (including REPLs and stuff). I haven't noticed a correlation between the :max-readers count and the amount of time it takes to trigger the issue, even with a very low count. I can experiment with higher values just to see if anything changes.

Huahai 2025-04-28T16:04:34.444969Z

What's the spec of your machine?

spnw 2025-04-28T16:07:22.067199Z

6-core i5 w/ 16 GB RAM. Running macOS if that makes any difference.

Huahai 2025-04-28T16:09:39.371139Z

can you clone datalevin and see if you can pass all the tests?

Huahai 2025-04-28T16:09:43.832269Z

lein test

spnw 2025-04-28T16:17:10.281729Z

All tests are passing.

Huahai 2025-04-28T16:17:52.520629Z

it is likely a problem with the app then

Huahai 2025-04-28T16:18:23.846639Z

36 is not a lot of threads, but how did you check it?

spnw 2025-04-28T16:19:38.413569Z

(count (Thread/getAllStackTraces))
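
A slightly more detailed variant of that check, as a sketch: it groups live threads by name (with trailing numbers stripped) so it is easier to see which family of threads is growing. The function name threads-by-name is made up for illustration. One caveat: on JDK 21+, Thread/getAllStackTraces only covers platform threads, so if requests are being served on virtual threads (as guessed elsewhere in this thread) they would not show up in either count.

```clojure
(require '[clojure.string :as str])

(defn threads-by-name
  "Returns [name-prefix count] pairs for live platform threads, most
  numerous first. Trailing separators and digits are stripped, so
  \"worker-1\" and \"worker-2\" are counted together as \"worker\"."
  []
  (->> (keys (Thread/getAllStackTraces))
       (map (fn [^Thread t]
              (str/replace (.getName t) #"[-#\s]*\d+$" "")))
       frequencies
       (sort-by val >)))
```

Running (threads-by-name) a few times while hitting the server shows whether any one group keeps climbing.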

Huahai 2025-04-28T16:24:04.754229Z

since you are on Mac, activity monitor will just show you

spnw 2025-04-28T16:24:21.857669Z

The app is just a web server and little else. As far as I know, http-kit uses a small thread pool and doesn't allocate per-request threads at all.

spnw 2025-04-28T16:25:07.614329Z

Activity Monitor is currently showing 38.

Huahai 2025-04-28T16:25:42.691099Z

when the READERS_FULL happens?

Huahai 2025-04-28T16:27:34.574959Z

how did you increase :max-readers?

spnw 2025-04-28T16:30:08.255149Z

It's generally after I update an entity in the app and then query the DB to build the response.

spnw 2025-04-28T16:30:39.518889Z

(defonce conn
  (d/create-conn "/path/to/my/db" schema {:kv-opts {:max-readers 512}}))

Huahai 2025-04-28T16:31:18.856239Z

right, bump to a few thousand?

spnw 2025-04-28T16:31:41.700299Z

Can do.

Huahai 2025-04-28T16:53:09.562739Z

if you can create a minimal reproducible repo, I can take a look

spnw 2025-04-28T17:05:32.565349Z

Currently running a test with :max-readers 8192 (could take a while...). If that fails or I'm able to get a minimal reproduction I'll get back to you. Thank you!

spnw 2025-04-28T17:38:20.992899Z

Indeed, the error triggered after roughly 8000 update/read cycles. This would seem to support my suspicion that each web request is grabbing a reader which is never released.

spnw 2025-04-28T17:39:39.315679Z

I'll get to work on that repo later.

Huahai 2025-04-28T19:19:29.357999Z

you probably created a connection each time

Huahai 2025-04-28T19:21:08.731309Z

get-conn would be safer. It will create a connection if one doesn't exist, and return the existing one if it does

spnw 2025-04-28T19:53:25.463269Z

My code only ever ran create-conn once; I was very careful about that. Testing with get-conn didn't change anything.

spnw 2025-04-28T19:56:50.166969Z

I've managed to reproduce the issue in a couple dozen lines of code. I got a strange result though: at the last moment I decided to try swapping in Jetty as the server, and the problem went away entirely. I don't know if that makes it an http-kit issue, or just a strange interaction between it and Datalevin.

Huahai 2025-04-28T19:58:39.597149Z

Interesting. Can you share the code?

spnw 2025-04-28T20:25:41.038459Z

Yeah sorry, just wanted to check a few other things. I am pretty sure my problem involves using entity instead of q.

Huahai 2025-04-28T20:27:12.004209Z

entity does have a reference to the DB instance

spnw 2025-04-28T20:27:31.956229Z

Yeah I thought that might be the case. I don't know why http-kit would be caching them though.

spnw 2025-04-28T20:33:44.608809Z

https://github.com/spnw/minimal.git

spnw 2025-04-28T20:34:39.552679Z

Run with clojure -M -m core. It makes a DB at /tmp/minimal-db and a server at localhost:8080. I just run this command until it starts spitting errors: while true; do curl ; done

spnw 2025-04-28T20:45:55.068209Z

Btw if you think it's a server problem and not a Datalevin problem then I'm content to drop it. You've been helpful :)

Huahai 2025-04-28T20:52:34.382029Z

yeah, probably some caching is going on with http-kit. OTOH entity creates an entity object, so it isn't really a query function. From your code, it should be GC'ed, but somehow it wasn't

spnw 2025-04-28T20:53:33.820299Z

That makes sense to me. Thanks again for your time!

spnw 2025-04-28T20:57:18.715669Z

(My actual code had a bunch of q calls and just one random entity call tucked away in a middleware, probably that is why I never suspected it. Hah.)