Seems like there may be some performance issues with the track-state middleware?
I'm seeing a several-minute Neovim freeze with the CPU pinned at 100% when loading a large library namespace (`(use 'com.rpl.rama)`, #rama).
It seems to return metadata for 600 variables, which takes 77 seconds to be fully received and then another couple of minutes to be fully processed.
Initially I added nREPL message logging directly to `enqueue_message` in conjure/remote/nrepl.lua to see what was going on. (I think this probably wasn't the right place, since it doesn't properly correlate messages that are part of the same stream. It does, however, seem to indicate that nREPL messages continue to be received for over a minute after the initial eval.)
I also captured the raw nREPL stream with tcpdump, decoded it with tshark, and then decoded that with nrepl/bencode to make sure I saw exactly what was being sent on the socket. Strangely, this didn't show any of the var metadata I was seeing in the Lua logs. (Is it possible that it's sent on a different port than the primary nREPL connection?)
I also tried to do some profiling, but that ran into a `vim.schedule lua callback: table overflow` error after using 100+ GB of memory.
This does not appear to be an issue in Calva. (It takes a few seconds to load, but there's no similar 60+ second delay.)
Let me know if there's something else I can be testing.
I also posted in #cider (https://clojurians.slack.com/archives/C0617A8PQ/p1745426905158029) because I thought it was strange that private variable metadata was being included, but that doesn't seem to be the main issue here.
This appears to be mainly due to slow bencode decoding of individual chunks. Notably, each subsequent chunk before the final one takes a bit longer (up to 250ms each, which becomes very slow for 700+ chunks).
Currently https://github.com/Olical/conjure/blob/1e656c64494f0b0c51baa40ea3c61f2b608e15b2/fnl/conjure/remote/transport/bencode/init.fnl#L8 is called for every single chunk, which ends up allocating 2 new strings for the entire accumulated data so far, until it can finally decode a full message. Since LuaJIT doesn't support efficient mutable string operations, I think it may be possible to make the bencode implementation accept a buffer table instead of a string, in which case each chunk can be appended efficiently instead of generating new strings every time. Perhaps a more incremental bencode decoding algorithm that doesn't rely on receiving the full message first would be warranted instead. 🤔
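To make the "more incremental" idea concrete, here's a rough sketch of what I have in mind, written in Python just for illustration. It is not Conjure's code or API: the class name `IncrementalBencodeDecoder` and its `feed` method are made up for this example, and it does no validation of malformed input. Chunks get appended to a mutable buffer, complete tokens are consumed as soon as they arrive, and partially built lists/dicts are kept on a stack, so the bytes received so far are never re-copied or re-parsed.

```python
# Hypothetical sketch of an incremental bencode decoder (not Conjure's code).
_NEED_MORE = object()   # not enough bytes yet for the next token
_OPEN_LIST = object()
_OPEN_DICT = object()
_END = object()
_NO_KEY = object()      # dict frame is waiting for its next key


class IncrementalBencodeDecoder:
    def __init__(self):
        self._buf = bytearray()  # unconsumed bytes only
        self._pos = 0            # read offset into _buf
        self._stack = []         # frames: [container, pending_key]

    def _read_token(self):
        """Consume one complete token, or return _NEED_MORE untouched."""
        buf, pos = self._buf, self._pos
        if pos >= len(buf):
            return _NEED_MORE
        c = buf[pos:pos + 1]
        if c in (b"l", b"d", b"e"):
            self._pos += 1
            return {b"l": _OPEN_LIST, b"d": _OPEN_DICT, b"e": _END}[bytes(c)]
        if c == b"i":  # integer: i<digits>e
            end = buf.find(b"e", pos)
            if end == -1:
                return _NEED_MORE
            self._pos = end + 1
            return int(buf[pos + 1:end])
        colon = buf.find(b":", pos)  # string: <length>:<bytes>
        if colon == -1:
            return _NEED_MORE
        end = colon + 1 + int(buf[pos:colon])
        if end > len(buf):
            return _NEED_MORE
        self._pos = end
        return bytes(buf[colon + 1:end])

    def _attach(self, value, done):
        """Put a finished value into its parent, or emit it if top-level."""
        if not self._stack:
            done.append(value)
            return
        frame = self._stack[-1]
        if isinstance(frame[0], list):
            frame[0].append(value)
        elif frame[1] is _NO_KEY:
            frame[1] = value            # first of a key/value pair
        else:
            frame[0][frame[1]] = value  # second of a key/value pair
            frame[1] = _NO_KEY

    def feed(self, chunk):
        """Append a chunk; return the list of messages it completed."""
        self._buf += chunk
        done = []
        while True:
            tok = self._read_token()
            if tok is _NEED_MORE:
                break
            if tok is _OPEN_LIST:
                self._stack.append([[], None])
            elif tok is _OPEN_DICT:
                self._stack.append([{}, _NO_KEY])
            elif tok is _END:
                self._attach(self._stack.pop()[0], done)
            else:
                self._attach(tok, done)
        del self._buf[:self._pos]  # drop consumed bytes, keep partial token
        self._pos = 0
        return done
```

Usage would be something like calling `decoder.feed(chunk)` from the socket read callback and handling whatever complete messages it returns (often none). The work per chunk is then roughly proportional to the chunk size rather than to everything accumulated so far. A Lua/Fennel version would presumably need a table of chunks or a similar buffer in place of `bytearray`.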