lsp 2022-06-24 | Slack Archive

Starting a thread to discuss the freezes that have been reported recently…

references https://clojurians.slack.com/archives/CBE668G4R/p1656064304196529 https://github.com/clojure-lsp/clojure-lsp/issues/1058 https://github.com/clojure-lsp/clojure-lsp/issues/1080 https://github.com/clojure-lsp/clojure-lsp/pull/1079 /cc @UKFSJSM38 @U0178V2SLAY @U0123R34Q9Y

👍 1

jacob.maine18:06:08

Like I said in the Slack conversation linked above, I’ve been working on this for a few days, without much progress. And again, there’s a chance that what I’m working on is unrelated, so I don’t want to sidetrack this conversation if it isn’t. Not sure how to prevent that

ericdallo18:06:44

@U07M2C8TT I know we are not sure about the root cause, but i thought https://github.com/clojure-lsp/clojure-lsp/pull/1079 could help and we could check with @U0178V2SLAY or @U0123R34Q9Y if fixed for them

ericdallo18:06:08

IMO the issue is probably the >!!

jacob.maine18:06:22

Right, I started to work on clojure-lsp/1058, or rather, made a guess that the freezes @U0178V2SLAY was seeing were related to several blocking puts. The outcome of that is https://github.com/clojure-lsp/clojure-lsp/pull/1079

jacob.maine18:06:22

However, the integration tests for that branch fail almost all the time. (Not always, unfortunately.) I’m wondering if I’ve made the problem worse

jacob.maine18:06:57

@U0178V2SLAY and @U0123R34Q9Y, if you’d like you can check out that branch and see if it fixes any of your issues (or makes them worse)

jacob.maine18:06:18

So, here’s a more detailed analysis of how the tests fail

ericdallo18:06:38

If using linux, you could even download the native binary from https://github.com/clojure-lsp/clojure-lsp/actions/runs/2553860615

jacob.maine18:06:29

First, the failure is much more likely to happen in the GraalVM integration tests. Those tests or 5x faster than the JVM integration tests, and so perhaps this is timing-related test flakiness

jacob.maine18:06:39

Let’s look at one of the tests in more detail

jacob.maine18:06:05

The most recent test run is pretty emblematic. https://github.com/clojure-lsp/clojure-lsp/runs/7036727390?check_suite_focus=true

jacob.maine18:06:53

The first test spins up a clojure-lsp server process, and starts sending messages to it as a mock “client”. The client receives all the server’s notifications about the progress the server is making doing the initial analysis, and eventually hears that the analysis is https://github.com/clojure-lsp/clojure-lsp/runs/7036727390?check_suite_focus=true#step:8:368

jacob.maine18:06:06

But then the client has a problem. The client sent the server an “initialize” request, but it fails to get a response from the server. The client waits 30 seconds to get a response, but then https://github.com/clojure-lsp/clojure-lsp/runs/7036727390?check_suite_focus=true#step:8:370 the test

jacob.maine18:06:23

We output a bunch of logs from the server when this happens

jacob.maine18:06:34

First we see clojure-lsp’s logs. It thinks that it responded to the client’s “intialize” request https://github.com/clojure-lsp/clojure-lsp/runs/7036727390?check_suite_focus=true#step:8:397 sending all the notifications about the analysis scan

jacob.maine18:06:20

And after that we see lsp4j’s logs (lsp4j is the lower-level Java library we use to implement the JSON RPC level of the LSP protocol)

jacob.maine18:06:07

Those logs also say that the response to the “initialize” request https://github.com/clojure-lsp/clojure-lsp/runs/7036727390?check_suite_focus=true#step:8:776

jacob.maine18:06:33

If you look at the timings, lsp4j agrees that the response was sent moments after the analysis progress notifications

jacob.maine18:06:16

scrolling back up to the client’s logs, we can observe some additional https://github.com/clojure-lsp/clojure-lsp/pull/1079/files#diff-576f32366c837b68400162f60075ce1677c7e890347f5f53076546d39a0ab9abR77. You can see https://github.com/clojure-lsp/clojure-lsp/runs/7036727390?check_suite_focus=true#step:8:369 that after receiving the analysis progress notifications, the client started waiting on (read-line) to get the next line of output from the server

jacob.maine18:06:02

That read-line is supposed to see (and then discard) a header from the server’s output stream, before it moves on to let the client listener read a chunk of JSON from the stream

jacob.maine18:06:03

Digging a bit deeper into the code…

jacob.maine18:06:13

lsp4j’s trace line is generated https://github.com/eclipse/lsp4j/blob/ab3c4891699fd3b3bd057a13df31982fd5bb96b3/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/TracingMessageConsumer.java#L101 and printed to the trace file https://github.com/eclipse/lsp4j/blob/ab3c4891699fd3b3bd057a13df31982fd5bb96b3/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/TracingMessageConsumer.java#L109-L112

jacob.maine18:06:47

then it calls the thing that actually puts the message on the output stream https://github.com/eclipse/lsp4j/blob/ab3c4891699fd3b3bd057a13df31982fd5bb96b3/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/TracingMessageConsumer.java#L114

jacob.maine18:06:04

the server output stream writer does its thing https://github.com/eclipse/lsp4j/blob/ab3c4891699fd3b3bd057a13df31982fd5bb96b3/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/json/StreamMessageConsumer.java#L66-L70

jacob.maine18:06:49

But the client fails to read anything from that output stream https://github.com/clojure-lsp/clojure-lsp/pull/1079/files#diff-576f32366c837b68400162f60075ce1677c7e890347f5f53076546d39a0ab9abR78

jacob.maine18:06:27

So, somewhere between the trace being ~~generated~~ printed and the client receiving a line, something is going wrong

jacob.maine18:06:23

That leaves a few possibilities. It could be in lsp4j. Perhaps the trace is generated and saved to the trace file, but the message is never actually put on the output stream. This seems unlikely.

ericdallo18:06:31

So, it's possible that the bb integration-test client is not parsing properly when (read-line)?

ericdallo18:06:51

> This seems unlikely. Agree

jacob.maine18:06:51

(I’ll get there)

jacob.maine18:06:00

Or, somewhere within Java, the output could be buffered and not written to the output stream

jacob.maine18:06:56

Or, somewhere in the inter-process communication the output could be buffered. (I tried to fix that possibility https://github.com/clojure-lsp/clojure-lsp/pull/1079/commits/f01f773b69289210830ecf20d88266505f07be6d, but my attempt didn’t work.)

jacob.maine18:06:37

Or something in the client side babashka/process, or in babashka’s (read-line) could be failing. If that’s the case, then this is probably a different issue than what @U0178V2SLAY and @U0123R34Q9Y are reporting, because it’s specific to the integration tests, not to the system in general

👍 1

ericdallo18:06:01

My guess is that their issues are not related to the integration-tests..

jacob.maine18:06:01

Maybe @UKFSJSM38… but, it’s pretty similar. Suddenly the client stops receiving messages, even though the server thinks it’s sending them.

jacob.maine18:06:42

Going back to that https://github.com/clojure-lsp/clojure-lsp/pull/1079/commits/f01f773b69289210830ecf20d88266505f07be6d I referenced… we have two processes that are communicating with each other through an OS pipe. That’s true in real life, where the editors start a clojure-lsp process, and then communicate with it through it’s input and output streams. And it’s true in integration tests, which do the same with a mock client

ericdallo18:06:57

yeah, but from @U0178V2SLAY issue, it seems like server doesn't stop that fast, it's like if the go-block thread is full after some other requests, not immediately, so that's why I think are different issues

jacob.maine18:06:42

Yeah, let’s think through @U0178V2SLAY’s issue a little

jacob.maine18:06:00

I know that @U0178V2SLAY’s report was that things stop working only in very specific situations and that @U0123R34Q9Y’s report is that things stop working after several minutes. But I wonder if the integration tests hit this because they run so much faster. Looking at @U0178V2SLAY’s report more, it said that https://github.com/clojure-lsp/clojure-lsp/issues/1058#issuecomment-1160557291. First, the (producer/publish-workspace-edit lsp-producer edit) deref is blocked. And second, the core.async put onto the edits-channel is blocked. Is it true that the first would cause the second? If so, then the problem is really that the client doesn’t receive a response from the server, and so the deref never resolves. This is despite the server being confident that it did send a response.

jacob.maine18:06:26

That’s why the integration tests and bug reports seem similar to me

jacob.maine19:06:42

Thinking about this more… there would really have to be 2 edits. The first edit would have to go through clojure-lsp but not get read by the client. That would cause https://github.com/clojure-lsp/clojure-lsp/blob/master/cli/src/clojure_lsp/server.clj#L89-L90 to block. That would block https://github.com/clojure-lsp/clojure-lsp/blob/13e78e3af8082780ebab21a21166ad9598c27562/cli/src/clojure_lsp/server.clj#L220-L222 from making another loop. Since nothing is reading from the edits-chan anymore, when the second edit came in, it would https://github.com/clojure-lsp/clojure-lsp/blob/master/lib/src/clojure_lsp/feature/file_management.clj#L49. The PR I was working converted that last >!! to put!, but I don’t think that’s going to fix the fundamental problem. The did-open will finish instead of blocking, but the edit still won’t be put on the edits-chan, because that chan is still full. (Instead of waiting for the channel to accept the message immediately, the put! puts that waiting into the background and returns immediately. But it doesn’t actually change that something has to wait for the channel to free up.)

jacob.maine19:06:09

Actually… why do we even have the edits-chan? The only place it’s used is in this one case where we’re creating a namespace in a new file. That’s such a rare occasion. Do we really need all this async machinery to put the edit in the background? Why not just publish the edit directly from did-open? I don’t think that removing the edits-chan would fix the problem that the client stops receiving server responses. But it would at least let us be sure that the problem isn’t with that part of the core.async code. WDYT @UKFSJSM38 ?

mpenet19:06:49

put! is also quite bad uncontrolled, you will just create another issue if you cause the pending put queue (1024) of channels you put into to fill up if you're not consuming fast enough and have a full buffer.

mpenet19:06:51

put! is rarely used correctly/appropriately

jacob.maine19:06:55

@U050SC7SV that makes sense… and I agree, I think problems at the write end of the channel are really just symptoms of a problem at the read end of the channel

☝️ 1

jacob.maine19:06:04

Yet another question… how does that deref work normally? When lsp4j fulfills that CompletableFuture, or whatever it is, it’s implying it knows the client has received a message. But how could it know that? I thought that this was a regular client/server architecture, where responses are unacknowledged. That is, the client doesn’t have to tell the server that it got a response. I’m gonna go look at that part of the lsp4j code.

ericdallo19:06:38

Yeah, IMO we shoukd fix the read of the chan, I think this started to happen after we introduced the deref of waiting for the edit to be applied? I don't remember why we have the edits chan, I think that always existed 😂 , I can't see why a sync function wouldn't work, maybe just handle timeouts if we do something that rely on waiting for client (like that deref) The https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#workspace_applyEdit, is a request, so what lsp4j is doing under the hood AFAIK is waiting for client to respond that request, then completing the future on java code

jacob.maine20:06:36

> I think this started to happen after we introduced the deref Yes, that’s when it started. But removing the deref or adding a timeout to it wouldn’t fix the problem, as far as I can see. Instead, it would change it so we don’t see the problem. But under the hood, we’d still have this problem—the client isn’t receiving the request. Changing or removing the deref would only change that we don’t notice that the client doesn’t receive the request. > The https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#workspace_applyEdit, is a request Ah, that’s right… thanks, that explains it. The client does tell the server whether it applied the edit by returning {applied: true}

👍 1

ericdallo20:06:34

So, we can discard any communication issue there since the client responded? or there was a issue handling the client response, not completing the future?

jacob.maine21:06:45

Well, clojure-lsp starts blocking when it sends its text edit response to the client. I’m not sure what happens from there. Does the client receive the response? If it does, does it process it, and reply? If it replies, even with an exception, why doesn’t the future complete? The problem could be at any one of those steps. But I think the problem is at that first step. The client doesn’t receive the server’s response. Here’s why… Looking at @U0178V2SLAY’s logs, https://github.com/clojure-lsp/clojure-lsp/issues/1058#issuecomment-1159512725, we can see that at some point the client stops getting any messages from the server (search for

Received response 'nil - (9)'

for the last time the server responded.) That is, there aren’t any Received response , Received request, or Received notification lines. Similarly in @U0123R34Q9Y’s logs, we don’t see any Received lines. And from @U0123R34Q9Y description, the server keeps getting client requests, and processing them… as he navigates, the clojure-lsp logs that it’s publishing diagnostics and processing handlers, all things that it would do if it were receiving requests. But nothing changes in the client, lint squiggles don’t change etc. All of that, along with the absence of Received lines suggests that the client isn’t even seeing clojure-lsp’s messages

jacob.maine21:06:31

@UKFSJSM38 are you on Mac or Linux?

🐧 1

jacob.maine21:06:32

ok… hmm

ericdallo21:06:14

One thing we could check is if it's reproducible on JVM those issues

jacob.maine21:06:49

What would that tell us?

ericdallo21:06:17

Just to discard Graal issues, like reflection issues we had a lot on the past

jacob.maine21:06:36

Gotcha… when you were trying to reproduce https://github.com/clojure-lsp/clojure-lsp/issues/1080 do you remember if you were using native or JVM?

ericdallo21:06:12

Probably JVM, I can try again later today

jacob.maine21:06:34

OK, cool… comparing JVM vs native on Linux seems worth trying. The fact that the client’s logs stop seeing any kind of message from the server—any request, notification, or response—suggests to me that maybe the server’s output stream is broken. lsp4j tries to detect when that happens, and to adjust its behavior, but the code is really convoluted. What if it just starts silently dropping all the server’s messages? The reason I asked about Linux or Mac is that the lsp4j code that https://github.com/eclipse/lsp4j/blob/ab3c4891699fd3b3bd057a13df31982fd5bb96b3/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/JsonRpcException.java#L27 whether a stream has closed is all based on string comparisons. What if the underlying OS stream is closed, but on Linux that generates a different string message than what lsp4j is expecting? How would lsp4j behave if the stream had closed but it failed to detect it?

ericdallo21:06:46

Good findings, that's a good question, and it makes me want to drop lsp4j even more 😅

jacob.maine22:06:40

Huh… I have to be afk for awhile, but this is blowing my mind, so I have to write it down… https://docs.oracle.com/javase/7/docs/api/java/lang/System.html#out is a https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html. One of the “features” of a PrintStream is that it never throws IOException. But the lsp4j code seems to https://github.com/eclipse/lsp4j/blob/ab3c4891699fd3b3bd057a13df31982fd5bb96b3/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/JsonRpcException.java#L30 stream closures to be communicated via IOExceptions

😮 1

bringe23:06:51

I’m seeing this issue as well with Calva. The server logs don’t show any errors. The only mention I see of “error” is this:

2022-06-24T23:10:50.661Z  WARN [clojure-lsp.kondo:260] - Non-fatal error from clj-kondo: No configs copied.

But that looks like it’s not an issue. It does seem though from client<->server logs that the server just stops responding to requests.

👍 1

ericdallo23:06:40

@U9A1RLFNV does that happens with previous clojure-lsp version ?

bringe23:06:25

For example, I can try to go to a definition, and this is all that is printed in the logs:

[Trace - 4:53:12 PM] Sending request 'textDocument/definition - (596)'.
Params: {
    "textDocument": {
        "uri": "file:///Users/brandon/development/honestdeed/backend/development/src/dev/brandon/core.clj"
    },
    "position": {
        "line": 55,
        "character": 20
    }
}

And VS Code just continues to show an animation above the editor like something is in progress.

bringe23:06:30

I’ll try a previous version.

jacob.maine23:06:33

@U9A1RLFNV those are two different sets of logs… can you tell if the the server thinks its sending responses vs. whether the client actually receives those responses?

bringe23:06:34

Eventually this is printed in the logs:

[Trace - 4:55:04 PM] Sending notification '$/cancelRequest'.
Params: {
    "id": 594
}

I’m guessing the client decides to cancel the request.

bringe23:06:20

In the server logs I see this line printed when I try to go to a definition.

2022-06-24T23:56:03.674Z  INFO [clojure-lsp.handlers:193] - :definition 22ms

bringe23:06:30

Idk if something else is supposed to be printed.

jacob.maine23:06:40

client-out -> -------- -> server-in client-in <X -------- <- server-out

bringe23:06:41

Seems it at least got the request (the server).

jacob.maine23:06:12

the server prints :definition 22ms after it’s generated a response, so that it can include how long that took

bringe23:06:30

> can you tell if the the server thinks its sending responses vs. whether the client actually receives those responses? It does seem that the client sends a request, the server gets it, and the client never receives a response. (edited to “does seem”)

jacob.maine23:06:51

“It does not seem” or “It does seem”?

bringe23:06:00

Sorry, meant “does seem”

jacob.maine23:06:50

OK, thanks for confirming that… that’s what I’ve been thinking is going on, but since I can’t reproduce it I haven’t really been able to watch both sides of the logs

ericdallo00:06:15

We should check if that happens for previous version. Also, I think we bumped lsp4j recently, just not sure if it was on this or previous version, I'm on mobile right now

bringe00:06:56

I’m trying 2022.05.31-17.35.50 now and will report back after some time (it takes several minutes of me working for it to occur).

👍 1

jacob.maine00:06:50

@UKFSJSM38… I’ve been fighting against adding a timeout to the deref. But if you’re worried that lots of people are having this problem, we could add it as a stop-gap. We’d be running the risk that the server would be applying edits that it shouldn’t. In the long run, I’d much rather understand and fix the real problem.

ericdallo00:06:48

I think we can use some time to analyze if it's worth the easy fix or spend more days to work on a good fix. LMK if you intend to keep researching that

jacob.maine00:06:13

I’m going to keep researching, but have other stuff to take care of this evening, so probably won’t get to it until tomorrow. If you do want to changed the deref, I think a timeout is best. I don’t want to remove it entirely… When things are working, it’s important to have. Before adding it we had that bug in Calva where the cursor was moved before the drag edits were applied

👍 1

ericdallo00:06:21

No hurries, we didn't receive that much of complains yet, so I'm not sure this happens all the time too, if things get worst, we can add the timeout

👍 1

jacob.maine00:06:10

lsp4j was bumped to 0.14.0 in lsp4clj 0.4.0. Looks like that was included in clojure-lsp in the 2022.06.22 release.

ericdallo00:06:24

Yeah, one more thing to check so, we don't need anything from lsp4j release so we could rollback the bump and start working on dropping lsp4j in the future, but it'd be nice confirm the issue didn't happen on previous lsp4j versions

ericdallo00:06:54

I can't repro the freezing so things are difficult to me to test

jacob.maine00:06:10

I suspect that it was happening on earlier lsp4j releases, but I agree, it’d be nice to know. I can’t repro either, unless you count CI, which we’re not sure is the same thing. Makes it really hard to test

bringe00:06:43

I’ve been working for some time now and the issue hasn’t happened with 2022.05.31-17.35.50.

👍 1

ericdallo00:06:47

Alright, so besides lsp4clj bump, we re-introduced did-change-watched-files which just call didChange, but I'm less confident that could be the issue

ericdallo00:06:31

@U9A1RLFNV I can create a branch with current release but lsp4clj downgraded, could you test it?

bringe00:06:49

Yeah I can do that. If not tonight, then tomorrow.

👍 1

ericdallo00:06:11

Alright, done, the branch is https://github.com/clojure-lsp/clojure-lsp/tree/test-downgrade-lsp4clj

👍 1

lassemaatta04:06:38

I'll try to test drive that branch during the weekend. Big thanks to all of you for debugging this. Somehow I have a feeling that the issue I reported has something to do with the client. It feels like creating a file causes the client to send some message to clojure-lsp (I need to verify this hunch from the message logs) and clojure-lsp tries to respond to it, but because the corresponding buffer is not open in emacs, the response is not handled. (which then leads to clojure-lsp to freeze due to the deref etc) This only happens with treemacs (and .cljs/`.cljc`), because treemacs doesn't automatically open the newly created file. Using find-file (`C-x C-f`) creates and opens the file/buffer immediately and the problem goes away.

jacob.maine06:06:53

@U0178V2SLAY (sorry if I’ve said this before… this topic has come up a few times recently) the LSP spec https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#workspace_didChangeWatchedFiles that clients should watch the file system for changes and inform servers. clojure-lsp participates in that scheme, and so when treemacs creates the file, I bet clojure-lsp is notified and that it sends https://github.com/clojure-lsp/clojure-lsp/blob/13e78e3af8082780ebab21a21166ad9598c27562/lib/src/clojure_lsp/feature/file_management.clj#L220 back to the client. (I have no idea why the cljs and cljc files are special. It should do that for clj files too.) You’re right, it’s possible that the client doesn’t handle the server’s messages correctly because it doesn’t actually have that file open. If that’s the case, it’s true that the deref would block. But what confuses me is that the deref is for just one of the many ways that the server can send messages. Specifically the deref is for when it’s asking the client to edit a file. But the server can also send messages for lots of other things. The main other ones are responses to the client’s requests and notifications that lint has changed. From what we see in the server’s logs, it seems to think it’s sending those messages, even after the deref starts blocking. (I think… you can confirm that for us.) That means that there are other threads that continue to process and respond to the client’s requests and continue to send notifications. I can understand why the client wouldn’t seen any of the server’s requests to edit files, because the server is blocking on the deref. But I don’t understand why it wouldn’t see any of those other messages. That’s what’s making me focus on the communication layer, specifically on the server’s output stream. If there’s a problem there, then no messages of any kind would go through. It’s quite possible that I’m not looking at this the right way. Do you see another way of interpreting all of this? It’s also possible that we’re both right… perhaps when the client receives messages about a file it doesn’t have open, it breaks in a way that breaks the server’s output stream.

lassemaatta06:06:16

oh, good to know! I'm certain you know this stuff better than me, this whole LSP business is just a big black box to me which may lead me to make the wrong assumptions and deductions 🙂 I'll try to find the time to play around with this because I have a reliable way to reproduce the problem. I'm thinking about e.g. checking if I could reproduce this with calva (first I need to learn how to use it though), that might tells us something more. Or perhaps try eglot.

lassemaatta08:06:00

I don't want to spam this thread, so I added some observations and logs to my https://github.com/clojure-lsp/clojure-lsp/issues/1058#issuecomment-1166231407. TLDR: creating an empty file from the terminal does not exhibit the problem, the message log contains just a single didChangeWatchedFiles notification. But when I do the same from treemacs, the log contains a bunch of didOpen + didClose notifications + applyEdit from clojure-lsp to add a ns declaration.

Cal Herries09:06:15

> I’ve been working for some time now and the issue hasn’t happened with 2022.05.31-17.35.50. Same with me. Been using clojure-lsp for 20 minutes without issues 🎊 I can try out the test branch tomorrow

lassemaatta09:06:00

@U07M2C8TT I tried locally testing your remove-blocking-puts branch. However, when I create a jar with make clean cli-jar, the resulting .jar seems to run for a few seconds before closing with no error. If I do the same (e.g. java -jar clojure-lsp-standalone.jar) for the master branch, the jar keeps running until I cancel it.

ericdallo13:06:02

@U0178V2SLAY try just make it should create a binary which is a embedded jar

ericdallo13:06:17

Tô be honest, I'm starting to think the issue is not on client since we have both calva and emacs users eith a very similar(same?) issue. That's why making sure things got fixed on these branch's in calva or emacs would help know we are working in the right path

ericdallo13:06:37

@U07M2C8TT LMK if you need any help when you work on this

lassemaatta13:06:38

Ah, ok, I’ll try that later 👍

👍 1

lassemaatta14:06:23

@UKFSJSM38 I tried plain make, which generates a clojure-lsp binary. However, using that remove-blocking-puts branch produces a binary, which exists after a few seconds without printing any errors (exit code was 1).

ericdallo14:06:30

Hum, so probably that branch has something wrong, could you try the test-downgrade-lsp4clj one? https://github.com/clojure-lsp/clojure-lsp/tree/test-downgrade-lsp4clj

lassemaatta14:06:42

sure, just a moment

lassemaatta14:06:04

the test-downgrade-lsp4clj branch builds fine, starts fine and keeps running ok. However, that problem of mine still happens 😿

ericdallo14:06:01

Alright, thanks for testing, It'd be nice to have calva users testing that branch and confirming if that fixes for then, if so, they are different issues. Meanwhile @U0178V2SLAY did you test the previous release? I think I remember you saying this issue happens for a while already right?

lassemaatta14:06:33

I'm not certain if I've tested the previous release (will try it in a moment, EDIT: also happens with the latest release), but I've gone through a few different nightly builds. I think I've seen this problem of mine for several months now at least, perhaps even as far back as last winter. I only just recently started to really focus on it and try to figure out what exactly causes it and what doesn't. The good news is that it requires a very specific set of circumstances to appear, otherwise other people would have reported (and replicated) it already.

ericdallo14:06:30

Yeah, makes sense, I suspect you have a different issue than calva users reported recently so

lassemaatta14:06:31

I tried to repro this in calva earlier today, but couldn't. When you add a new file in vs code, it immediately opens it and the bug doesn't appear. If I do the same in emacs, it works too. The only way to get the problem to appear, is to use treemacs to create a new file. It seems that something (emacs? treemacs?) causes the file to open and immediately close (or at least that's what the LSP IO log looks like).

lassemaatta14:06:15

yeah, I agree

lassemaatta14:06:38

In any case, I think you should concentrate on the calva issue, I can workaround this and I assume this problem isn’t as widespread. Next thing I’ll investigate is if I could repro this with some other server, like python/java/something EDIT: I tried replacing lsp-mode with eglot -> same problem still happens.

👍 1

ericdallo14:06:17

Thanks, we have ideas of improving the async stuff in clojure-lsp to have a way to control async better and easily.

ericdallo14:06:32

That should probably help fixing your issue if it's on server indeed

jacob.maine23:06:37

@U0178V2SLAY thanks for trying the remove-blocking-puts branch, and sorry it’s broken. It’s in a weird state where it only works for running integration tests. If you really want to try it in a working state, you can use commit d455da7, but to be honest, I don’t think it’s worth it. I’m pretty convinced that the problem was never the blocking puts. They were just a symptom of the blocking deref, which is itself a symptom of the real problem. @UKFSJSM38 thanks for the offer of help. I’m currently contemplating writing some Java to debug what’s going on with System.out. My Java skills are abysmal, so we’ll see how it goes. I don’t have any other ideas at the moment. If I get stuck, I’ll let you know. We might want to open this question to a larger venue at some point

ericdallo23:06:25

Sure, agreed with put not being the issue and the deref issue might just be a indicator of a issue lsp4j/clojure-lsp communication mechanism that was always there. IMO we should focus on understand the root cause and spend time on fixing the root issue. If more users complain or fixing the root issue is a epic (like dropping lsp4j) we can consider just removing the deref for now since without then we should not have a critical issue but a specific one. WDYT?

jacob.maine23:06:14

I agree… if the root issue is an epic and/or if we’re getting lots of complaints, then I agree, we should try to find a quick fix. I don’t want to remove the deref. That would cause a regression of https://github.com/clojure-lsp/clojure-lsp/pull/955. We could try adding a timeout though. But, be aware… if I’m right and the underlying problem is that the output stream is broken, then there’s no real recovery. Neither removing the deref nor giving it a timeout will help. The server has lost the ability to communicate with the client, forever. The server might keep sending messages (removing the deref would let it keep sending edits too) but that wouldn’t have any effect because the client would never receive them. The only fix will be a restart.

ericdallo00:06:52

Agreed, do you think I should add the timeout so we can confirm it "fixes" the issue and we are in the right path? While you research how to properly fix the issue

bringe00:06:57

@UKFSJSM38 The problem persists with the test-downgrade-lsp4clj branch.

jacob.maine00:06:02

I suppose it’s worth giving it a shot. If it works, it’ll disprove my hypothesis that something’s wrong at the communication layer

👍 1

ericdallo00:06:13

@U9A1RLFNV thanks, so I think it proves the issue was "always" there and started to affecting final user after we included the deref. The lsp4j bump didn't change anything

👍 1

jacob.maine19:06:58

Well, I’m stuck. My hypothesis was that the OS stream underlying the server’s output stream breaks, and that if I could get that to raise errors instead of silently swallowing writes, I’d see those errors in the logs. To do this I wrapped System/out. The wrapper proxies writes to System/out, but also raises IOExceptions if .checkError, which is a method on System/out, ever returns true. That didn’t get me any more logs. Then I realized that lsp4j logs through java.util.logging, and that we hadn’t wired up timbre<->slf4j<->jul. I did that, though it took a while to get working in Graal. The tests still fail, without any additional logs. Then I added a timeout to the deref. I didn’t expect this to help… the evidence suggests that when deref blocks, it’s a symptom, not a cause. The first several test runs were green, but then I got a failure. So, we’re back to where we started. The client stops receiving the server’s messages and we have no indication why that happens. My new hypotheses are: 1. We still have something wrong with System/out. Perhaps we shouldn’t be using System/out, but instead be using Sockets. There’s some example code in lsp4j that does this. I might be able to get this to work on the server side, but don’t know how to change the client side so that it talks to a socket instead of to the in/out streams of the server process. 2. The wrapper isn’t working, and/or it’s working but it doesn’t cause any logs to be generated, and/or logs are generated but silently dropped (do we need a log config XML file?) 3. There’s nothing wrong with the I/O streams. Instead, somewhere in lsp4j something is blocking writes to the server’s output stream. Or perhaps lsp4j is accidentally dropping the writes. 4. There’s nothing wrong in lsp4j. Something is blocking the reads on the client side. Seems unlikely since several clients, including the integration tests, exhibit the same behavior. 5. There’s something in clojure-lsp that is blocking writes. I have no idea what this might be. Unfortunately, I don’t have many ideas for how to test these. My best guess is that the problem is #3; lsp4j is accidentally blocking or dropping writes. If this is true, I don’t know how to debug it. I don’t really know how to inspect the lsp4j internals. Maybe attach a debugger? Does anyone in this thread have experience with that? I’ve discovered I can reproduce the integration test failures locally, but only if I use a native-image build (GraalVM). Maybe I can get thread dumps, although I’m not sure how hard that is with a native image. Maybe we could ask the lsp4j team for assistance, but I don’t even know what question to ask them... I don’t think I’m going to get very far with, “Is it possible that very rarely your code stops writing output, but not in a way that you’ve ever seen? Do I have a reliable way to reproduce it? No, not really.” My second best guess is #5; clojure-lsp is blocking writes. I don’t have any ideas for what code might be blocking, no less blocking every write. What do you all think? Any other thoughts on what’s happening? Which do you think is most likely? @UKFSJSM38, you brought up the possibility of replacing lsp4j. On the one hand, that seems like a major undertaking just to debug something that we’re not even sure is related to lsp4j. On the other hand, lsp4j is, to me at least, quite complicated. If we spec out what that effort would look like, we could get a feel for how much of an epic it is. I’ll plan to open an Issue.

ericdallo20:06:24

So the timeout didn't work? so our easy fix would need to just remove the deref is that right? My best guess is #5 as well, if so, replacing lsp4j with a internal solution on lsp4clj would not result in anything right? I can take a closer look tomorrow probably and try a easy fix (or if you manage to do before me), the issue is not being able to repro that.. I'll probably need to setup and test with calva, to be honest, i'm not confident with our integration tests, their failures coulbe this or just another crazy test issue.. > @UKFSJSM38, you brought up the possibility of replacing lsp4j. On the one hand, that seems like a major undertaking just to debug something that we’re not even sure is related to lsp4j. On the other hand, lsp4j is, to me at least, quite complicated. If we spec out what that effort would look like, we could get a feel for how much of an epic it is. I’ll plan to open an Issue. Yes, I already coded on lsp4j and it's pretty hard to understand what happens there, IMO we should replace it in long-term improving lsp4clj.

jacob.maine20:06:08

> So the timeout didn’t work? No, it didn’t work. > so our easy fix would need to just remove the deref is that right? If you remove the deref, you will break drag, create-function, and create-test for all users (anything that uses :show-document-after-edit), although Calva users will see it most. You also run the risk of breaking other things, since edits could be applied out of order. In addition, from what I know of what’s going on, you’re unlikely to fix any of the problems we’re seeing. (Adding the timeout should have made @U0178V2SLAY’s editor stop freezing, but it still leaves him in a bad state—clojure-lsp won’t be working.) That is, I know you think that the problem started with the introduction of the deref, or that it got worse when the deref was introduced, but I don’t agree with either of those. We might have done other things to make this failure more likely. It’s even possible that it’s happening more often because of performance improvements. If, for example, it happens only when certain requests get interleaved, then faster responses would make it more likely. But I don’t think the deref is the cause of the failure. I don’t want to stop you from seeing what happens when you remove it. As you said, it’s hard to repro, so it’ll be hard to tell if it’s working. But if you try it and get really good evidence that the failure goes away when you remove the deref, that’s great. But I think the real problem is elsewhere, and that at the moment we don’t have any “quick” fix.

jacob.maine20:06:42

> My best guess is #5 as well, if so, replacing lsp4j with a internal solution on lsp4clj would not result in anything right? Correct, if the problem is #5, then replacing lsp4j wouldn’t help.

ericdallo20:06:23

yeah, that's bad..

ericdallo20:06:52

so we probably don't have quick fixes and we are not sure replacing lsp4j will fix it as well

jacob.maine20:06:11

> i’m not confident with our integration tests, their failures could be this or just another crazy test issue That’s true and good to keep in mind. I’ve spent so long on this I don’t have the patience to do the “mess around in Calva for 10 minutes” approach, though I know I should. 😅

jacob.maine20:06:23

> so we probably don’t have quick fixes and we are not sure replacing lsp4j will fix it as well Exactly

ericdallo20:06:45

but we do know that previous release was working, so maybe the tradeoff of losing what we fixed from previous release and wining the non freeze state anymore may worth it? I'll still test with calva later to see if there is a hidden quick fix

jacob.maine20:06:00

> we do know previous release that was working I don’t think we know that. We have reports that it happens less often, but from a very small sample. > losing what we fixed from previous release… may be worth it Maybe, but I’m not sure… we have three people (@U0178V2SLAY, @U0123R34Q9Y and @U9A1RLFNV) who are reporting that this is happening. But at least a few of them have said that it’s been happening for months, maybe as long as 6 months.

ericdallo20:06:30

I thought only @U0178V2SLAY was having the issue (emacs) for some time, the other (Calva users) started to have on recent release (also there is a coworker that reported the same issue on Calva, he upvoted the calva issue)

Cal Herries20:06:20

Yes at least for my issue, it only started happening last week. And now it’s been fine since I’ve been working on 2022.05.31-17.35.50

👍 4

Davino Junior20:06:25

Hey guys, I started experiencing this issue last Thursday on vscode. Only fix that worked for me was using a previous version of clojure-lsp

👍 2

Davino Junior20:06:28

btw I'm using 2022.05.31-17.35.50 version as well

ericdallo20:06:11

So I think this helps narrowing down the issue right @U07M2C8TT?

jacob.maine20:06:59

OK, you’ve convinced me. 🙂 Something changed in the last release that broke things for Calva users

ericdallo20:06:13

Yeah, and I have something in mind: Calva/vscode send did-change-watched-files a little bit different than emacs (I don't know about vim) and since we changed some things there, it may be why we only have calva users with that issue

jacob.maine20:06:21

@UKFSJSM38 it helps narrow a little. You can try removing the deref, but it’s been there since the 2022.05.03 release, so that’s more evidence it isn’t the source of the problems. Otherwise, I guess we have to bisect through the changes to see where it started. Anyone seen any pattern to when it starts happening?

jacob.maine20:06:54

> did-change-watched-files That sounds like a good target

ericdallo20:06:57

IIRC, vscode send one single batch of did-change-watched-files while lsp-mode sends lots of them (or the vice versa)

ericdallo20:06:46

I'm starting to think this is the same problem we had last time we tried to add did-change-watched-files one year ago 😂

jacob.maine20:06:04

oh no!

ericdallo20:06:12

Oh, I have a suggestion for now!

ericdallo20:06:37

try disable :compute-external-file-changes setting https://github.com/clojure-lsp/clojure-lsp/commit/e98249d14ee914f199ca698ee14b4319816658b2#diff-51d8fac8f2db4fe4a584b81b05fd2aacff71c55df99264f587c459aef335b305R9

ericdallo20:06:04

@U032DBM7NDP or @U0123R34Q9Y could you try setting this setting to false and check if the problem keeps happening using the latest release?

{:compute-external-file-changes false}

jacob.maine20:06:29

With the latest release, right @UKFSJSM38?

ericdallo20:06:35

Yes!

Davino Junior20:06:25

Sure

Davino Junior20:06:14

Where is this setting supposed to be included, though? I dont see it here

➕ 1

ericdallo20:06:19

you can add it in your ~/.config/clojure-lsp/config.edn or <project>/.lsp/config.edn

Davino Junior21:06:47

Ok got it, I will update here soon if I can reproduce the problem (or not)

Cal Herries21:06:55

Same here

ericdallo21:06:26

Thanks!

ericdallo21:06:28

The issue from last year was similar but not the same @U07M2C8TT https://github.com/clojure-lsp/clojure-lsp/commit/ba54fea9e441037fb153815cb5b4752ef85e1b14, it was a little bit worst, changes overwriting other changes in the file. Even so, test with that new setting disabled it's worth as it was a critical change indeed on how the server works

jacob.maine21:06:11

It does seem like a slightly different problem. But I wonder if that old problem still happens… did-change still puts new text in the db, even if the version is older.

ericdallo21:06:44

Yeah, I think the issue was with that inc that the new version doesn't have, but let's wait to confirm if that old problem will come back or not 😂

jacob.maine21:06:18

Ah, the inc changes makes sense. But, yeah, let’s focus on one problem at a time. :)

Cal Herries01:06:52

I’m getting the same issue again as before, after adding {:compute-external-file-changes false} to ~/.config/clojure-lsp/config.edn

ericdallo12:06:39

So the issue may be another thing introduced on previous version..

Davino Junior12:06:36

I have not experienced the issue so far, but maybe its too soon (did not had a chance to test so much yesterday)

Davino Junior12:06:10

For reference, these are the versions I'm using atm: clojure-lsp: 2022.06.22-14.09.50 clj-kondo: 2022.06.22

Davino Junior12:06:05

Ok nvm, just got the issue again with latest release and {:compute-external-file-changes false}

ericdallo12:06:59

alright, thanks both for testing, I'll try to find what could introduced the issue from previous to current release soon

jacob.maine19:06:18

Quick update… @UKFSJSM38 is working on a branch that rolls back some of the major changes between 2022.05.31 and 2022.06.22. When that’s ready, we’ll ask you all to test it. One other question, @U032DBM7NDP, @U0178V2SLAY and @U9A1RLFNV… are any of you using Apple silicon?

👍 1

jacob.maine19:06:41

@U0123R34Q9Y same question for you too

bringe19:06:45

I’m using Apple Silicon.

ericdallo19:06:31

This should revert some changes we believe may have affected and caused the issue https://github.com/clojure-lsp/clojure-lsp/pull/1082

👍 1

Davino Junior19:06:36

Intel-based here

ericdallo19:06:33

To test, you need to: • clone clojure-lsp project • checkout the branch • make to generate the JVM binary • change calva to use that binary , there is a setting for that on calva vscode settings

Davino Junior19:06:10

I'm trying to get some detailed log with @UKFSJSM38 about this issue and one thing that I noticed when reproducing the problem is that clojure-lsp stops as soon as it finds some invalid code... (and from the logs we can see something like WARN [clojure-lsp.parser:115] - It was not possible to parse file. Probably not valid clojure code. ). E.g. of reproducing the problem: 1. Adding a still-to-be-defined function ; 2. clojure-lsp tries to parse it and spits what I just pasted above on the logs; 3. At this moment, this seems to be a non-recovery point. Even deleting previous reference to the non-defined function, restoring file with git and trying to save it again... nothing makes clojure-lsp recover Does any of it make sense?

ericdallo19:06:08

Great, finally I can repro this ☝️:skin-tone-3: only on Calva but we need to check if it's the same issue c/c @U07M2C8TT just inserting a random wrong paren ) repro the issue

jacob.maine19:06:16

I’ll see if I can repro too

jacob.maine19:06:49

I can reproduce the “not possible to parse” warning in Calva but after that I still get completions, can navigate to definitions, etc.

ericdallo19:06:26

I can't repro anymore, so it does seem flaky

jacob.maine19:06:47

(I know why we get the parse error in Calva but not Emacs. After every edit Calva requests code-actions so that it can show the lightbulb. And code-actions parse the file, hence the parse error. Emacs, by default anyway, doesn’t immediately request code-actions and so it doesn’t immediately end up with a parse error.)

ericdallo19:06:51

the parse error is expected on both editors right? just not the complete stop of next requests even after removing the wrong )

jacob.maine19:06:07

after a few more tries, i’m able to reproduce in Calva too

jacob.maine19:06:27

Yes, you can get Emacs to show the parse error too, if you add an extra ), then open the code actions menu. And, right, that’s expected, but unexpected that this would make the client stop receiving messages

👍 1

jacob.maine19:06:48

Let me try for awhile to see if I can get Emacs to exhibit the same problem

jacob.maine19:06:49

by the way, thanks @U032DBM7NDP! this is a HUGE step forward

🙏 1

jacob.maine20:06:24

When comparing thread stack dumps between Emacs and Calva, I see a lot more clojure-agent-send-off-pool-%d threads parked on the Calva side. I believe that futures (clojure’s version, not CompletableFuture) use the agent pools. But other things might as well. @UKFSJSM38 perhaps you and @U032DBM7NDP can focus bisection there

ericdallo21:06:03

Did you repro on emacs? it's weird that only happens on calva

ericdallo21:06:57

> parked on Calva side Oh you mean like Calva is not answering server right? making clojure-lsp hang some threads

jacob.maine21:06:39

No, I haven’t been able to repro on Emacs

jacob.maine21:06:35

> Calva is not answering server Not quite. I have two clojure-lsp processes, one started by Emacs (which is behaving fine, even after introducing syntax errors) and one started by Calva (which is broken after introducing syntax errors). I run jstack on both of them and compare the output with diff. The one from Calva has more agent send-off pool threads in it, like this:

jacob.maine21:06:47

> "clojure-agent-send-off-pool-5" #34 daemon prio=5 os_prio=31 cpu=24.45ms elapsed=1075.37s allocated=2224K defined_classes=0 tid=0x00007fdc2f245800 nid=0xaa03 waiting on condition  [0x0000700012978000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
> 	- parking to wait for  <0x000000078155d5c8> (a java.util.concurrent.SynchronousQueue$TransferStack)
> 	at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234)
> 	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill([email protected]/SynchronousQueue.java:462)
> 	at java.util.concurrent.SynchronousQueue$TransferStack.transfer([email protected]/SynchronousQueue.java:361)
> 	at java.util.concurrent.SynchronousQueue.poll([email protected]/SynchronousQueue.java:937)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask([email protected]/ThreadPoolExecutor.java:1053)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1114)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
> 	at java.lang.Thread.run([email protected]/Thread.java:829)
>
>    Locked ownable synchronizers:> 	- None

jacob.maine21:06:35

It also has a “process reaper” thread, which I think gets created when you run an OS processes, like via sh:

> "process reaper" #39 daemon prio=10 os_prio=31 cpu=39.31ms elapsed=1070.91s allocated=13144B defined_classes=0 tid=0x00007fdc30325000 nid=0x15303 waiting on condition  [0x00007000108bc000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
> 	- parking to wait for  <0x000000078bac0478> (a java.util.concurrent.SynchronousQueue$TransferStack)
> 	at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234)
> 	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill([email protected]/SynchronousQueue.java:462)
> 	at java.util.concurrent.SynchronousQueue$TransferStack.transfer([email protected]/SynchronousQueue.java:361)
> 	at java.util.concurrent.SynchronousQueue.poll([email protected]/SynchronousQueue.java:937)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask([email protected]/ThreadPoolExecutor.java:1053)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1114)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
> 	at java.lang.Thread.run([email protected]/Thread.java:829)
>
>    Locked ownable synchronizers:
> 	- None
>

jacob.maine21:06:59

Err…. wait… maybe it’s the process that Emacs started that has the extra agents threads and process reaper. Better figure that out first 😅

😅 1

jacob.maine21:06:03

Yeah… ignore this. I was just able to get two essentially identical stack dumps, one from a “good” clojure-lsp and one from a “frozen” clojure-lsp, both started by VS Code.

ericdallo21:06:40

so, what we have is: • we know Calva's LSP stop working when we add a invalid code and remove it • checking server side it seems all is working perfectly • it doesn't happen on other editors (at least Emacs) So, it seems the issue could be on Calva's LSP client? (vscode-language-client)

ericdallo21:06:53

But, the problem doesn't happen on pervious clojure-lsp version which doesn't match with that idea

ericdallo21:06:31

@U07M2C8TT I can't repro the issue anymore, did you do anything different on calva?

jacob.maine22:06:42

don’t think so

jacob.maine22:06:29

I have to be offline for the rest of the day, but what I plan to do tomorrow is a) confirm that the 2022.05.31 build works and b) git bisect between that build and 2022.06.22, running make, restarting clojure-lsp in Calva, and trying to trigger the problem. Should be much less tedious now that we have a quick way to repro. I just hope that repro keeps working:crossed_fingers:

ericdallo22:06:52

Sure, Thanks for the help!

🙏 1

bringe23:06:22

Just to add another data point, I’ve been running off that rollback PR mentioned above (#1082) and I haven’t had the issue.

ericdallo23:06:57

Thata good news @U9A1RLFNV but I can't relate with calva only issues yet, and can't repro the issue all the time (only one)

bringe23:06:50

> I can’t relate with calva only issues yet What do you mean?

ericdallo23:06:48

I mean, I can't see how those reverted commits could be related with a issue that only happens in Calva, I think we need to bisect those to check which commit caused the regression on calva

👍 1

Tuomas-Matti Soikkeli07:06:46

Experiencing freezing as well in a scale that I had to revert to cursive. Is there a guidance how I could downgrade with calva?

lassemaatta07:06:15

I'm not a calva user, but perhaps https://calva.io/clojure-lsp/#changing-the-version-of-clojure-lsp-used-by-calva might help?

🙏 1

Tuomas-Matti Soikkeli07:06:27

Thanks, it was simple as opening settings in vscode. I will test 2022.05.31-17.35.50

ericdallo14:06:33

I can't repro on calva, I did repro one time only but can't anymore :( So, can anyone that is having the issue help finding which one of https://github.com/clojure-lsp/clojure-lsp/pull/1082/commitsfixes the issue? You can clone and revert the commits and test with calva @U0123R34Q9Y @U9A1RLFNV @UM1PCCLNN @U032DBM7NDP

👍 1

Tuomas-Matti Soikkeli15:06:39

I can do it tomorrow. As sidenote, my environment often crashed when I had a debugger on.

jacob.maine17:06:37

I’ve identified a bad commit with git bisect and created this https://github.com/clojure-lsp/clojure-lsp/pull/1083 which reverts it. Will a few of you check out that PR’s branch, and confirm that it fixes the problem?

jacob.maine17:06:29

@UKFSJSM38 I’ll start researching why that commit caused the problem. If you want a quick fix, and if others agree that this fixes the problem for them, that commit can be merged. If you want to wait a little while I research, perhaps we can come up with a fix that doesn’t lose the new 1045 feature which improved hover element selection

ericdallo17:06:19

I'm quite surprised that commit could cause the freezing, but if calva users confirm, we can merge it.

ericdallo17:06:45

No problem merge the revert, I found a issue on that PR today, so a new PR with no freezing and a bug fix would be better :)

ericdallo17:06:05

@U9A1RLFNV @U0123R34Q9Y @U032DBM7NDP can one confirm the issue doesn't happens with this PR ☝️ ? if we confirm that, we can merge the fix and do a quick fix release

jacob.maine17:06:12

Agreed, it surprised me that that was the commit git bisect identified. We’ll see what more careful inspection turns up. I’m just really glad to have a single thing to focus on (and glad it wasn’t a dep version bump commit!)

bringe17:06:02

I can try that PR, but I’ve also been running with 190a948df3608dd119d8d5c4c90d4d93edb0f55b and haven’t had the issue yet.

bringe17:06:42

I did have the issue with 988e7d46c3f67112231b8c0c091e0a743040d4d7, however. (Referring to https://github.com/clojure-lsp/clojure-lsp/pull/1082/commits)

ericdallo17:06:34

hum, interesting, any of those commits from #1082 are related with @U07M2C8TT’s revert

ericdallo17:06:57

I think we need to test a little bit more to make sure we are reverting the correct commit

jacob.maine17:06:18

there’s a chance that there’s weird commit interaction, but i’d really prefer to focus on my PR. if it fixes things for you @U9A1RLFNV and others, then we can postpone the commit interaction research. if it doesn’t then we’ll have to go down that (much harder) road.

👍 2

ericdallo18:06:35

Alright, finally with @U07M2C8TT’s help, we found how to always repro the issue: 1. open a clojure file 2. insert a invalid code like just any random ) 3. hover at any element like when you want to see a function defintion docstring 4. removing the invalid code won't make lsp work again

ericdallo18:06:03

Also, I confirmed that @U07M2C8TT’s PR fixes that issue, so I'm confident enought to merge it and release, WDYT?

Davino Junior18:06:00

Hi guys, sorry did not have a chance to see the message earlier, do you still need testing on the fix-PR?

ericdallo18:06:37

It'd be great if you could test that PR @U032DBM7NDP :) (this https://github.com/clojure-lsp/clojure-lsp/pull/1083)

👀 1

Davino Junior18:06:43

ok Im using clojure-lsp built from the PR and see no issues so far, will update shortly

👍 1

🤞 1

Davino Junior19:06:11

No issues, I think this PR fixes it

ericdallo19:06:39

Cool! WDYT @U07M2C8TT let's merge and do a release?

jacob.maine19:06:06

:thumbsup: I don’t have any leads yet on what’s going on with that commit, so I think it’s best to release the revert and figure out the problem later

👍 1

ericdallo19:06:18

Release in process

bringe20:06:00

I haven’t had any issues either with that PR 🎉

ericdallo21:06:28

Great! We finally understand now what is the root cause, and we will work on avoid crashes like that in the future. The fix was released under 2022.06.29-19.32.13 version, thank you very much everyone that helped with the debugging, @U07M2C8TT for keep debugging and finding the root cause 👏 @U9A1RLFNV @U0123R34Q9Y @U032DBM7NDP @UM1PCCLNN

👏 3

jacob.maine21:06:14

For those of us who were really loving the new hover and were sad to see it reverted, I’ll have a PR that restores it soon. As @UKFSJSM38 said, we’ve figured out why that commit broke clojure-lsp for Calva users, how to fix it, and why it wasn’t a problem in Emacs. See the upcoming PR for details. @UKFSJSM38 has some great ideas about how to avoid similar problems in the future. Thanks again to everyone who helped track this down and confirm which fixes were working!

👍 3

bringe21:06:42

Awesome! @UKFSJSM38 Did you mean 2022.06.29-19.32.13?

ericdallo21:06:10

oh, yes 😅

bringe21:06:36

Great job debugging to everyone who worked on it!

🚀 1

jacob.maine21:06:51

@U0178V2SLAY , sorry to say, we still don’t understand the problem you’ve been seeing. But, it has similar symptoms so there’s a (slim) chance that this will help. If not, perhaps @UKFSJSM38’s long-term fixes will. Keep pestering us if the next few releases don’t sort things out for you.

☝️ 1

👍 1

jacob.maine22:06:21

For anyone who’s curious https://github.com/clojure-lsp/clojure-lsp/pull/1087 that restores the new hover feature and discusses how that code accidentally made clojure-lsp freeze for Calva users

👀 1

Davino Junior22:06:01

Thats awesome guys, great job!!

Davino Junior22:06:03

I for one really liked the hover feature, glad its coming back

🤞 1

👍 1

2022-06-24

Channels