Noah Bogart18:03:35

I'm updating coc-clojure. Are there any docs for what resolve-macro-as and create-test do? The refactoring section of capabilities has no descriptions of any of the commands.


Docs about what they do should be there; there is a missing gif for create-test though

👍 1

I'd suggest mentioning in the docs to avoid calling the commands manually, since most of them are available as code actions, and clojure-lsp knows which code actions are available at the line/col

Noah Bogart18:03:43

i can do that


I’ve been researching performance and wanted to report some findings…


In my opinion, clojure-lsp is fast enough for daily use. That said, I’ve been curious about performance. Where does it spend time? Are there any easy wins? And more generally, what’s a good way to profile the code in this project?

I started with completion, because I recently fixed a performance problem there. To diagnose that problem I used metrics-clojure to time various parts of the code. By bisecting down I found one bottleneck. That approach worked, but was pretty tedious; I spent lots of time moving timers around. Since then, I’ve been playing with clj-async-profiler, on the assumption that a sampling profiler would point out the hotspots even as the code was refactored, without having to change the profiling code at each step.

In my first attempt I copied some code from the completion tests and profiled that. This was the wrong approach! The test code clears out the analysis database before every run, so it inherently starts with a tiny dataset. The profiler highlighted a few things that are lost in the noise of completion in a real project.

In my second attempt, I got the profiler attached to a dev build of clojure-lsp, which has a repl built into it. With that, I could run completions on real files in a real project’s db. (Later I’ll leave notes about what I needed to change to make that happen.) This round of profiling had more interesting results:
- criterium/quick-bench reported that completing (from a random word-character position within one particular file) averages 82ms (99% 67-98ms). Not bad, really.
- Running clj-async-profiler on 128 random locations (which took 12.2 seconds) produced the first attached flamegraph.

(A quick note on the flamegraphs: they’re interactive SVGs. If you download them you should be able to click to drill into individual bars.)

This first flamegraph (#1) highlights a few things. During completion, about 32% of the time is spent in the parser: first in rewrite-clj, parsing the file’s code into a zipper, and then navigating to various parts of the zipper. The remaining time is spread out over several helpers: completion/with-clojure-core-items, queries/find-all-ns-definition-names, completion/with-elements-from-alias, and queries/find-element-under-cursor.

With the exception of queries/find-element-under-cursor, one thing all of these helpers have in common is that they iterate over all the analyzed elements in the db via some form of (mapcat val (:analysis @db)). In my test runs, that’s a sequence of about 115,000 elements. It makes sense that iterating over that sequence multiple times would be expensive. Another common feature of these helpers is that they all filter on bucket.

On a hunch, I decided to build an index on bucket, (group-by :bucket (mapcat val (:analysis @db))), at the start of the main completion function, then pass that around so the helpers can pick out the bucket they’re interested in. This made the whole thing slower! Mean 91ms (99% 82-111ms), or about 11% slower. But the flamegraph for this attempt (#2, 128 locations, 14 seconds) is instructive: now the bulk of the time outside of parsing is spent building the index with group-by, and the individual helpers fade into the background, for the most part.

But what if the index were prebuilt? If I go a step further and build the index outside of completion, the mean comes down to 57ms (99% 45-67ms), or about 30% faster than the original. And that’s only for a single-level index; there could be more improvements if we maintained a multi-level index. See flamegraph #3 (133 locations, 9.3 seconds).

The other big target for performance improvements would be to keep a cache of the zipper.

Both of those approaches, keeping an index of analysis by bucket and caching the zipper, have benefits and drawbacks. They would mean more work maintaining caches, which has to be done with care. There can be correctness problems coming from cache invalidation, and even performance problems from spending too much time maintaining the cache. And, as usual with performance improvements, we would be trading time for space. I’m not sure how much of a concern memory usage is, but we should probably keep it in mind.

One possibility for keeping a multi-level index of the analysis would be to move it into a Datascript database, with indices on bucket, name, namespace, filename, etc. I know this project used some datalog-style db (datalevin?) for a while, but if I recall, that was more for persistence of a big hashmap than actual indexing and datalog queries. IMO, a datalog-style query language for accessing the analysis would also be pretty natural, and reasonably intuitive to optimize.

In the past I’ve proposed that the queries namespace expose a transducer-based API, but from a performance perspective, that’s not the best choice. Transducers wouldn’t fix the problem of iterating over an unindexed sequence; they just make it easier to do that as performantly as possible.

Anyway, my plan from here is:
- Profile formatting, which can also be slow. Most of the formatting time is probably spent in cljfmt, but still, it’s worth checking.
- Write up how to get the profiler running.

If anyone has other candidates for profiling, send them along. I can’t promise I’ll try every idea, but it’d at least be nice to have a list of areas to look at.
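The by-bucket index experiment described above can be sketched roughly like this (a minimal illustration: the db shape is simplified and the helper names are hypothetical, not clojure-lsp's actual functions):

```clojure
;; @db holds {:analysis {filename [element ...]}} where each element
;; has a :bucket key (:var-definitions, :namespace-definitions, ...).

;; Original approach: every helper scans all ~115k elements and
;; filters on :bucket.
(defn all-elements [db]
  (mapcat val (:analysis @db)))

(defn ns-definition-names [db]
  (->> (all-elements db)
       (filter #(= :namespace-definitions (:bucket %)))
       (map :name)))

;; Indexed approach: build the bucket index once, up front, and let
;; each helper pick out only the bucket it cares about.
(defn bucket-index [db]
  (group-by :bucket (mapcat val (:analysis @db))))

(defn ns-definition-names* [index]
  (map :name (:namespace-definitions index)))
```

The measurements above suggest the win only materializes when `bucket-index` is built outside the hot path and reused, rather than rebuilt per completion.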


This looks very useful, but it's really crying out for a better presentation method: an issue with the graphs inline, or a blog post.


Thanks for the summary @U07M2C8TT. I already spent some time improving the performance of clojure-lsp, and IMHO we've reached a level where there are no easy quick wins. Just FYI, the kondo analysis is already returned from kondo grouped by bucket; we ungroup it after calling, for easier/faster queries in some cases. AFAICS, the feature with the most performance issues that users notice is formatting. I'd suggest checking whether there is anything we can improve on the cljfmt side, since AFAIK we've already improved everything we can on the LSP side. We already opened 1 or 2 PRs there, so maybe there is more to improve? About memory vs performance, we should check the tradeoff: how much memory would we lose for performance? Also, we'd need to check how that behaves with GraalVM-compiled clojure-lsp, which is usually faster and has better memory allocation.


@UKFSJSM38 I agree completely. My biggest takeaway is that completion performance is already really good and there aren’t any quick wins. I’m curious to see what formatting looks like. Those ideas about storing other indices are just ideas; they’d need a lot of thought. It’s interesting that the kondo data is grouped by bucket originally and is regrouped by filename. It’s possible that storing both the by-bucket and by-filename versions would speed up a lot of queries, and wouldn’t change memory usage much because of Clojure’s sharing of immutable data. That’s another avenue to explore.
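The structural-sharing point can be sanity-checked at a repl: grouping the same elements two different ways does not copy them, so a second index mostly costs only its own index structure (the element data below is made up for illustration):

```clojure
(def elements [{:bucket :var-definitions :filename "a.clj" :name 'foo}
               {:bucket :var-usages      :filename "b.clj" :name 'foo}])

;; Two indices over the same elements.
(def by-bucket   (group-by :bucket elements))
(def by-filename (group-by :filename elements))

;; Both indices hold references to the identical element maps, not
;; copies, thanks to Clojure's persistent data structures.
(identical? (first (:var-definitions by-bucket))
            (first (get by-filename "a.clj")))
;; => true
```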

👍 1

I agree, we could make some POCs to understand how much memory it costs for big projects to keep the analysis grouped in the db as well


@U11BV7MTK It’s a fair criticism that Slack formatting isn’t ideal, but my intention was to share some preliminary data with a group of people who might be interested in the internals of clojure-lsp. IMO, GitHub isn’t a good alternative. First, not many people lurk in the Issues section, meaning the conversation is less engaging and more drawn out. And the last time I uploaded one of these SVGs to GitHub, it “helpfully” converted it to an image, losing the interactivity. A blog post would let me embed the images, but again, this is preliminary data. A quick reading of it as a blog post might come off as “clojure-lsp is slow”. I’d argue the opposite, and making that argument thoroughly would take more data and a lot more writing. Also… I still haven’t gotten around to setting up that blog. Maybe someday 😉

👍 2

Now, to totally contradict myself, here's a post that would definitely be better on GitHub... Below is a diff for getting clj-async-profiler working. Some commentary:
- The binary has to permit the profiler to attach to itself (“-Djdk.attach.allowAttachSelf=true”)
- cli/deps.edn needs a new alias for the perf dependencies
- The uberjar has to be built with that alias
- Some file somewhere must require the dependencies, or they’ll be optimized away

Then run make to create ./clojure-lsp. Go through the usual hoops to connect a repl to the lsp process. Then, in a buffer somewhere:

;; Pick a file to run completions against.
(def filename "/some/path/clojure-lsp/lib/src/clojure_lsp/feature/completion.clj")
(def uri (str "file://" filename))
(def text (get-in @db/db [:documents uri :text]))

;; Every [row col] position in the file that falls on a word-like character.
(def idxs (for [[lno line] (map-indexed vector (string/split text #"\n"))
                [cno char] (map-indexed vector line)
                :when (re-find #"[a-zA-Z0-9/]" (str char))]
            [(inc lno) (inc cno)]))

;; Benchmark completion at random positions.
(bench/quick-bench
  (let [[row col] (rand-nth idxs)]
    (completion uri row col db/db)))

;; Profile completion over a ~1% sample of the positions.
(profiler/profile
  {:min-width 5}
  (let [sample (random-sample 0.01 idxs)]
    (println "completing" (count sample) "locations")
    (doseq [[row col] sample]
      (completion uri row col db/db))))
diff --git a/cli/build.clj b/cli/build.clj
index f8a55206..3bd135ea 100644
--- a/cli/build.clj
+++ b/cli/build.clj
@@ -65,7 +65,7 @@
   ((requiring-resolve 'deps-bin.impl.bin/build-bin)
    {:jar uber-file
     :name "clojure-lsp"
-    :jvm-opts ["-Xmx2g" "-server"]
+    :jvm-opts ["-Djdk.attach.allowAttachSelf=true" "-Xmx2g" "-server"]
     :skip-realign true}))
 (def prod-jar uber-aot)
@@ -74,7 +74,7 @@
   (uber-aot (merge opts {:extra-aliases [:native]})))
 (defn debug-cli [opts]
-  (uber-aot (merge opts {:extra-aliases [:debug]}))
+  (uber-aot (merge opts {:extra-aliases [:debug :perf]}))
 (defn prod-cli [opts]
diff --git a/cli/deps.edn b/cli/deps.edn
index ee257cb7..3d11e177 100644
--- a/cli/deps.edn
+++ b/cli/deps.edn
@@ -32,6 +32,11 @@
                  :jvm-opts ["-Xmx2g" "-server"]}
            :debug {:extra-deps {nrepl/nrepl {:mvn/version "0.9.0"}
                                 cider/cider-nrepl {:mvn/version "0.28.3"}}}
+           :perf {:extra-deps {criterium/criterium {:mvn/version "0.4.6"}
+                               com.clojure-goes-fast/clj-memory-meter {:mvn/version "0.1.3"}
+                               com.clojure-goes-fast/clj-async-profiler {:mvn/version "0.4.1"}}
+                  :jvm-opts   ["-Djdk.attach.allowAttachSelf"]}
            :native {:jvm-opts ["-Xmx2g"
diff --git a/lib/src/clojure_lsp/feature/completion.clj b/lib/src/clojure_lsp/feature/completion.clj
index 766a04c8..41a728e7 100644
--- a/lib/src/clojure_lsp/feature/completion.clj
+++ b/lib/src/clojure_lsp/feature/completion.clj
@@ -1,5 +1,6 @@
 (ns clojure-lsp.feature.completion
   (:require
+   [clj-async-profiler.core :as profiler]
    [clojure-lsp.common-symbols :as common-sym]
    [clojure-lsp.feature.add-missing-libspec :as f.add-missing-libspec]
    [clojure-lsp.feature.completion-snippet :as f.completion-snippet]
@@ -12,6 +13,7 @@
    [clojure-lsp.shared :as shared]
    [clojure.string :as string]
    [clojure.walk :as walk]
+   [criterium.core :as bench]
    [medley.core :as medley]
   [rewrite-clj.zip :as z]))


I'm ok adding performance profiles or even adding to the debug profile we have


@UKFSJSM38 OK, I’ll put together a PR


Final bit of analysis… Here’s a flamegraph for formatting. This is a call from clojure-lsp, but nearly all of the time is spent in cljfmt. I don’t know that project at all so I can’t say whether this graph suggests performance targets. It looks like it spends most of its time indenting, with another chunk spent cleaning up whitespace. Naively, that’s about what I’d expect.


Yes, I was expecting something like that. I think we should focus on range formatting; here is why: • IIRC its performance is worse than a full-file format in some cases (even for buffers that aren't huge) • It's called automatically by most clients when the user changes code via some parens manipulation.


Gotcha. Do you know what clients send as format-pos? I.e., is it the range of a symbol, a line, a top-level form, or something else?


It's always the line/col of the start and end of a region; that could be the region the user selected and applied range formatting to, or the region affected by some paredit action
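For reference, per the LSP spec a textDocument/rangeFormatting request carries exactly that: zero-based start/end positions for the region, plus formatting options. Sketched as the parsed map a server would see (the URI and numbers are made up):

```clojure
{:textDocument {:uri "file:///some/project/src/core.clj"}
 :range {:start {:line 10 :character 0}    ; region start (zero-based)
         :end   {:line 14 :character 22}}  ; region end
 :options {:tabSize 2 :insertSpaces true}}
```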


Well, let’s see…
1. If you name a local variable ns, then cljfmt/reformat-form will throw an error. I’ll plan to report that as a bug.
2. If you generate ranges from random pairs of locations, quick-bench reports a mean of 191ms, but with a large standard deviation (99% 128-246ms). Sometimes those random ranges include a lot of forms and sometimes they don’t include any. :man-shrugging:
3. The flamegraph shows that a chunk of time is in clojure-lsp.parser/find-top-forms-in-range, and then another chunk in cljfmt/reformat-form.


None of it seems that slow. Are users reporting slower times?


(For my reference, that last flamegraph was generated by profiling 80 runs of random ranges. I lost the exact time, but I think it ran for 11 or 12 seconds.)


Yes, there are multiple issue reports about performance, especially after editing with paredit on emacs. I took a look at the timing and it was indeed range formatting's fault. There are some closed clojure-lsp issues where you can maybe find a huge project to test with, but I remember that the issue is specifically with large buffers (more than 1k lines), even if we are range formatting only a small (+ 1 2) in that buffer


Interesting read, Jacob! And the perf fix for completions that you both did really helps, thanks for that. I'm writing so many comments now 🙂