Fork me on GitHub

I'm just doing some profiling in vim, and a lot of time goes to finding the completion context. How are other editors doing this fast? I'm considering whether I should just send the whole buffer or the last 400/next 400 lines or something like that.


Dumb idea of just dumping the whole file doesn't work because compliment only reads a single form 🙂

😂 1

Calva does something similar. Take the text from start of top level form to in front on what is completed, take the text from end of what is completed to end of top level form, concatenate. Where is the time spend in the above snippet?


On finding the opening paren. In bigger functions this is pretty slow.


Is it slow as in that the user notices a delay?


Calva uses its Token Cursor for all things like this. So finding top level form is a super quick operation, because it is just to go downList() until you can’t anymore, then the top level form is the same as the current form. If you have something like a token cursor, or zipper, or whatever, then probably that would be quick. OTH, maybe searchpairpos() uses something like that already?


Sorry, missed this. It's so slow I noticed delay while trying to work and just disabled everything. I felt like I was using vi 🙂


I'm not sure what a Token Cursor is, but my suspicion is that you've got some kind of caching/incremental parsing going on there. Vim isn't that smart. It has to search the whole file every time to figure out if there's a cheeky ( at the start of the file to match.


Benchmarks indicate a reduction of 59ms to 0.26ms per call.


Apparently a 99.56% decrease 😄

metal 1

A Token Cursor is an old invention for navigating LISP code. Here’s Up List from an 80s zmacs manual: > Moves forward out of nesting levels. It moves forward one level of list structure. It searches for a close parenthesis and leaves point to the right of that close parenthesis. Also, if called inside of a string, it moves up out of that string, leaving point to the right of its ending quote. In Calva we maintain a sequence of tokens for all documents that are being edited. And then we use a structural “cursor” to navigate the code. Totally inspired by that age old invention.


When you reached for TreeSitter you took a similar approach, I would say. Our token sequence is comparable to the AST that TreeSitter builds and uses.


Interestingly, the same manual says this about top level expressions: > A Lisp file contains a sequence of expressions that we call top-level expressions, to distinguish them from their own sUbexpressions. Zmacs assumes that top-level expressions begin with an open parenthesis against the left margin. It does not parse top-level expressions by balancing parentheses, since parentheses do not always balance while programs are being written. The indentation represents the programmer’s conception of program structure, and provides a better guide. So by top-level expression, we mean a section of text delimited by open parentheses at the beginning of two lines.


Calva instead helps with keeping the document always balanced, and when it is not offers almost no help for editing it. 😃


Ah, so it just searches backwards for ^( and then forwards for the same...


Unfortunately, compliment only works on complete forms anyway 🙂


I might put forward that suggestion to tpope from the zmacs manual and see what he thinks.


I’m not sure how it was done in zmacs and its precursors, but Calva maintains a parsed representation of the document that is fast to navigate.


Does VSCode provide tools for assisting with maintenance of that parsed representation?


I can't imagine writing that myself!


I wish. And it is pretty advanced tech, for sure. I got help creating the infrastructure from a developer who does such things in the sleep.


Nice, I'll have to poke about some time.


> Vim isn't that smart.  It has to search the whole file every time to figure out if there's a cheeky `(` at the start of the file to match. I got bit by the same thing (in a different context) in the thing I'm making for Sublime Text. I solved it by just assuming that if ( is at the first column, it's the outermost form. 🤷 It's not perfect, but it probably solves 99.9% of the cases. 😛


That’s how zmacs solved it too. 😃 But for a different reason.


Oh yes, I see that now in the quote you pasted earlier. 🙂


To be clear. VS Code is not helping much with this for Calva. We have added it with the help of only the onDocumentContentChanged events (which are also very bare and could be much more helpful, but there it is). We maintain a “mirror” document, which gets updated whenever the vs code document gets updated, and vice versa. This mirror document is parsed and we maintain the an array of tokens which we then use for navigating the document. The code for finding the top level form looks like so:

rangeForDefun(offset: number, commentCreatesTopLevel = true): [number, number] {
        const cursor = this.doc.getTokenCursor(offset);
        let lastCandidateRange: [number, number] = cursor.rangeForCurrentForm(offset);
        while (cursor.forwardList() && cursor.upList()) {
            const commentCursor = cursor.clone();
            if (!commentCreatesTopLevel || commentCursor.getToken().raw !== ')' || commentCursor.getFunctionName() !== 'comment') {
                lastCandidateRange = cursor.rangeForCurrentForm(cursor.offsetStart);
        return lastCandidateRange;
Including the handling of rich comment top level forms.


It’s not only used for navigation. All structural edits in Calva are performed on this mirror document which is then mirrored to the “real” document.


Goodness. That does sound sophisticated. I'm basically just moving parens around. 🤪 And it's not too expensive to maintain that "mirror" document in the background?


Because LISP this cursor thing is also basically just moving parens around. Memorywise, every document consumes at least double (text + token array). And maybe more, I don’t know how smart VS Code is about in-active documents, but Calva keeps the mirror around. That doesn’t seem to be too expensive, though. Parsing the documents can be quite a lot of work, but still below 200ms for Clojure core. That could get a lot more efficient if we used DFA regexes, but so far I have resisted going down that rabbit hole. Traversing the token array can also get expensive if the forms are very large. Calva hates huge EDN structures. But they need to get pretty huge first. Calva’s rainbow parens and other highlight services built on this take care only to traverse forms that are in view, but with a huge EDN form, that doesn’t help much.


I’ve been trying to understand this article a bit, to see if I could use some of the approach, but so far it mostly passes over my head.


Interesting, thanks for the info!


And yeah, I glanced at that bracket pair colorization article earlier, but, well, I'm an English major, so... 😛

😂 1

I ended up rewriting it using treesitter/lua and now it doesn't even show up in the profile it's so fast.... Oh yeah!

metal 1

That’s a similar path as Calva uses, even if we are using a homegrown syntax parser.