rewrite-clj

skylize 2022-09-29T15:47:47.868889Z

Current behavior of the rewrite-clj's reader is making editor tooling unnecessarily difficult. This discussion started at least as far back as January 2021 https://github.com/clj-commons/rewrite-clj/issues/87, though I wasn't around for that. I have recently stumbled over numerous warts using Calva, where reader exceptions bubble their way up from rewrite-clj through clojure-lsp, cljfmt, or some other route, all the way to the user as a thrown-exception popup. When this happens, useful help for linting, formatting, etc. are all prety much useless until the code is edited back into a "readable" state. From a user perspective, all these errors seem random and arbitrary. It significantly degrades the user experience, and misses out on opportunities for tools to target help for fixing broken code. The examples shown in the thread are minimal cases, but occur easily and frequently inside or around much larger blocks of code, where the "cause" can be quite difficult to discover. Some no longer throw, but only because they are now patched over with a hack to swallow the exception. As noted by @ericdallo (edit in brackets to say clojure-lsp where he accidentally said rewrite-clj) > we've been hacking that on kondo and [clojure-lsp], we should probably have that on rewrite behind some setting or flag > https://clojurians.slack.com/archives/CPABC1H61/p1664463806428379?thread_ts=1663954653.160439&cid=CPABC1H61 It seems to me the reader needs to be augmented with error recovery, to look forward or backward from the point of failure in search of a place where lexing can continue. Then report errors alongside all the valid nodes (hopefully with sufficient data that a tool like clj-kondo can make a good guess at what went wrong).

💡 1
pez 2022-09-30T07:03:45.751999Z

> parinfer could possibly be used to guess where unbalanced braces should be placed Not sure, but Parinfer can only infer parens on correctly indented code, right?

snoe 2022-09-30T16:40:16.285499Z

>> parinfer could possibly be used to guess where unbalanced braces should be placed > Not sure, but Parinfer can only infer parens on correctly indented code, right? Mostly, but when I made the vim plugin, I would run paren mode to correct indentation based on parens first, then turn on infer mode that would move indentation without changing structure. I think smart mode was added later to do something similar. In my experience, at least top level forms start on col 0, so if you split the text at those points, you could possibly try parsing each chunk until you hit one you can't read then wrap it in a [:inferred (infer chunk)] node, and continue with the rest of the text. That way, linters could signal that's something wrong, but analysis or whatever could still be applied internally to that inferred node. The goal would be to read as much of the file as possible normally, and infer only that one chunk in the middle that missed a closing paren.

lread 2022-09-30T16:49:15.623749Z

To mull: One thing that rewrite-clj does now (with the exception of normalizing all OS newline variants to \n) is spit out exactly what it reads in. It think it will have to do the same even in the unbalanced delimiters scenario. I also am not sure what this means for formatters like zprint, cljfmt and cljstyle. I would think they’d probably be fine if rewrite-clj started to not barf on invalid symbols and keywords, but not sure if they want to deal with code with unbalanced delimiters.

💯 1
skylize 2022-09-29T15:49:16.412389Z

> • Type into document such that {:foo :bar} becomes {#_#:foo :bar}. > • See error pop up: Request textDocument/codeAction failed. > https://github.com/clj-commons/rewrite-clj/issues/87 > Repro: (`|` represents cursor) > clojure-lsp version used: 2022.09.20-16.33.44-nightly >

{:foo b|}
> Type <backspace> to trigger. > https://clojurians.slack.com/archives/CPABC1H61/p1663701774338309 >
()#_() 
> With cursor anywhere except between the first 2 parens. > Then use mouse or arrow keys to place cursor inside uncommented form to trigger. >
(|)#_()
> https://clojurians.slack.com/archives/CPABC1H61/p1663884706635019 >
({:foo}) ;; 💣
> Cursor anywhere except after the entire form to throw exception. > https://clojurians.slack.com/archives/CPABC1H61/p1663954653160439 >
(/foo :bar)
> Here /foo is an invalid symbol, as indicated by the linter Invalid symbol: /foo. Okay. > > But after command Barf Forward >
(/foo :bar)    -->    (/foo) :bar
> and again after Slurp Forward on that result, >
(/foo) :bar    -->    (/foo :bar)
> the error emits as a popup. > https://clojurians.slack.com/archives/CBE668G4R/p1663863858149529

lread 2022-09-29T15:58:11.986839Z

Thanks for investing all this time to summarize your findings on this topic @eveningsky.

☺️ 1
lread 2022-09-29T16:04:28.124169Z

With https://cljdoc.org/d/rewrite-clj/rewrite-clj/1.1.45/doc/user-guide#_parsing_peculiarities, rewrite-clj currently only parses valid Clojure code, but that’s at all saying that we won’t handle invalid code in the future. We would just have to invest the time in a solution that makes sense.

👍 1
lread 2022-09-29T16:04:40.576759Z

We might be able to steal some strategies from clj-kondo’s internalized and customized version of rewrite-clj. Not sure exactly how far it goes. It must at least deal with invalid symbols, but maybe not unbalanced braces, etc.

snoe 2022-09-29T16:11:20.566689Z

parinfer could possibly be used to guess where unbalanced braces should be placed and allow delineation of top level forms

lread 2022-09-29T16:12:44.193699Z

interesting idea @snoe, deciding when and how to recover from unbalanced braces can be tricky.

snoe 2022-09-29T16:13:25.182309Z

yeah, but just identifying unreadable tokens similar to uneval nodes might be a great step

lread 2022-09-29T16:13:56.512629Z

True, we don’t have to do it all at once. Could tackle the easier bits first.

lread 2022-09-29T16:16:05.742209Z

@eveningsky for your {:foo b|} example, rewrite-clj happens to support unbalanced maps today, so there might be a workable solution for that specific case. To demonstrate:

(require '[rewrite-clj.zip :as z])

(-> "{:foo b}"
    z/of-string
    z/down
    z/right
    z/remove
    z/root-string)
;; => "{:foo}"

lread 2022-09-29T16:19:29.060079Z

For ()#_() are you modifying anything? (you probably are, lemme know) Rewrite-clj can read and write this structure:

(-> "()#_()"
    z/of-string
    z/root-string)
;; => "()#_()"

lread 2022-09-29T16:21:06.623269Z

Same question for:

(-> "({:foo}) ;; 💣"
    z/of-string
    z/root-string)
;; => "({:foo}) ;; 💣"

lread 2022-09-29T16:22:14.234979Z

I do see your /foo issue:

(-> "(/foo :bar)"
    z/of-string)
;; => ExceptionInfo Invalid symbol: /foo.  clojure.core/ex-info (core.clj:4617)

lread 2022-09-29T16:26:42.387179Z

@ericdallo when you have a moment, please feel free to chime in on above too.

ericdallo 2022-09-29T16:50:48.685509Z

I see lots of possibilities here, so maybe we could prioritize some of the most common invalid codes https://github.com/clojure-lsp/clojure-lsp/blob/master/lib/src/clojure_lsp/parser.clj#L46-L99 is where clojure-lsp do its custom ugly parse 😅 @borkdude may be interested on that as well.

skylize 2022-09-29T16:50:57.442629Z

> or ()#_() are you modifying anything? When examples say "place cursor ...", there is no further editing required. The errors will trigger just by moving the cursor into regions specified. (Not clear to me how that ties in with what rewrite is doing. But when I asked @ericdallo, he indicated this was still more-or-less the same issue as the /foo problem I discussed with you.) Check this out for significant elaboration on those 2 examples. https://github.com/clojure-lsp/clojure-lsp/issues/1268

lread 2022-09-29T16:58:18.132909Z

@eveningsky thanks, watcha think @ericdallo? I don’t yet see rewrite-clj as being the problem for some examples I showed to work with rewrite-clj above. But… I totally fine with being wrong about that.

lread 2022-09-29T17:02:41.416829Z

There’s some good ideas and initial discussion in https://github.com/clj-commons/rewrite-clj/issues/87, I’ll study that more carefully soon to freshen my brain on this topic.

pez 2022-09-29T18:22:43.475669Z

It could be that rewrite-clj should be split into two different libraries. I've totally not thought this through, it's just something that strikes me. Some editor tooling needs quite different help from what other tooling needs in terms of Clojure source code reading and manipulation. While typing, about anything should be accepted. Other use cases wants stricter and more semantic help. Is it the same library that should cater to both?

snoe 2022-09-29T18:30:42.935699Z

So, personally I feel that rewrite-clj was always a text parser. If we're talking about transforming sexps tools.reader would have been sufficient. But we want to maintain whitespace and comments; things outside sexps. unreadable errors in source code happen and, for me, it would fit well in rewrite-clj. If you look at the biggest use cases: formatting, analysis, refactoring i think they would all benefit from error handling.

borkdude 2022-09-29T18:37:18.088129Z

I haven't read all of the conversation here (having a day off), but every time someone took the effort to report a case where clj-kondo should continue parsing with invalid code instead of throwing, it's been solved. So please keep doing that.

1
lread 2022-09-29T19:27:17.987879Z

I don’t think we need to split rewrite-clj to two different libraries. We can figure out something sensible.

pez 2022-09-29T19:32:26.229519Z

Haha, yeah, maybe it is non-sensible. 😃

borkdude 2022-09-29T19:33:58.271289Z

in clj-kondo I vendored both rewrite-clj and tools.reader upon which it depends to deal with these problems

borkdude 2022-09-29T19:34:11.084169Z

(also to get better error messages especially from tools.reader)

lread 2022-09-29T19:34:14.339899Z

Oh no, didn’t at all mean your idea was not appreciated @pez! We are at the brainstorming stage and all input is great.

❤️ 1
borkdude 2022-09-29T19:34:20.080279Z

this was before rewrite-clj v1 was out

lread 2022-09-29T19:34:54.676759Z

Yeah, I had to futz tools.reader a bit to deal with line endings, but did not inline. Yet.

borkdude 2022-09-29T19:35:34.936829Z

you might as well use edamame and propose a feature request there if you need something weird ;)

lread 2022-09-29T19:35:58.279559Z

See, all ideas are good! simple_smile

borkdude 2022-09-29T19:36:11.572239Z

I think it has the similar better error messages that the inlined tools.reader has

lread 2022-09-29T19:36:43.627149Z

It’s an option.

borkdude 2022-09-29T19:36:51.308149Z

perhaps one day I could undo the vendoring if we go that way :-D

🤞 1
borkdude 2022-09-29T19:37:26.110039Z

I also vendored rewrite-clj since it was a bit unmaintained and I didn't want to be limited by that

borkdude 2022-09-29T19:37:41.157219Z

but this is no longer true

lread 2022-09-29T19:38:18.210149Z

Yeah but you had some tweaks too. Like skipping all whitespace for perf, and treating meta differently I think.

borkdude 2022-09-29T19:39:17.262009Z

Yes. The skipping of whitespace could be easily an option in rewrite-clj though

lread 2022-09-29T19:39:24.823269Z

True, true.

borkdude 2022-09-29T19:40:01.380239Z

I mean, it doesn't really fit with the "rewrite" use case, but the tweak is very minimal

👍 1
lread 2022-09-29T19:40:22.132349Z

When borkdude takes a day off, how many borks join him? Do you leave a few borks behind to keep working?

😂 1
borkdude 2022-09-29T19:40:38.975329Z

I'm on the train home now... catching up ;)

lread 2022-09-29T19:41:27.944139Z

Which bork am I talking to? I never know. Maybe it really doesn’t matter.

😂 1
pez 2022-09-29T19:44:34.461919Z

I only use rewrite-clj by proxy, so am actually not very familiar with it or informed about the boundaries and stuff it operates within.

lread 2022-09-29T19:45:32.067609Z

Indirectly through clojure-lsp, clj-kondo, zprint (and formerly cljfmt)?

pez 2022-09-29T19:45:53.108649Z

Yes, Those.

pez 2022-09-29T19:46:11.111249Z

Formerly cljfmt?

lread 2022-09-29T19:46:52.335259Z

I thought you might have switched to zprint? I dunno, maybe a false rumour.

pez 2022-09-29T19:47:36.142209Z

Ah, I thought you was just telling me cljfmt had switched to something else. 😃

lread 2022-09-29T19:48:28.164769Z

Oh yes, (I mean no, cljfmt still uses rewrite-clj), communication, eh? Can be tricky. simple_smile

pez 2022-09-29T19:48:34.736069Z

Calva is very much still using cljfmt. And also zprint.

👍 1
pez 2022-09-29T19:49:06.406099Z

cljfmt for formatting, and zprint for pretty printing

lread 2022-09-29T19:49:30.402949Z

Interesting.

lread 2022-09-29T19:50:00.054689Z

I know that zprint has many more knobs to turn, but have not ever really dug in to learn it.

pez 2022-09-29T19:50:07.140679Z

I want to support zprint for formatting as well, but it gets complicated.

pez 2022-09-29T19:50:38.201339Z

Maybe some day. 😃

lread 2022-09-29T19:50:52.611809Z

Maybe tomorrow, maybe never. simple_smile

pez 2022-09-29T19:51:12.370799Z

Haha. something like that.

lread 2022-09-29T19:51:17.579339Z

(an expression I learned when visiting Mexico many years ago)

pez 2022-09-29T19:52:01.292119Z

I'll be stealing it.

lread 2022-09-29T19:52:28.137819Z

It’s a good one, you definitely should.