
hey everyone, here is a little contribution from me: a way to parse Clojure(Script) code safely. I hope you like it 🙂


What are you using it for?


@U09LZR36F so far I am not using it. But I was hoping it would be useful to people out there who want to parse Clojure but don't, because read-string is not safe and edn/read cannot parse Clojure code. I have seen a couple of projects die out because of that, so I thought I would help a little 😅


I may try and reimplement something of mine based on this. Is it a goal to have perfectly bidirectional snippets?


rewrite-clj has a similar goal

borkdude 14:10:04

is used to parse code in sci/babashka (it skips the intermediate representation and parses directly to code; bidirectionality is not a goal).


Yeah, my current implementation uses rewrite-clj, but I found it difficult to work with for the reasons you mentioned :)


(e.g. metadata)


user=> (parcera/clojure "(defn foo [])")
[:code [:whitespace ""] [:list [:whitespace ""] [:symbol "defn"] [:whitespace " "] [:whitespace ""] [:symbol "foo"] [:whitespace " "] [:whitespace ""] [:vector] [:whitespace ""]] [:whitespace ""]]
I see a couple of empty whitespace nodes there.


kudos for cljc support!


It seems Instaparse supports .cljc, so maybe parcera could also support a CLJS API.


overall impressive work, thanks for sharing @U0LJU20SJ


Super nice stuff!


> Is it a goal to have perfectly bidirectional snippets?
@U09LZR36F yes, that is a goal. You can check the test cases and see that it is currently possible to parse and stringify both clojure.core and cljs.core.


That's awesome


Might break the keyboard out now


> I see a couple of empty whitespace nodes there
@U04V15CAJ yeah, unfortunately I have not been able to get rid of those. The problem is that Instaparse becomes too slow if it always checks for optional whitespace, so instead I made whitespace mandatory and allowed it to be empty. Maybe in the future I can get rid of this, but so far it doesn't seem to hurt, and it would probably be easier to just provide a walk function that removes unwanted nodes 😉
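A minimal sketch of such a walk function, assuming parse trees are plain hiccup-style vectors like the REPL output above (`[:tag child ...]`, with leaves such as `[:symbol "defn"]`). `strip-empty-whitespace` is a hypothetical helper, not part of parcera. It drops only the empty `[:whitespace ""]` placeholders, which carry no text, so nothing needed for printing the code back is lost.

```clojure
(ns example.strip-whitespace
  (:require [clojure.walk :as walk]))

(defn empty-whitespace?
  "True for the [:whitespace \"\"] placeholder nodes."
  [node]
  (and (vector? node)
       (= :whitespace (first node))
       (= "" (second node))))

(defn strip-empty-whitespace
  "Return `tree` without empty whitespace nodes. Non-empty whitespace
  such as [:whitespace \" \"] is kept."
  [tree]
  (walk/postwalk
   (fn [node]
     ;; postwalk has already rebuilt the children of `node`, so filtering
     ;; here removes the placeholders at every depth of the tree
     (if (vector? node)
       (into [] (remove empty-whitespace?) node)
       node))
   tree))
```

Applied to the `(defn foo [])` tree shown earlier, this gives `[:code [:list [:symbol "defn"] [:whitespace " "] [:symbol "foo"] [:whitespace " "] [:vector]]]`.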


how is performance overall?


I measured instaparse a while ago against j.u.regex, and it was several orders of magnitude slower.


I wasn't using instaparse's regex support though, so it might be faster when not using it


(Sorry, not a useful answer)


> it seems instaparse supports .cljc. maybe parcera could also support a CLJS API
@U04V15CAJ I would need a bit of help there, since I don't know of any “string building” API in JS. So far that is my only showstopper, because str was quite slow for big inputs.


@U0LJU20SJ I think the API for goog StringBuffer is more or less the same as StringBuilder
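To illustrate the StringBuilder approach on the JVM, here is a sketch of a serializer that writes a parse tree back to text through a single `StringBuilder` instead of calling `str` at every level. The tag-to-delimiter table is an assumption extrapolated from the node names seen in this thread (`:list`, `:vector`, `:metadata`, `:simple-keyword`); parcera's real grammar has more node types, and `unparse` is a hypothetical name. In ClojureScript the same shape should carry over to Closure's `goog.string.StringBuffer`, which also offers `append` and `toString`.

```clojure
(ns example.unparse)

;; Opening and closing text for container nodes. Only the tags seen in
;; this thread are covered; the full grammar has more (assumption).
(def ^:private delimiters
  {:code     ["" ""]
   :list     ["(" ")"]
   :vector   ["[" "]"]
   :map      ["{" "}"]
   :set      ["#{" "}"]
   :metadata ["^" ""]})

(defn- write-node!
  "Append the source text of `node` to `sb` and return `sb`."
  [^StringBuilder sb node]
  (let [[tag & children] node]
    (cond
      ;; containers: opening delimiter, children in order, closing delimiter
      (contains? delimiters tag)
      (let [[open close] (delimiters tag)]
        (.append sb ^String open)
        (doseq [child children]
          (write-node! sb child))
        (.append sb ^String close))

      ;; [:simple-keyword "private"] is printed as :private
      (= :simple-keyword tag)
      (-> sb (.append ":") (.append ^String (first children)))

      ;; leaves such as :symbol, :whitespace and :number carry their text
      :else
      (.append sb ^String (first children)))
    sb))

(defn unparse
  "Serialize a parse tree back to source text through one StringBuilder."
  [tree]
  (str (write-node! (StringBuilder.) tree)))
```

For the `(defn foo [])` tree printed earlier, `unparse` returns `"(defn foo [])"` again, which is the round-trip property parcera aims for.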


> how is performance overall?
@U04V15CAJ as @U09LZR36F mentioned, performance is not the best compared to other tools out there. A roundtrip of clojure.core (around 8k lines) takes around 6 seconds on my machine.


That's fairly reasonable I'd say


is that only the source dir? clj-kondo can lint the Clojure src in 2 seconds, so I'm holding off on swapping to an Instaparse-based parser for now 🙂


I think a specialized instaparse-like thing could be created.


wait, this includes writing of course. what about only the parsing @U0LJU20SJ?


We could fire this up in a repl and find out... :p


I'm cooking dinner right now, will you be my REPL?


@U09LZR36F you are spot on. Performance varies depending on the machine, so I cannot promise anything 😅. However, I just checked, and parsing takes most of the time: 5.5 seconds.


on a cheap macbook air: linting took 1903ms, errors: 34, warnings: 387


5.5 seconds is not that bad though, and it can be very useful for tools that don't need to be as fast as editor support does.


I bet this grammar of yours can also be used to generate a Java parser that could be faster


using ANTLR maybe


yeah, probably. Although I would also like to know how user-friendly those tools are. Instaparse can report ambiguities and the failure position, and it attaches metadata to each node. I think those are incredible features, and I would prefer to make Instaparse faster rather than rewrite the whole thing in a less user-friendly tool.


instaparse is considered quite fast for what it is I think 🙂


@U09LZR36F fwiw, parcera has the same metadata issue I had with rewrite-clj:

user=> (parcera/clojure "^:private []")
[:code [:whitespace ""] [:metadata [:simple-keyword "private"] [:whitespace " "] [:vector] [:whitespace ""]] [:whitespace ""]]


Oh man, that's a shame.


it makes sense from a parsing->writing point of view, but from a code-analysis point of view it's a bit inconvenient


I wonder if this form is easier to work with for re-shaping though


well, I just changed clj-kondo's "vendored" rewrite-clj. also I stripped out all the whitespace because it's just noise to clj-kondo


I quite want all the whitespace 😛 I want a whitespace-preserving config/edn rewriter.


do you use metadata in config a lot?


Not really, no. But I don't want to break what others are doing.


> fwiw, parcera has the same metadata issue I had with rewrite-clj:
@U04V15CAJ what issue is that :thinking_face:?


it's about convenience for parsing, consider this example:

(parcera/clojure "{^:x :a,1}")
;; =>
 [:whitespace ""]
   [:whitespace ""]
   [:metadata [:simple-keyword "x"] [:whitespace " "] [:simple-keyword "a"] [:whitespace ","]]
   [:whitespace ""]
   [:whitespace ""]
   [:number "1"]
   [:whitespace ""]]]
 [:whitespace ""]]
You have to unwrap every node to determine whether it's metadata or not in order to get to the real value.


The common desire is to access the real value


@U0LJU20SJ for example, say you're analyzing this function: (defn ^String foo [^String x ^Long y] ...). When walking over the nodes, I generally don't want to check whether a node is metadata wrapping some other value and then pull out the value I'm actually interested in. I just want to work with those values and, if I'm interested in the metadata, look at it optionally.
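A sketch of the unwrapping an analyzer currently has to do, assuming (from the `{^:x :a,1}` output above) that a `:metadata` node holds the metadata form first, then whitespace, then the annotated form. `real-value` is a hypothetical helper, not a parcera function; it keeps following `:metadata` wrappers until it reaches a node that is not one.

```clojure
(ns example.real-value)

(defn whitespace?
  "True for any [:whitespace ...] node."
  [node]
  (and (vector? node) (= :whitespace (first node))))

(defn real-value
  "Return the form a node stands for, skipping any :metadata wrappers.
  Assumed shape: [:metadata meta-form whitespace... form whitespace...]."
  [node]
  (if (and (vector? node) (= :metadata (first node)))
    ;; drop the tag and the metadata form, then take the first
    ;; non-whitespace child, which is the annotated form
    (recur (first (remove whitespace? (drop 2 node))))
    node))
```

Every analysis step that cares about the annotated form has to route through something like this, which is the inconvenience being described.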


@U0LJU20SJ don't get me wrong, what you made is super cool. it's just something I noticed when working with rewrite-clj.


yeah, you are right. However, we have conflicting goals. My goal was to guarantee bidirectional snippets and to optionally allow people to “see” only part of the complete parse result. If I don't include those nodes, I cannot make a full roundtrip without losing information. My idea was that such things can easily be done with zippers and walk, which can remove those nodes automatically instead of every implementation checking for them manually.


btw, rewrite-clj has a companion zipper namespace for rewriting. could also be nice for this lib, although you could maybe do it with normal clojure zippers
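A sketch with plain `clojure.zip`, assuming the hiccup-style vectors shown above. `tree-zipper`, `down-skip-ws` and `right-skip-ws` are hypothetical names. The skip-ws moves ignore whitespace nodes, roughly like rewrite-clj's zipper navigation does by default, while the underlying tree keeps every node, so edits made through the zipper still round-trip.

```clojure
(ns example.zipper
  (:require [clojure.zip :as z]))

(defn tree-zipper
  "Zipper over [:tag child ...] vectors; strings are leaves."
  [tree]
  (z/zipper vector?                                        ; every vector is a branch
            next                                           ; children: all but the tag, nil if none
            (fn [node children] (into [(first node)] children)) ; rebuild after edits
            tree))

(defn- whitespace-loc? [loc]
  (let [node (z/node loc)]
    (and (vector? node) (= :whitespace (first node)))))

(defn right-skip-ws
  "Next sibling that is not a whitespace node, or nil."
  [loc]
  (->> (iterate z/right (z/right loc))
       (take-while some?)
       (remove whitespace-loc?)
       first))

(defn down-skip-ws
  "First child that is not a whitespace node, or nil."
  [loc]
  (when-let [child (z/down loc)]
    (if (whitespace-loc? child) (right-skip-ws child) child)))
```

Starting from the `(defn foo [])` tree, two `down-skip-ws` calls land on `[:symbol "defn"]`; replacing the next non-whitespace sibling and calling `z/root` returns the full tree with all whitespace nodes intact.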


> don’t get me wrong, what you made is super cool.
no worries, all feedback is welcome. I was also thinking of putting some “node removal” functions in parcera, but I am not sure whether they belong in the core :thinking_face:


@U0LJU20SJ true, conflicting goals indeed


the way I solved it in my fork of rewrite-clj is to lift metadata nodes as real metadata on the value nodes. you could check for that on the way back when writing out.
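A sketch of that lifting idea on parcera-style trees, assuming the `[:metadata meta-form whitespace... form whitespace...]` shape from the outputs above. `lift-metadata`, `restore-metadata` and the `::wrappers` key are hypothetical, not borkdude's actual code or a parcera API. Each `:metadata` node is replaced by the form it annotates, and the original wrapper is stashed as Clojure metadata on that form, so the writer can put it back while keeping any edits made to the form.

```clojure
(ns example.lift-meta
  (:require [clojure.walk :as walk]))

(defn- whitespace? [node]
  (and (vector? node) (= :whitespace (first node))))

(defn- metadata-node? [node]
  (and (vector? node) (= :metadata (first node))))

(defn lift-metadata
  "Replace every :metadata node by the form it annotates. The wrapper
  node and the index of the form inside it are stored under ::wrappers
  in the form's Clojure metadata, innermost wrapper first."
  [tree]
  (walk/postwalk
   (fn [node]
     (if (metadata-node? node)
       ;; child 1 is the metadata form; the annotated form is the first
       ;; non-whitespace child after it (shape assumed from the thread)
       (let [idx   (first (remove #(whitespace? (nth node %))
                                  (range 2 (count node))))
             value (nth node idx)]
         (vary-meta value update ::wrappers (fnil conj []) [node idx]))
       node))
   tree))

(defn restore-metadata
  "Inverse of lift-metadata: put each annotated form back into its
  :metadata wrapper(s), keeping edits made to the form itself."
  [tree]
  (walk/postwalk
   (fn [node]
     (if-let [frames (::wrappers (meta node))]
       (reduce (fn [inner [wrapper idx]]
                 ;; restore the wrapper too, in case its metadata form
                 ;; contained lifted metadata of its own
                 (assoc (restore-metadata wrapper) idx inner))
               (vary-meta node dissoc ::wrappers)
               frames)
       node))
   tree))
```

After lifting, `^:private []` looks like a plain `[:vector]` to an analyzer, and `restore-metadata` turns it back into the original tree. Whether parcera parses `^:a ^:b x` as nested `:metadata` nodes is itself an assumption of this sketch.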


Btw, was the 5.5 s benchmark just core.clj or the entire Clojure source? I assumed the entire source.


nope, it was just core.clj


Ah, I see. I think if you rewrote this using tools.reader it could be orders of magnitude faster, if that's a goal.


But then you’re basically doing the same as rewrite-clj


> But then you’re basically doing the same as rewrite-clj
yep, exactly. That is not a goal. rewrite-clj already does that, so there is no need to do the same. That is what I meant before with “I prefer to make instaparse faster than to completely rewrite this just to make it faster”


parcera is still in its infancy … with time we will see which path it takes 😉

Alex Miller (Clojure team) 17:10:33

Btw there was an antlr grammar in Clojure long long ago, before the LispReader existed


Oh really? What happened to it? I find it quite nice to have a grammar to look at as the “source of truth” for a language. From what I saw, the Clojure compiler seems to take a “one pass does all” kind of approach… I guess due to performance?


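  ;; Assumed setup, not part of the original message: the meander and parcera
  ;; namespaces, plus a whitespace? predicate for [:whitespace ...] nodes.
  (require '[meander.epsilon :as m] '[parcera.core :as parcera])
  (defn whitespace? [node] (and (vector? node) (= :whitespace (first node))))

  (m/rewrite (parcera/clojure "(-> a b c)")
             [?tag . !rest-pre ... . [:list . (m/pred whitespace?) ... . [:symbol "->"] . (m/or (m/pred whitespace?) [:symbol _ :as !args]) ...] . !rest-post ...]
             (m/with [%list [:list !args [:whitespace " "] . %list ...]]
               [?tag . !rest-pre ... . %list ... . !rest-post ...])

             [?tag . (m/cata !content) ...]
             [?tag . !content ...])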

I wrote (with a lot of help from #meander) a little tool for rewriting (-> a b c) to (a (b (c))) 🙂 Very handy

Alex Miller (Clojure team) 23:10:23

@U0LJU20SJ it was replaced by the LispReader