This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2019-10-06
hey everyone, here is a little contribution from me. A way to parse Clojure(script) code safely. I hope you like it 🙂 https://github.com/carocad/parcera

@U09LZR36F so far I am not using it. But I was hoping it would be useful to people out there who want to parse Clojure but don't, since read-string is not safe and edn/read cannot parse Clojure code. I have seen a couple of projects die out because of that, so I thought I would help a little 😅
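To make those two limitations concrete, here is a minimal sketch that uses only clojure.core and clojure.edn. The function names are illustrative and are not part of parcera. read-string honours *read-eval*, which is true by default, so a #= form is evaluated while it is being read. Binding *read-eval* to false blocks that, and clojure.edn rejects Clojure-only syntax such as the #(...) literal.

```clojure
(ns example.reading
  (:require [clojure.edn :as edn]))

;; read-string honours *read-eval*, which is true by default, so the
;; #= reader macro runs code while the string is being read.
(defn unsafe-read [s]
  (read-string s))

;; With *read-eval* bound to false, #= forms throw instead of running.
(defn read-without-eval [s]
  (binding [*read-eval* false]
    (read-string s)))

;; clojure.edn only accepts the EDN subset, so Clojure-only syntax
;; such as the #(...) anonymous function literal fails to read.
(defn edn-readable? [s]
  (try
    (edn/read-string s)
    true
    (catch Exception _ false)))

(unsafe-read "#=(clojure.core/+ 1 2)") ;; => 3, evaluated by the reader
(edn-readable? "{:a 1}")                ;; => true
(edn-readable? "#(inc %)")              ;; => false
```

Even with *read-eval* off, read-string only hands back forms. It discards comments and whitespace, which is part of why a grammar-based parser is attractive for tooling.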
I may try and reimplement something of mine based on this. Is it a goal to have perfectly bidirectional snippets?
https://github.com/borkdude/edamame is used to parse code in sci/babashka (it skips intermediate representation and parses directly to code, bidirectionality is not a goal).
Yeah, the current implementation uses rewrite-clj, but I found it difficult to work with for the reasons you mentioned :)
user=> (parcera/clojure "(defn foo [])")
[:code [:whitespace ""] [:list [:whitespace ""] [:symbol "defn"] [:whitespace " "] [:whitespace ""] [:symbol "foo"] [:whitespace " "] [:whitespace ""] [:vector] [:whitespace ""]] [:whitespace ""]]
I see a couple of empty whitespace nodes there.
> Is it a goal to have perfectly bidirectional snippets?
@U09LZR36F yes, that is a goal. You can check the test cases and see that it is currently possible to parse and stringify both clojure.core and cljs.core
> I see a couple of empty whitespace nodes there
@U04V15CAJ yeah, unfortunately I have not been able to get rid of those. The problem is that instaparse becomes too slow if it always checks for optional whitespace, so instead I made whitespace mandatory and allowed it to be empty. Maybe in the future I can get rid of this, but so far it doesn't seem to hurt, and it would probably be easier to just provide a walk function that removes unwanted nodes 😉
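A walk like that could look like the following sketch. It applies clojure.walk/postwalk to the (defn foo []) output quoted earlier. strip-empty-whitespace is a hypothetical helper, not a parcera function. It only drops [:whitespace ""] nodes, which contribute no characters, so the result can still be written back to exactly the same text.

```clojure
(ns example.strip
  (:require [clojure.walk :as walk]))

;; Output of (parcera/clojure "(defn foo [])") as quoted in this thread.
(def parsed
  [:code [:whitespace ""]
   [:list [:whitespace ""] [:symbol "defn"] [:whitespace " "] [:whitespace ""]
    [:symbol "foo"] [:whitespace " "] [:whitespace ""] [:vector] [:whitespace ""]]
   [:whitespace ""]])

(defn- empty-whitespace? [node]
  (= [:whitespace ""] node))

;; postwalk rebuilds children before parents, so every tagged vector
;; can simply drop the empty whitespace nodes among its children.
(defn strip-empty-whitespace [tree]
  (walk/postwalk
    (fn [node]
      (if (and (vector? node) (keyword? (first node)))
        (into [] (remove empty-whitespace?) node)
        node))
    tree))

(strip-empty-whitespace parsed)
;; => [:code [:list [:symbol "defn"] [:whitespace " "] [:symbol "foo"] [:whitespace " "] [:vector]]]
```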
I measured instaparse a while ago against j.u.regex, and it was several orders of magnitude slower.
I wasn't using instaparse's regex support though, so it might be faster when not using it
> it seems instaparse supports .cljc. maybe parcera could also support a CLJS API
@U04V15CAJ I would need a bit of support there since I don't know of any “string building” API in JS. That is so far the only showstopper, since str was quite slow for big inputs
@U0LJU20SJ I think the API for goog StringBuffer is more or less the same as StringBuilder
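On the JVM, the usual way around slow repeated str calls is to use a single java.lang.StringBuilder. The sketch below writes the hiccup-style trees from this thread back to text with one builder. The delimiter table covers only the tags that appear in these messages, and it is an assumption about how each node prints, not parcera's actual writer (its code function). In ClojureScript, the builder would be swapped for goog's StringBuffer, as suggested above.

```clojure
(ns example.stringify)

;; Assumed printing rules for the branch tags seen in this thread:
;; [opening-text closing-text] written around the children.
(def delimiters
  {:code        ["" ""]
   :map-content ["" ""]
   :list        ["(" ")"]
   :vector      ["[" "]"]
   :map         ["{" "}"]
   :metadata    ["^" ""]})

;; Leaf tags hold their text as a single string; the :simple-keyword
;; text is stored without its leading colon.
(def leaf-prefix
  {:simple-keyword ":"})

(defn- write-node! [^StringBuilder sb node]
  (let [[tag & children] node]
    (if-let [[open close] (delimiters tag)]
      (do (.append sb ^String open)
          (doseq [child children]
            (write-node! sb child))
          (.append sb ^String close))
      ;; leaf such as [:symbol "defn"] or [:whitespace " "]
      (do (.append sb ^String (get leaf-prefix tag ""))
          (.append sb ^String (first children))))))

;; One StringBuilder for the whole tree instead of nested str calls.
(defn tree->string [tree]
  (let [sb (StringBuilder.)]
    (write-node! sb tree)
    (.toString sb)))

(tree->string
  [:code [:whitespace ""]
   [:metadata [:simple-keyword "private"] [:whitespace " "] [:vector] [:whitespace ""]]
   [:whitespace ""]])
;; => "^:private []"
```

Because every whitespace node is written verbatim, the three trees quoted in this thread print back to their original input strings.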
> how is performance overall?
@U04V15CAJ as @U09LZR36F mentioned, performance is not the best compared to other tools out there. A roundtrip of clojure.core takes around 6 seconds on my machine … around 8k lines
is that only the source dir? clj-kondo can lint clojure src in 2 seconds. so I'm holding off on swapping to Instaparse-based for now 🙂
wait, this includes writing of course. what about only the parsing @U0LJU20SJ?
@U09LZR36F you are right on the spot. Performance varies depending on the machine, so I cannot promise anything 😅. However, I just checked, and parsing takes most of the time … 5.5 seconds
it's not that bad though; at 5.5 seconds it can still be very useful for tools that, unlike editor support, don't need to be super fast
I bet this grammar of yours can also be used to generate a Java parser that could be faster
yeah, probably. Although I would also like to know how user-friendly they are. Instaparse can point out things like ambiguities, the failure position, and metadata on each node. I think those are incredible tools, and I would prefer to make Instaparse faster than to rewrite the whole thing in a less user-friendly tool
@U09LZR36F fwiw, parcera has the same metadata issue I had with rewrite-clj:
user=> (parcera/clojure "^:private []")
[:code [:whitespace ""] [:metadata [:simple-keyword "private"] [:whitespace " "] [:vector] [:whitespace ""]] [:whitespace ""]]
it makes sense from a parsing->writing point of view but from a code analyzing point of view it's a bit unhandy
well, I just changed clj-kondo's "vendored" rewrite-clj. also I stripped out all the whitespace because it's just noise to clj-kondo
I quite want all the whitespace 😛 I want a whitespace-preserving config/edn rewriter.
> fwiw, parcera has the same metadata issue I had with rewrite-clj:
@U04V15CAJ what issue is that :thinking_face:?
it's about convenience for parsing, consider this example:
(parcera/clojure "{^:x :a,1}")
;; =>
[:code
 [:whitespace ""]
 [:map
  [:map-content
   [:whitespace ""]
   [:metadata [:simple-keyword "x"] [:whitespace " "] [:simple-keyword "a"] [:whitespace ","]]
   [:whitespace ""]
   [:whitespace ""]
   [:number "1"]
   [:whitespace ""]]]
 [:whitespace ""]]
You have to unwrap every node to determine whether it's metadata in order to get to the real value.
@U0LJU20SJ for example, say you're analyzing this function: (defn ^String foo [^String x ^Long x] ...). When walking over the nodes, I generally don't want to check whether a node is metadata wrapping some other value and then pull out the value I'm actually interested in. I just want to work with those values, and look at the metadata optionally if I'm interested in it.
@U0LJU20SJ don't get me wrong, what you made is super cool. it's just something I noticed when working with rewrite-clj.
yeah, you are right. However, we have conflicting goals. My goal was to guarantee bidirectional snippets and to optionally let people “see” only part of the complete parse. If I don't include those nodes, I cannot make a full roundtrip without losing information. My idea was that such things can easily be done with zippers and walk, which can remove those nodes automatically without each implementation having to check for them manually
btw, rewrite-clj has a companion zipper namespace for rewriting. could also be nice for this lib, although you could maybe do it with normal clojure zippers
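Here is a sketch of the plain clojure.zip route. It builds a zipper over the same node vectors and renames a symbol while leaving every whitespace node in place, which is the kind of whitespace-preserving rewrite mentioned earlier. tree-zipper and rename-symbol are hypothetical names. The rule that any vector whose second element is a string counts as a leaf is an assumption about parcera's node shapes.

```clojure
(ns example.zip
  (:require [clojure.zip :as z]))

;; Assumption: a node is a leaf when its only child is a string,
;; e.g. [:symbol "foo"] or [:whitespace " "]; everything else branches.
(defn- branch? [node]
  (and (vector? node)
       (not (string? (second node)))))

(defn tree-zipper [tree]
  (z/zipper branch?
            next                                      ; children after the tag, nil if none
            (fn [node children] (into [(first node)] children))
            tree))

;; Walk the whole tree once, replacing matching symbol leaves in place.
;; Whitespace nodes are never touched, so the layout survives.
(defn rename-symbol [tree from to]
  (loop [loc (tree-zipper tree)]
    (cond
      (z/end? loc) (z/root loc)
      (= [:symbol from] (z/node loc)) (recur (z/next (z/replace loc [:symbol to])))
      :else (recur (z/next loc)))))

(rename-symbol
  [:code [:list [:symbol "defn"] [:whitespace " "] [:symbol "foo"] [:whitespace " "] [:vector]]]
  "foo" "bar")
;; => [:code [:list [:symbol "defn"] [:whitespace " "] [:symbol "bar"] [:whitespace " "] [:vector]]]
```

The same loop shape also works for removing nodes (z/remove) or for collecting them. rewrite-clj's zipper namespace adds whitespace-aware movement on top of this idea.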
> don’t get me wrong, what you made is super cool.
no worries, all feedback is welcome. I was also thinking of putting some “node removal” functions in parcera, but I am not sure if they should be in the core :thinking_face:
@U0LJU20SJ true, conflicting goals indeed
the way I solved it in my fork of rewrite-clj is to lift metadata nodes as real metadata on the value nodes. you could check for that on the way back when writing out.
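Lifting metadata nodes into real metadata could be sketched as follows. Each [:metadata ...] node is replaced by its value node. The surrounding pieces are stored in Clojure metadata under a hypothetical ::wrappers key, so that a second pass can restore the original tree before writing. The code assumes the value is the first non-whitespace child after the metadata form. That matches the examples above, but it has not been checked against parcera's grammar.

```clojure
(ns example.lift-meta
  (:require [clojure.walk :as walk]))

(defn- whitespace? [node]
  (and (vector? node) (= :whitespace (first node))))

(defn- metadata-node? [node]
  (and (vector? node) (= :metadata (first node))))

;; [:metadata <meta> <ws>... <value> <ws>...] is replaced by <value>.
;; The other children are kept, with ::value marking the value's slot,
;; in a vector of wrappers stored in the value's Clojure metadata.
(defn lift-metadata [tree]
  (walk/postwalk
    (fn [node]
      (if (metadata-node? node)
        (let [children (vec (rest node))
              ;; assumption: the value is the first non-whitespace child
              ;; after the metadata form itself
              idx   (first (keep-indexed
                             (fn [i child]
                               (when (and (pos? i) (not (whitespace? child))) i))
                             children))
              value (nth children idx)]
          (vary-meta value update ::wrappers (fnil conj [])
                     (assoc children idx ::value)))
        node))
    tree))

;; Inverse pass: rebuild the [:metadata ...] nodes, innermost first,
;; so the tree can be written back without losing anything.
(defn lower-metadata [tree]
  (walk/postwalk
    (fn [node]
      (if-let [wrappers (and (vector? node) (::wrappers (meta node)))]
        (reduce (fn [inner wrapper]
                  (into [:metadata] (replace {::value inner} wrapper)))
                (vary-meta node dissoc ::wrappers)
                wrappers)
        node))
    tree))

(def with-meta-node
  [:code [:whitespace ""]
   [:metadata [:simple-keyword "private"] [:whitespace " "] [:vector] [:whitespace ""]]
   [:whitespace ""]])

(lift-metadata with-meta-node)
;; => [:code [:whitespace ""] [:vector] [:whitespace ""]]
```

An analyzer then sees [:vector] directly and can inspect (meta node) only when it cares about type hints or flags. lower-metadata undoes the lifting for the writer.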
Btw, was the 5.5-second benchmark just "core.clj" or the entire Clojure src? I assumed the entire Clojure src
Ah I see. I think if you rewrote this using tools.reader it could be orders of magnitude faster, if that’s a goal
> But then you’re basically doing the same as rewrite-clj
yeap, exactly. That is not a goal. rewrite-clj already does that, so there is no need to do the same. That is what I meant before with “I prefer to make instaparse faster than to completely rewrite this just to make it faster”
Btw there was an antlr grammar in Clojure long long ago, before the LispReader existed
oh really? what happened to it? I find it quite nice to have a grammar to look at as the “source of truth” for a language. From what I saw in the Clojure compiler, it seems like a “one pass does all” kind of approach … I guess due to performance?
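;; assumed namespaces; the original message did not show its requires
(require '[meander.epsilon :as m]
         '[parcera.core :as parcera])
;; assumed helper; whitespace? was not defined in the original message
(defn whitespace? [node]
  (and (vector? node) (= :whitespace (first node))))

(parcera/code
  (m/rewrite (parcera/clojure "(-> a b c)")
    [?tag . !rest-pre ... . [:list . (m/pred whitespace?) ... . [:symbol "->"] . (m/or (m/pred whitespace?) [:symbol _ :as !args]) ...] . !rest-post ...]
    (m/with [%list [:list !args [:whitespace " "] . %list ...]]
      [?tag . !rest-pre ... . %list ... . !rest-post ...])
    [?tag . (m/cata !content) ...]
    [?tag . !content ...]
    ?x
    ?x))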
I wrote (with a lot of help from #meander) a little tool for rewriting (-> a b c) to (a (b (c))) 🙂
Very handy
@U0LJU20SJ it was replaced with the LispReader
InsideClojure journal 2019.23 http://insideclojure.org/2019/10/06/journal/