#announcements
2019-10-06
carocad14:10:53

hey everyone, here is a little contribution from me. A way to parse Clojure(script) code safely. I hope you like it 🙂 https://github.com/carocad/parcera

dominicm14:10:42

What are you using it for?

carocad14:10:04

@ so far I am not using it. But I was hoping it would be useful to people out there who want to parse Clojure but don't, since read-string is not safe and edn/read cannot parse Clojure code. I have seen a couple of projects die out because of that, so I thought I would help a little 😅

dominicm14:10:17

I may try and reimplement something of mine based on this. Is it a goal to have perfectly bidirectional snippets?

borkdude14:10:29

rewrite-clj has a similar goal

borkdude14:10:04

https://github.com/borkdude/edamame is used to parse code in sci/babashka (it skips intermediate representation and parses directly to code, bidirectionality is not a goal).

dominicm14:10:11

Yeah, the current implementation uses rewrite-clj, but I found it difficult to work with for the reasons you mentioned :)

dominicm14:10:35

(Eg metadata)

borkdude14:10:58

user=> (parcera/clojure "(defn foo [])")
[:code [:whitespace ""] [:list [:whitespace ""] [:symbol "defn"] [:whitespace " "] [:whitespace ""] [:symbol "foo"] [:whitespace " "] [:whitespace ""] [:vector] [:whitespace ""]] [:whitespace ""]]
I see a couple of empty whitespace nodes there.

borkdude15:10:12

kudos for cljc support!

borkdude15:10:12

it seems instaparse supports .cljc. maybe parcera could also support a CLJS API

borkdude15:10:19

overall impressive work, thanks for sharing @

pez15:10:22

Super nice stuff!

carocad15:10:31

> Is it a goal to have perfectly bidirectional snippets?
@ yes, that is a goal. You can check the test cases and see that currently it is possible to parse and stringify both clojure.core and cljs.core

dominicm15:10:01

That's awesome

dominicm15:10:10

Might break the keyboard out now

carocad15:10:45

> I see a couple of empty whitespace nodes there
@ yeah, unfortunately I have not been able to get rid of those. The problem is that instaparse becomes too slow if it always checks for optional whitespace, so instead I made it mandatory and allowed it to be empty. Maybe in the future I can get rid of this, but so far it doesn't seem to hurt, and it would probably be easier to just provide a walk function that removes unwanted nodes 😉
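A minimal sketch of such a walk function, assuming parcera's output is the plain vector tree printed earlier in this thread. `remove-empty-whitespace` and `empty-whitespace?` are hypothetical helpers, not part of parcera. Because the dropped nodes hold only the empty string, they contribute no text, so printing the tree back out should give the same source.

```clojure
(require '[clojure.walk :as walk])

;; hypothetical helper: true for a [:whitespace ""] node, which carries no source text
(defn empty-whitespace? [node]
  (and (vector? node)
       (= :whitespace (first node))
       (= "" (second node))))

;; hypothetical helper: walks the tree bottom-up and drops empty whitespace
;; children from every vector node; non-empty whitespace like " " is kept
(defn remove-empty-whitespace [tree]
  (walk/postwalk
    (fn [node]
      (if (vector? node)
        (into [] (remove empty-whitespace?) node)
        node))
    tree))

(remove-empty-whitespace
  [:code [:whitespace ""]
   [:list [:whitespace ""] [:symbol "defn"] [:whitespace " "] [:whitespace ""]
    [:symbol "foo"] [:whitespace " "] [:whitespace ""] [:vector] [:whitespace ""]]
   [:whitespace ""]])
;; => [:code [:list [:symbol "defn"] [:whitespace " "] [:symbol "foo"] [:whitespace " "] [:vector]]]
```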

borkdude15:10:16

how is performance overall?

dominicm15:10:14

I measured instaparse a while ago against j.u.regex, and it was several orders of magnitude slower.

dominicm15:10:45

I wasn't using instaparse's regex support though, so it might be faster when not using it

dominicm15:10:54

(Sorry, not a useful answer)

carocad15:10:57

> it seems instaparse supports .cljc. maybe parcera could also support a CLJS API
@ I would need a bit of support there since I don't know of any “string building” API in JS. That is the only showstopper so far, since str was quite slow for big input

borkdude15:10:56

@ I think the API for goog StringBuffer is more or less the same as StringBuilder

carocad15:10:12

> how is performance overall?
@ as @ mentioned, performance is not the best compared to other tools out there. A roundtrip of clojure.core takes around 6 seconds (on my machine) … around 8k lines

dominicm15:10:47

That's fairly reasonable I'd say

borkdude15:10:39

is that only the source dir? clj-kondo can lint clojure src in 2 seconds. so I'm holding off on swapping to Instaparse-based for now 🙂

dominicm15:10:21

I think a specialized instaparse-like thing could be created.

borkdude15:10:45

wait, this includes writing of course. what about only the parsing @?

dominicm15:10:11

We could fire this up in a repl and find out... :p

borkdude15:10:22

I'm cooking dinner right now, will you be my REPL?

carocad15:10:11

@ you are right on the spot. Performance varies depending on the machine, so I cannot promise anything 😅. However, I just checked and parsing takes most of the time … 5.5 seconds

borkdude15:10:46

on a cheap macbook air: linting took 1903ms, errors: 34, warnings: 387

borkdude15:10:25

it's not that bad though, 5.5 seconds, and it can be very useful for tools that don't need to be as fast as editor support

borkdude16:10:31

I bet this grammar of yours can also be used to generate a Java parser that could be faster

borkdude16:10:50

using ANTLR maybe

carocad16:10:59

yeah, probably. Although I would also like to know how user-friendly they are. Instaparse can point out things like ambiguities and the failure position, and it puts metadata on each node. I think those are incredible tools, and I would prefer to make Instaparse faster than to rewrite the whole thing in a less user-friendly tool

dominicm16:10:09

instaparse is considered quite fast for what it is I think 🙂

borkdude16:10:50

@ fwiw, parcera has the same metadata issue I had with rewrite-clj:

user=> (parcera/clojure "^:private []")
[:code [:whitespace ""] [:metadata [:simple-keyword "private"] [:whitespace " "] [:vector] [:whitespace ""]] [:whitespace ""]]

dominicm16:10:07

Oh man, that's a shame.

borkdude16:10:24

it makes sense from a parsing->writing point of view but from a code analyzing point of view it's a bit unhandy

dominicm16:10:26

I wonder if this form is easier to work with for re-shaping though

borkdude16:10:05

well, I just changed clj-kondo's "vendored" rewrite-clj. also I stripped out all the whitespace because it's just noise to clj-kondo

dominicm16:10:30

I quite want all the whitespace 😛 I want a whitespace-preserving config/edn rewriter.

borkdude16:10:50

do you use metadata in config a lot?

dominicm16:10:14

Not really, no. But I don't want to break what others are doing.

carocad16:10:00

> fwiw, parcera has the same metadata issue I had with rewrite-clj:
@ what issue is that 🤔 ?

dominicm16:10:04

it's about convenience for parsing, consider this example:

(parcera/clojure "{^:x :a,1}")
;; =>
[:code
 [:whitespace ""]
 [:map
  [:map-content
   [:whitespace ""]
   [:metadata [:simple-keyword "x"] [:whitespace " "] [:simple-keyword "a"] [:whitespace ","]]
   [:whitespace ""]
   [:whitespace ""]
   [:number "1"]
   [:whitespace ""]]]
 [:whitespace ""]]
You have to unwrap every node to determine whether it's metadata or not in order to get to the real value.
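A small helper could hide that unwrapping. This is a sketch that assumes, based only on the two outputs shown here (`^:private []` and `{^:x :a,1}`), that a `:metadata` node's value is its last non-whitespace child; `unwrap-metadata` and `whitespace?` are hypothetical names, not parcera API.

```clojure
;; hypothetical helper: any :whitespace node (spaces, commas or "")
(defn whitespace? [node]
  (and (vector? node) (= :whitespace (first node))))

;; hypothetical helper: returns the value node a :metadata node is attached to,
;; or the node itself when it is not metadata.
;; ASSUMPTION: the value is the last non-whitespace child of the :metadata node;
;; it recurs in case that value is itself wrapped in metadata.
(defn unwrap-metadata [node]
  (if (and (vector? node) (= :metadata (first node)))
    (recur (last (remove whitespace? (rest node))))
    node))

(unwrap-metadata [:metadata [:simple-keyword "x"] [:whitespace " "]
                  [:simple-keyword "a"] [:whitespace ","]])
;; => [:simple-keyword "a"]
```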

dominicm16:10:14

The common desire is to access the real value

borkdude16:10:18

@ for example, say you're analyzing this function: (defn ^String foo [^String x ^Long y] ...). When walking over the nodes, I generally don't want to check whether a node is metadata wrapping some other value and then pull out the value I'm actually interested in. I just want to work with those values, and if I'm interested in metadata, look at it optionally.

borkdude16:10:25

@ don't get me wrong, what you made is super cool. it's just something I noticed when working with rewrite-clj.

carocad16:10:15

yeah, you are right. However, we have conflicting goals. My goal was to guarantee bidirectional snippets and to optionally allow people to only “see” part of the complete parse result. If I don't include those nodes, I cannot make a full roundtrip without losing information. My idea was that such things can easily be done with zippers or walk, which can remove those nodes automatically instead of each implementation having to check for them manually

borkdude16:10:22

btw, rewrite-clj has a companion zipper namespace for rewriting. could also be nice for this lib, although you could maybe do it with normal clojure zippers
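With plain `clojure.zip`, a sketch of such a zipper could treat every vector as a branch whose children are the elements after its tag keyword. `tree-zipper`, `remove-nodes` and `whitespace?` are hypothetical names. The example strips all whitespace nodes, the kind of simplification clj-kondo wants; note that this drops text and therefore breaks the exact roundtrip.

```clojure
(require '[clojure.zip :as z])

;; hypothetical helper: a vector is a branch; its children follow the tag keyword
(defn tree-zipper [root]
  (z/zipper vector?
            next                                        ; nil (not ()) for childless nodes like [:vector]
            (fn [node children] (into [(first node)] children))
            root))

;; hypothetical helper: depth-first walk deleting every node that matches pred.
;; pred must not match the root, since clojure.zip/remove throws at the top.
(defn remove-nodes [pred tree]
  (loop [loc (tree-zipper tree)]
    (cond
      (z/end? loc)        (z/root loc)
      (pred (z/node loc)) (recur (z/next (z/remove loc)))
      :else               (recur (z/next loc)))))

(defn whitespace? [node]
  (and (vector? node) (= :whitespace (first node))))

(remove-nodes whitespace?
              [:code [:whitespace ""]
               [:list [:whitespace ""] [:symbol "defn"] [:whitespace " "] [:whitespace ""]
                [:symbol "foo"] [:whitespace " "] [:whitespace ""] [:vector] [:whitespace ""]]
               [:whitespace ""]])
;; => [:code [:list [:symbol "defn"] [:symbol "foo"] [:vector]]]
```

`next` rather than `rest` is used for the children function because `clojure.zip/down` treats an empty `()` as "has children", while `next` returns nil for a childless node.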

carocad16:10:02

> don’t get me wrong, what you made is super cool.
no worries, all feedback is welcome. I was also thinking of putting some “node removal” functions in parcera, but I am not sure whether they belong in the core 🤔

borkdude16:10:07

@ true, conflicting goals indeed

borkdude16:10:47

the way I solved it in my fork of rewrite-clj is to lift metadata nodes as real metadata on the value nodes. you could check for that on the way back when writing out.
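A minimal sketch of that idea on parcera-style vectors, using `clojure.walk` rather than borkdude's actual rewrite-clj fork: each `:metadata` node is replaced by the value it annotates, and the original node is stored in the value's Clojure metadata so it can be put back before writing. The key `:sketch/original-node` and all function names are hypothetical, and the value is again assumed to be the last non-whitespace child of the `:metadata` node.

```clojure
(require '[clojure.walk :as walk])

(defn whitespace? [node]
  (and (vector? node) (= :whitespace (first node))))

(defn metadata-node? [node]
  (and (vector? node) (= :metadata (first node))))

;; hypothetical: replaces each :metadata node by the value it annotates and keeps
;; the whole original node under :sketch/original-node in the value's metadata.
;; ASSUMPTION: the annotated value is the last non-whitespace child.
(defn lift-metadata [tree]
  (walk/postwalk
    (fn [node]
      (if (metadata-node? node)
        (let [value (last (remove whitespace? (rest node)))]
          (vary-meta value assoc :sketch/original-node node))
        node))
    tree))

;; hypothetical inverse, run before writing: top-down, so nested metadata is restored too
(defn restore-metadata [tree]
  (walk/prewalk
    (fn [node] (or (:sketch/original-node (meta node)) node))
    tree))

(lift-metadata [:code [:whitespace ""]
                [:metadata [:simple-keyword "private"] [:whitespace " "] [:vector] [:whitespace ""]]
                [:whitespace ""]])
;; => [:code [:whitespace ""] [:vector] [:whitespace ""]]
```

Clojure's `=` ignores metadata, so after lifting, analysis code sees `^:private []` as a plain `[:vector]` node, while the writer can still recover the exact source.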

borkdude16:10:43

Btw, was the 5.5 s benchmark just "core.clj" or the entire Clojure src? I assumed the entire Clojure src

carocad17:10:20

nope, it was just core.clj

borkdude17:10:13

Ah I see. I think if you would rewrite this using tools.reader it could be orders of magnitude faster, if that’s a goal

borkdude17:10:50

But then you’re basically doing the same as rewrite-clj

carocad17:10:09

> But then you’re basically doing the same as rewrite-clj
yeap, exactly. That is not a goal. rewrite-clj already does that so no need to do the same. That is what I meant before with “I prefer to make instaparse faster than to completely rewrite this just to make it faster”

carocad17:10:57

parcera is still in its infancy … with time we will see which path it takes 😉

alexmiller17:10:33

Btw there was an antlr grammar in Clojure long long ago, before the LispReader existed

carocad18:10:34

oh really? what happened with it? I find it quite nice to have a grammar to look at as the “source of truth” for a language. From what I saw in the Clojure compiler, it seems like a “one pass does all” kind of approach … I guess due to performance?

dominicm21:10:40

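;; assumed setup (namespaces as of late 2019)
(require '[meander.epsilon :as m]
         '[parcera.core :as parcera])
;; whitespace? isn't shown in the original snippet; assumed definition
(defn whitespace? [node]
  (and (vector? node) (= :whitespace (first node))))

(parcera/code
  (m/rewrite (parcera/clojure "(-> a b c)")
             ;; rewrite a (-> ...) list into nested lists of its symbols
             [?tag . !rest-pre ... . [:list . (m/pred whitespace?) ... . [:symbol "->"] . (m/or (m/pred whitespace?) [:symbol _ :as !args]) ...] . !rest-post ...]
             (m/with [%list [:list !args [:whitespace " "] . %list ...]]
               [?tag . !rest-pre ... . %list ... . !rest-post ...])

             ;; any other vector node: rewrite its children recursively
             [?tag . (m/cata !content) ...]
             [?tag . !content ...]

             ;; anything else (strings, keywords) is left as is
             ?x
             ?x))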
I wrote (with a lot of help from #meander) a little tool for rewriting (-> a b c) to (a (b (c))) 🙂 Very handy

alexmiller23:10:23

@ was replaced with the LispReader