
Is there a channel where I can get library recommendations? I'm looking for something like Instaparse, but faster (like, WAY faster), that also works on ClojureScript...


there's #find-my-lib

Cora (she/her) 03:05:32

you'll never get faster than a hand-rolled parser

Cora (she/her) 03:05:20

at least when compared with a generalized one

Ben Sless 05:05:46

*unless you compile it with a staging interpreter
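The idea behind staging is to specialize the parser to the grammar once, ahead of time, instead of re-interpreting the grammar description on every input. Below is a minimal sketch of one lightweight form of this, closure compilation. The four-form grammar format, `interp`, `compile-grammar`, and the example predicates are all hypothetical names made up for illustration; this is not how malli, Instaparse, or any particular library implements it, and it only recognizes input (returns an end position) rather than building a tree.

```clojure
(ns staging-sketch)

;; Hypothetical grammar format (not from any library):
;;   [:char pred]   one character satisfying pred
;;   [:cat & gs]    all of gs in sequence
;;   [:alt & gs]    first alternative that matches (ordered choice)
;;   [:* g]         zero or more g, greedy, no backtracking
;; A matcher takes a string and a position and returns the new position or nil.

;; Plain interpreter: dispatches on the grammar's shape on every call.
(defn interp [g s pos]
  (case (first g)
    :char (when (and (< pos (count s)) ((second g) (nth s pos)))
            (inc pos))
    :cat  (reduce (fn [p sub] (if-let [p' (interp sub s p)] p' (reduced nil)))
                  pos (rest g))
    :alt  (some #(interp % s pos) (rest g))
    :*    (loop [p pos]
            (let [p' (interp (second g) s p)]
              ;; stop on failure or on a zero-width match
              (if (and p' (not= p p')) (recur p') p)))))

;; Staged version: walks the grammar once, up front, and returns nested
;; closures, so matching no longer dispatches on the grammar's shape.
(defn compile-grammar [g]
  (case (first g)
    :char (let [pred (second g)]
            (fn [s pos]
              (when (and (< pos (count s)) (pred (nth s pos)))
                (inc pos))))
    :cat  (let [ms (mapv compile-grammar (rest g))]
            (fn [s pos]
              (reduce (fn [p m] (if-let [p' (m s p)] p' (reduced nil))) pos ms)))
    :alt  (let [ms (mapv compile-grammar (rest g))]
            (fn [s pos] (some #(% s pos) ms)))
    :*    (let [m (compile-grammar (second g))]
            (fn [s pos]
              (loop [p pos]
                (let [p' (m s p)]
                  (if (and p' (not= p p')) (recur p') p)))))))

(defn lower? [c] (<= (int \a) (int c) (int \z)))
(defn letter? [c] (or (lower? c) (<= (int \A) (int c) (int \Z))))

;; identifier: a lower-case letter followed by any letters
(def ident [:cat [:char lower?] [:* [:char letter?]]])

;; compiled once, then reused for every input
(def match-ident (compile-grammar ident))

(match-ident "fooBar(" 0)  ; => 6
(interp ident "fooBar(" 0) ; => 6, same answer, but re-walks `ident` each time
```

Both functions return the same positions; the difference is that the `case` on the grammar's shape runs once, inside `compile-grammar`, rather than at every character of the input.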

Ben Sless 05:05:21

You could try to use malli's regex combinators for your own parser


IIRC for my cases, ANTLR4 was faster.


In addition to that, you have to use version 4.8-1 with CLJS due to

delaguardo 08:05:20

This could work, but I don't have time to prepare proper documentation, so you'll have to look at the source code and examples. Maybe next week I'll have some time to prepare some.


@U2FRKM4TW thanks for the suggestion, but unfortunately, in my case (parsing a Prolog term) ANTLR4 is actually slower, by some orders of magnitude (using the grammar they have available on their GitHub)! I was able to make it slightly faster by using some tips from the Instaparse site, but I'll probably need to hand-roll a parser (or change my strategy 100% so I don't have to parse, which I honestly do not want to do considering what I'm doing right now 😄)


@U02N27RK69K thanks for the tip, I made a parser and indeed it is WAY faster!

Ben Sless 04:05:56

@U3Y18N0UC any chance you can share the BNF you're using? I'm curious to try translating it to different parsers.


@UK0810AQ2 honestly, it's just a case checking for different first chars, plus some regexps. Nothing really special, and I only need to parse a subset of Prolog, so it's probably sufficient :)


I might add some memoization in the future, but for now it works really well and is fast enough for my cases.
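The approach described above (look at the first character to decide what kind of term comes next, then consume the token with a regex) can be sketched roughly as follows. This is a hypothetical reconstruction, not the poster's actual code: the AST maps, the function names, and the supported subset (atoms, variables, non-negative integers, and compound terms like foo(X, Y); no lists, strings, or operators) are all assumptions. It uses `cond` with regex checks rather than `case`, and needs Clojure 1.11+ for `parse-long`.

```clojure
(ns prolog-term.parser)

;; Hypothetical AST: maps tagged with :type :atom, :var, :number or :structure.
;; Each parse fn takes the input string and a position and returns [ast next-pos].

(defn- peek-char [s pos]
  (when (< pos (count s)) (nth s pos)))

(defn- skip-ws [s pos]
  (if (contains? #{\space \tab \newline} (peek-char s pos))
    (recur s (inc pos))
    pos))

(defn- match-re
  "Matches `re` (anchored with ^) at `pos`; returns [text next-pos] or nil."
  [re s pos]
  (when-let [m (re-find re (subs s pos))]
    [m (+ pos (count m))]))

(declare parse-term)

(defn- parse-args
  "Parses `a, b, ...)`; `pos` points just past the opening paren."
  [s pos]
  (loop [args [] pos pos]
    (let [[arg pos] (parse-term s pos)
          args      (conj args arg)
          pos       (skip-ws s pos)]
      (case (peek-char s pos)
        \, (recur args (inc pos))
        \) [args (inc pos)]
        (throw (ex-info "Expected , or )" {:pos pos}))))))

(defn parse-term
  "Dispatches on the first non-blank character, then uses a regex
  to consume the token."
  [s pos]
  (let [pos (skip-ws s pos)
        c   (peek-char s pos)]
    (cond
      (nil? c)
      (throw (ex-info "Unexpected end of input" {:pos pos}))

      (re-matches #"[0-9]" (str c))     ; number
      (let [[n p] (match-re #"^[0-9]+" s pos)]
        [{:type :number :value (parse-long n)} p])

      (re-matches #"[A-Z_]" (str c))    ; variable
      (let [[v p] (match-re #"^[A-Z_][A-Za-z0-9_]*" s pos)]
        [{:type :var :name v} p])

      (re-matches #"[a-z]" (str c))     ; atom or compound term
      (let [[a p] (match-re #"^[a-z][A-Za-z0-9_]*" s pos)]
        (if (= \( (peek-char s p))
          (let [[args p] (parse-args s (inc p))]
            [{:type :structure :functor a :args args} p])
          [{:type :atom :name a} p]))

      :else
      (throw (ex-info "Unexpected character" {:pos pos :char c})))))

(defn parse
  "Parses the whole string as a single term."
  [s]
  (let [[ast pos] (parse-term s 0)
        pos       (skip-ws s pos)]
    (if (= pos (count s))
      ast
      (throw (ex-info "Trailing input" {:pos pos})))))
```

For example, `(parse "foo(X, Y)")` returns `{:type :structure, :functor "foo", :args [{:type :var, :name "X"} {:type :var, :name "Y"}]}`. Because each branch is chosen by looking at a single character, the parser never backtracks.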

Ben Sless 06:05:10

Rough sketch:

;; Requires metosin/malli
(require '[malli.core :as m])

;; Matches one character whose code lies in from+to, e.g. (char-range "09")
(defn char-range
  [from+to]
  (let [from (long (first from+to))
        to (long (second from+to))]
    (m/-simple-schema
     {:type ::char-range
      :pred (fn [x] (<= from (long x) to))})))

;; Matches the characters of cs exactly, e.g. (char-seq "true")
(defn char-seq
  [cs]
  (into [:cat] (mapv (fn [c] [:= c]) cs)))

(m/parse
 [:schema
  {:registry
   {::digit (char-range "09")
    ::lower-case (char-range "az")
    ::upper-case (char-range "AZ")
    ::letter [:alt ::lower-case ::upper-case]
    ::number [:+ ::digit]
    ::var [:cat ::upper-case [:* ::lower-case]]
    ::boolean [:altn [:true (char-seq "true")] [:false (char-seq "false")]]
    ::atom [:cat ::lower-case [:* ::letter]]
    ::term [:altn
            #_[:equality ::equality]
            ;; listed before :atom, since "true" also matches ::atom
            [:boolean ::boolean]
            [:atom ::atom]
            [:var ::var]
            [:number ::number]
            #_[:structure ::structure]
            #_[:list ::list]
            #_[:string ::string]]
    ;; ::equality [:cat
    ;;             ::var
    ;;             [:= \space]
    ;;             [:= \=]
    ;;             [:= \space]
    ;;             ::term]
    }}
  ::term]
 (seq "Xyz"))


@UK0810AQ2 do you have an idea of how performant it is?

Ben Sless 14:05:28

Not yet, I hit a snag with malli, so no benchmarks yet.

Ben Sless 15:05:49

I can benchmark some rudimentary parsing if you're interested. What should be my baseline?


OK, this was one piece of code that the Instaparse version was struggling to parse: on my machine it was taking almost half a second. After I optimized my code it parsed in 160 ms (better, but FAR from what I wanted).

Ben Sless 19:05:06

I can write the schema for it, but until I figure out what's up with malli I can't run a full example. Parsing something like (p (seq "foo(X, Y)")), where p is a defined parser, takes about 3 microseconds.

Ben Sless 05:05:53

I'd like to share this experiment I've been poking at for the past year, which I can finally promote from floundering to promising. "Let's write a compiler, how hard can it be?"


Thanks for sharing. Btw, the link in the README is incorrect (relative URL?); it should be

Ben Sless 11:05:40

Good find! I'll fix it right up.

Fixed.