#off-topic
2022-05-23
mauricio.szabo 01:05:18

Is there a channel where I can get recommendations for a library? I'm looking for a faster (like, WAY faster) Instaparse or similar that needs to work on ClojureScript...

phronmophobic 01:05:46

there's #find-my-lib

Cora (she/her) 03:05:32

you'll never get faster than a hand-rolled parser

Cora (she/her) 03:05:20

at least when compared with a generalized one

Ben Sless 05:05:46

*unless you compile it with a staging interpreter
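To make the staging idea concrete, here is a minimal sketch that is not from the thread (the grammar format and the names interpret, compile-grammar, var-grammar and var-parser are all hypothetical). The same tiny grammar language is run two ways: interpret re-reads the grammar data on every call, while compile-grammar walks the grammar once and returns nested closures, so parsing an input no longer pays for dispatching on the grammar's shape. Real staging compilers go much further, but separating the static part (the grammar) from the dynamic part (the text) is the core idea.

```clojure
;; Hypothetical grammar format (illustration only, not a real library):
;;   [:char c]       the single character c
;;   [:range lo hi]  any character from lo to hi, inclusive
;;   [:cat g ...]    each sub-grammar in sequence
;;   [:alt g ...]    the first sub-grammar that matches (ordered choice)
;;   [:* g]          g zero or more times, greedily
;; Both versions return the index just past the match, or nil on failure.

(defn interpret
  "Re-examines the grammar data on every call."
  [g ^String s i]
  (case (first g)
    :char  (when (and (< i (.length s)) (= (second g) (.charAt s i)))
             (inc i))
    :range (let [[_ lo hi] g]
             (when (and (< i (.length s))
                        (<= (long lo) (long (.charAt s i)) (long hi)))
               (inc i)))
    :cat   (reduce (fn [pos sub] (or (interpret sub s pos) (reduced nil)))
                   i (rest g))
    :alt   (some #(interpret % s i) (rest g))
    :*     (loop [pos i]
             (let [nxt (interpret (second g) s pos)]
               ;; stop on failure or on an empty match (avoids looping forever)
               (if (and nxt (> nxt pos)) (recur nxt) pos)))))

(defn compile-grammar
  "Walks the grammar once and returns a closure (fn [s i]) specialised to it."
  [g]
  (case (first g)
    :char  (let [c (second g)]
             (fn [^String s i]
               (when (and (< i (.length s)) (= c (.charAt s i)))
                 (inc i))))
    :range (let [lo (long (nth g 1))
                 hi (long (nth g 2))]
             (fn [^String s i]
               (when (and (< i (.length s))
                          (<= lo (long (.charAt s i)) hi))
                 (inc i))))
    :cat   (let [ps (mapv compile-grammar (rest g))]
             (fn [s i]
               (reduce (fn [pos p] (or (p s pos) (reduced nil))) i ps)))
    :alt   (let [ps (mapv compile-grammar (rest g))]
             (fn [s i] (some #(% s i) ps)))
    :*     (let [p (compile-grammar (second g))]
             (fn [s i]
               (loop [pos i]
                 (let [nxt (p s pos)]
                   (if (and nxt (> nxt pos)) (recur nxt) pos)))))))

;; A Prolog-style variable: an upper-case letter followed by lower-case letters.
(def var-grammar [:cat [:range \A \Z] [:* [:range \a \z]]])

;; The grammar is walked once here; var-parser can then be called many times.
(def var-parser (compile-grammar var-grammar))
```

Both versions agree: (interpret var-grammar "Xyz" 0) and (var-parser "Xyz" 0) both return 3, the index just past the match.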

Ben Sless 05:05:21

You could try to use malli's regex combinators for your own parser

p-himik 06:05:28

IIRC for my cases, ANTLR4 was faster.

p-himik 06:05:24

In addition to that, you have to use version 4.8-1 with CLJS due to https://github.com/google/closure-compiler/issues/3477.

delaguardo 08:05:20

https://github.com/xapix-io/mutagen/blob/master/src/mutagen/grammars/json.cljc This could work, but I don't have time to write proper documentation, so you'll have to look at the source code and examples. Maybe next week I'll have some time to prepare some.

mauricio.szabo 15:05:25

@U2FRKM4TW thanks for the suggestion, but unfortunately, in my case (parsing a Prolog term) ANTLR4 is actually slower, by some orders of magnitude (using the grammar they have available on their GitHub)! I was able to make it slightly faster by using some tips from the Instaparse site, but I'll probably need to hand-roll a parser (or change my strategy 100% so I don't have to parse at all, which I honestly don't want to do considering what I'm doing right now 😄)

mauricio.szabo 04:05:50

@U02N27RK69K thanks for the tip, I made a parser and indeed it is WAY faster!

Ben Sless 04:05:56

@U3Y18N0UC any chance you can share the BNF you're using? I'm curious to try translating it to different parsers.

mauricio.szabo 05:05:04

@UK0810AQ2 honestly, it's just a case checking for different first chars and some regexps: https://gitlab.com/mauricioszabo/spock/-/merge_requests/5/diffs. Nothing really special, and I only need to parse a subset of Prolog, so it's probably sufficient :)
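The real implementation is in the merge request linked above. As a rough, hypothetical illustration of the same general shape (dispatch on the first character, then an anchored regex for the token), a parser for a small subset of Prolog terms could look like this. The function names and the map-based term representation are made up for the example, it handles only variables, integers, atoms and compound terms (no lists, strings or operators), and it assumes Clojure(Script) 1.11+ for parse-long.

```clojure
;; Hypothetical sketch: hand-rolled parser for a subset of Prolog terms.
;; Uses only core functions, so it should also run under ClojureScript.

(defn- skip-ws
  "Returns the first index at or after i that is not whitespace."
  [s i]
  (+ i (count (re-find #"^\s*" (subs s i)))))

(defn- token
  "Matches re (anchored with ^, no capture groups) at index i of s.
  Returns [matched-text next-index] or nil."
  [re s i]
  (when-let [m (re-find re (subs s i))]
    [m (+ i (count m))]))

(declare parse-term)

(defn- parse-args
  "Parses comma-separated terms up to and including the closing paren.
  Returns [args next-index] or nil."
  [s i]
  (loop [i i, args []]
    (when-let [[t i] (parse-term s i)]
      (let [i    (skip-ws s i)
            args (conj args t)
            c    (get s i)]
        (cond
          (= c \,) (recur (inc i) args)
          (= c \)) [args (inc i)]
          :else    nil)))))

(defn parse-term
  "Dispatches on the first non-blank character at index i.
  Returns [term next-index] or nil."
  [s i]
  (let [i (skip-ws s i)
        c (str (get s i))]
    (cond
      ;; Variables start with an upper-case letter or an underscore.
      (re-matches #"[A-Z_]" c)
      (let [[v j] (token #"^[A-Za-z0-9_]+" s i)]
        [{:type :var :name v} j])

      ;; Numbers start with a digit (integers only in this sketch).
      (re-matches #"[0-9]" c)
      (let [[n j] (token #"^[0-9]+" s i)]
        [{:type :number :value (parse-long n)} j])

      ;; Atoms start with a lower-case letter; an immediate "(" makes it a compound term.
      (re-matches #"[a-z]" c)
      (let [[a j] (token #"^[A-Za-z0-9_]+" s i)]
        (if (= (get s j) \()
          (when-let [[args k] (parse-args s (inc j))]
            [{:type :structure :functor a :args args} k])
          [{:type :atom :name a} j]))

      :else nil)))

(defn parse
  "Parses s as exactly one term (surrounding whitespace allowed), or returns nil."
  [s]
  (when-let [[t i] (parse-term s 0)]
    (when (= (skip-ws s i) (count s))
      t)))
```

For example, (parse "foo(X, Y)") returns a :structure map whose :args are two :var maps. Using subs keeps the code portable, but it copies the rest of the input for every token; on the JVM, a java.util.regex.Matcher with region and lookingAt would avoid that copy.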

mauricio.szabo 05:05:36

I might add some memoization in the future, but for now it works really well and it's fast enough for my cases.

Ben Sless 06:05:10

Rough sketch:

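(require '[malli.core :as m])

;; Schema for a single character within an inclusive range, e.g. (char-range "az").
(defn char-range
  [from+to]
  (let [from (long (first from+to))
        to (long (second from+to))]
    (m/-simple-schema
     {:type ::char-range
      :pred (fn [x] (and (char? x) (<= from (long x) to)))})))

;; Sequence schema matching the characters of cs exactly, e.g. (char-seq "true").
(defn char-seq
  [cs]
  (into [:cat] (mapv (fn [c] [:= c]) cs)))

;; Parse a seq of characters against a local registry of grammar rules.
(m/parse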
 (m/schema
  [:schema
   {:registry
    {::digit (char-range "09")
     ::lower-case (char-range "az")
     ::upper-case (char-range "AZ")
     ::letter [:alt ::lower-case ::upper-case]
     ::number [:+ ::digit]
     ::var [:cat ::upper-case [:* ::lower-case]]
     ::boolean [:altn [:true (char-seq "true")] [:false (char-seq "false")]]
     ::atom [:cat ::lower-case [:* ::letter]]
     ::term [:altn
             #_[:equality ::equality]
             [:atom ::atom]
             [:var ::var]
             [:number ::number]
             [:boolean ::boolean]
             #_[:structure ::structure]
             #_[:list ::list]
             #_[:string ::string]]
     ;; ::equality [:cat
     ;;             ::var
     ;;             [:= \space]
     ;;             [:= \=]
     ;;             [:= \space]
     ;;             ::term]
     }}
   ::term])
 (seq "Xyz"))

mauricio.szabo 13:05:56

@UK0810AQ2 do you have an idea of how performant it is?

Ben Sless 14:05:28

Not yet; I hit a snag with malli, so no benchmarks yet.

Ben Sless 15:05:49

I can benchmark some rudimentary parsing if you're interested. What should be my baseline?

mauricio.szabo 19:05:51

Ok, this was one piece of code that the Instaparse version was struggling to parse: on my machine it took almost half a second; after I optimized my code it parsed in 160ms (better, but FAR from what I wanted).

Ben Sless 19:05:06

I can write the schema for it, but until I figure out what's up with malli I can't run a full example. Parsing something like (p (seq "foo(X, Y)")), where p is a defined parser, takes about 3 microseconds.

Ben Sless 05:05:53

I'd like to share this experiment I've been poking at for the past year, which I can finally promote from floundering to promising. "Let's write a compiler, how hard can it be?" https://github.com/bsless/clj-analyzer

jumar 10:05:49

Thanks for sharing. Btw, the link in the README is incorrect (relative URL?); it should be https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/core-to-core-pipeline

Ben Sless 11:05:40

Good find! I'll fix it right up. Fixed.