#instaparse
2017-02-17
frank17:02:26

I'm having trouble creating a parser using the grammar specified here: https://developers.google.com/protocol-buffers/docs/reference/proto3-spec

frank17:02:52

I'm getting the feeling that there are syntax differences

frank17:02:26

I'm slurping the grammar out of a separate file, but I feel like escaped quotes still aren't being handled as I intend (e.g. quote = "'" | '"')

frank18:02:49

does anyone know how quotes ought to be escaped in instaparse ebnf strings?

gfredericks18:02:21

the way you have it looks likely to work to me

frank18:02:49

maybe there are unmatched quotes somewhere in the grammar that I copied and pasted 😕

gfredericks18:02:38

try making a trivial grammar that only matches a quote to make sure it works the way you expect

seylerius18:02:58

^ This. So much this. When I'm making grammars, I often make little phrases to match a character I haven't tested before.

frank18:02:44

I'll try that, thanks
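
A minimal sketch of that kind of sanity check, assuming instaparse is on the classpath and aliased as `insta`; the name `quote-parser` is hypothetical. The grammar is written as a Clojure string literal here, so the inner double quotes carry an extra backslash that would not be needed in a grammar file read with slurp.

```clojure
(require '[instaparse.core :as insta])

;; Grammar text as instaparse sees it:  quote = "'" | '"'
(def quote-parser
  (insta/parser "quote = \"'\" | '\"'"))

;; Try each quote character, plus one input that should not match.
(prn (quote-parser "'"))                   ; input is a single-quote character
(prn (quote-parser "\""))                  ; input is a double-quote character
(prn (insta/failure? (quote-parser "x")))  ; true when the parse fails
```

If both quote inputs parse and the third call reports a failure, the quoting in the rule is probably not the source of the problem.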

aengelberg18:02:20

"'" | '"' looks right, but there are sometimes additional layers of escaping you have to deal with.

aengelberg18:02:53

e.g. if you wrote your grammar as a string in a Clojure file, it would probably have to look like

(def parser (insta/parser "quote = \"'\" | '\"'"))

aengelberg18:02:11

I see this in the protobuf spec

hexEscape = '\'
that will probably throw off instaparse, since it thinks you are escaping the second '

aengelberg18:02:25

so it should really be

hexEscape = '\\'
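
To make the escaping layers concrete, here is a hedged sketch; the name `hex-escape-parser` is hypothetical and the rule is cut down to just the backslash. In a grammar file the terminal is written `'\\'`, and inside a Clojure string literal each of those backslashes has to be doubled again.

```clojure
(require '[instaparse.core :as insta])

;; Clojure literal '\\\\' -> grammar text '\\' -> matches one backslash.
(def hex-escape-parser
  (insta/parser "hexEscape = '\\\\'"))

;; "\\" is a Clojure string containing a single backslash character.
(prn (hex-escape-parser "\\"))
```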

aengelberg18:02:20

also, /[^\0\n\\]/ is not valid EBNF in instaparse (should be #"[^\0\n\\]")

frank18:02:38

ah, that's probably it!

frank18:02:41

strangely enough, #"[^\0\n\\]" isn't valid clojure regex syntax, so I stole the same regex syntax from https://github.com/arpagaus/clj-protobuf/blob/master/resources/proto.ebnf

frank18:02:04

they've got a few extra backslashes: #"[^\\0\\n]"

frank18:02:25

@aengelberg what's the equivalent of the … that they've got littered all over their grammar?

aengelberg18:02:37

I think they meant that as a shorthand for alternating between all the digits. Sadly instaparse can't infer the intermediate values, so you would have to write "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

frank18:02:15

ah, gotcha

frank18:02:57

alternatively, #"[0-9]" should work too, right?
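
A short sketch of the regex alternative, assuming instaparse is aliased as `insta`; `digit-parser` is a hypothetical name and `decimalDigit` is used only as an illustrative rule name. The single-quoted regex form `#'...'` avoids escaping double quotes inside the Clojure string.

```clojure
(require '[instaparse.core :as insta])

;; One regex terminal stands in for the ten-way alternation of digit strings.
(def digit-parser
  (insta/parser "decimalDigit = #'[0-9]'"))

(prn (digit-parser "7"))                   ; prints [:decimalDigit "7"]
(prn (insta/failure? (digit-parser "a")))  ; prints true
```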