#instaparse
2018-08-02
mlimotte 18:08:30

Can I get some help with a (hopefully) simple grammar? I haven’t done much with CFGs, so I could be totally off base. I want to find variable expressions in a string, for example: `hello, {{name}}`. This is similar to Mustache-style interpolation, but I need to pre-parse it to do something slightly different. I can recognize the pattern above pretty easily. My problem is getting it to ignore single brackets, for example: `hello, {{name}}. Please choose {Yes, No}.` The last part is not a double-bracket expression and should just be treated like the other uninteresting text. So, my grammar looks like this (I’ve tried a bunch of other variations; this is the closest I’ve come):

(def p 
  (insta/parser
    "<S> = (block | TXT)*
     block = <'{{'> TXT <'}}'>
     <TXT> = (OPEN | CLOSE | A | block)*
     <OPEN> = !'{' '{'
     <CLOSE> = !'}' '}'
     <A> = #'[^{}]*'"))

mlimotte 18:08:03

A call (p "x{a}") yields:

=> Parse error at line 1, column 2:
x{a}
 ^
Expected one of:
"{{"
"}"
NOT "{"

mlimotte 18:08:54

Seems like the `x` got picked up by `<A>`. I would have liked it to match `!'{'`, so that the next char could match in `<OPEN>`.

aengelberg 18:08:07

try changing the OPEN and CLOSE rules to

<OPEN> = '{' !'{'
<CLOSE> = '}' !'}'
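For reference, a sketch of mlimotte's grammar with only those two rules swapped in, assuming instaparse is on the classpath and required as `insta`. The name `p-fixed` is hypothetical, chosen to keep it apart from the original `p`.

```clojure
(require '[instaparse.core :as insta])

;; Original grammar with OPEN and CLOSE rewritten: consume the bracket first,
;; then use negative lookahead to check that no second bracket follows.
(def p-fixed
  (insta/parser
    "<S> = (block | TXT)*
     block = <'{{'> TXT <'}}'>
     <TXT> = (OPEN | CLOSE | A | block)*
     <OPEN> = '{' !'{'
     <CLOSE> = '}' !'}'
     <A> = #'[^{}]*'"))

;; A single-bracket expression now parses as ordinary text.
(p-fixed "x{a}")

;; Double brackets still produce a :block node, single brackets stay text.
(p-fixed "hello, {{name}}. Please choose {Yes, No}.")

;; Still rejected: a '{' followed by '{' cannot be OPEN, and with no
;; closing '}}' it cannot start a block either.
(insta/failure? (p-fixed "{{y}"))
```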

mlimotte 18:08:38

That seems to work.

aengelberg 18:08:10

The problem in the original grammar was that the negative lookahead was conflicting with the token itself. It was basically saying "If there isn't an open bracket, please parse an open bracket", so the rule could never match.

aengelberg 18:08:26

Whereas what you really want is "Please parse an open bracket, but only if there isn't another open bracket right after it."
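The two orderings can be compared in isolation with a pair of one-rule grammars. This is a minimal sketch assuming instaparse is required as `insta`; the parser names `lookahead-first` and `token-first` are hypothetical.

```clojure
(require '[instaparse.core :as insta])

;; Lookahead before the token: "if the next char is not '{', parse a '{'".
;; The two conditions contradict each other, so this rule matches nothing.
(def lookahead-first (insta/parser "S = !'{' '{'"))

;; Token before the lookahead: "parse a '{', then make sure no '{' follows".
;; The lookahead consumes no input and adds nothing to the parse tree.
(def token-first (insta/parser "S = '{' !'{'"))

(insta/failure? (lookahead-first "{")) ; true: the rule can never succeed
(token-first "{")                      ; [:S "{"]
(insta/failure? (token-first "{{"))    ; true: a second '{' follows
```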

mlimotte 18:08:53

hmm.. ok, i think that makes sense to me.

mlimotte 18:08:06

Very cool. Thanks for the quick help!

mlimotte 18:08:58

Here’s an edge case that still fails: (p "{{y}"). It’s a bit contrived, though, so if it’s not a trivial fix, I don’t need to worry about it.

aengelberg 18:08:11

do you want that to parse as normal text?

mlimotte 18:08:43

not a block

aengelberg 18:08:38

maybe something like

<S> = TXT*
block = <'{{'> TXT <'}}'>
<TXT> = (block / A)*
<A> = #'[^{}]*' | '{' | '}'

aengelberg 18:08:27

here I'm changing the A rule to match any text, including a lone `{` or `}` (so a stray `{{` is just two `{` tokens), but then using the ordered choice (`/`) to prefer parsing complete blocks when possible.
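A sketch of that grammar wrapped in a parser, with a small helper to pull out the blocks, assuming instaparse is required as `insta`. The names `p2` and `blocks` are hypothetical.

```clojure
(require '[instaparse.core :as insta])

;; aengelberg's second grammar. A accepts plain text or a lone '{' / '}',
;; so any input can be read as text; `/` makes instaparse try block first.
(def p2
  (insta/parser
    "<S> = TXT*
     block = <'{{'> TXT <'}}'>
     <TXT> = (block / A)*
     <A> = #'[^{}]*' | '{' | '}'"))

(defn blocks
  "Hypothetical helper: collects every [:block ...] node in a parse result."
  [result]
  (filter #(and (vector? %) (= :block (first %)))
          (tree-seq sequential? seq result)))

;; The contrived edge case now parses as plain text: no blocks.
(blocks (p2 "{{y}"))                                        ; ()

;; A real double-bracket expression is still recognized as a block.
(blocks (p2 "hello, {{name}}. Please choose {Yes, No}."))  ; ([:block "name"])
```

Because A alone can now cover any input, the grammar never rejects text outright; the ordered choice only decides which of several valid readings is returned.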

mlimotte 18:08:41

oh.. that’s great. I had tried an approach like that previously, but didn’t know how to prefer one parse over another … that / operator is new to me.

mlimotte 18:08:28

thanks for your help, again