Fork me on GitHub
#instaparse
<
2018-01-08
>
mbjarland08:01:43

I have a question about greedy parsing and ambiguous grammars, I have the following grammar:

(insta/parser
    "layout   = (align | delim | repeat)+
     repeat   = <'{'> (align | delim)+ <'}'>
     delim    = (fill | padding)+
     fill     = 'F'
     padding  = #'[^\\[\\]{}fF]*'
     align    = <'['> ('L' | 'C' | 'R' | 'V') <']'>"
    :string-ci true))
where the only relevant pieces are the layout = (align | delim | repeat)+ and delim = (fill | padding)+ pieces. Assume we get a ‘fill’ followed by a ‘padding’, the parser can here choose between [:delim [:fill "F"]] [:delim [:padding "xxx"]] and [:delim [:fill "F"] [:padding "xxx"]]. This is not really instaparse specific, but rather to do with bnf’s and disambiguating repeating elements. I would like to force the second interpretation where the delim = (fill | padding)+ is greedy and collects all contiguous delim elements into a list before proceeding. Any ideas much appreciated

aengelberg09:01:27

@mbjarland you could change (align | delim)+ to delim? (align delim?)*, that would force the parser to alternate between delim and align, effectively making the delim rule greedy

mbjarland09:01:25

and this is why I love the clojure community, thank you for the fast reply! : ) are we talking about the first rule layout = ...? In that case I would have to cook up a repeating pattern with 3 elements

mbjarland09:01:09

so what I’m doing here is defining a column layout

aengelberg09:01:28

Actually I was talking about the repeat rule. But you made a good point that we'd also have to apply a similar solution to the layout rule to get a similar effect

mbjarland09:01:36

where the repeat says “if the user comes in with more columns than we have defined, use the repeating group to fill out the layout”

aengelberg09:01:09

Column layout? Not sure what you mean

mbjarland09:01:27

never mind, that is really application related and not related to the grammar

mbjarland09:01:07

thought actually explaining what the thing does might clarify, but I think it just muddles the waters even more : )

mbjarland09:01:41

I also thought about the / ordered parsing syntax, but I can not really see how to apply that to these repeating patterns

aengelberg09:01:47

layout = delim? ((align | repeat) delim?)*
repeat = <'{'> delim? (align delim?)* <'}'>

aengelberg09:01:11

Since align and repeat both require nonzero characters to be present in order for the rule to succeed, wedging them in the repeat rule enforces some regularity in the repetition and thus unambiguates the parsing

mbjarland09:01:59

that works, I tried it with insta/parses and my standard examples and it comes out with a single interpretation

mbjarland09:01:10

I will have to meditate on the exact mechanics

aengelberg09:01:28

Yeah, ordered choice doesn't really help here unless delim was competing with some other rule and you wanted to establish the priority. But in this case delim is just competing with itself, in a way.

mbjarland09:01:58

exactly, just a question on what level in the bnf the repetition happens

aengelberg09:01:27

Fortunately that's easily solved by restructuring the combinators, no fancy PEG combinators necessary

mbjarland09:01:33

this is actually a higher level pattern, I will make sure to grok this properly for the next time I run into an ambiguous repetition

mbjarland09:01:50

thanks a ton!

aengelberg09:01:08

@mbjarland I just thought of another way you could have solved it: ignore the above changes I proposed and instead change delim rule to:

delim = (fill | padding)+ !delim

aengelberg09:01:41

Pretty sure my first solution would be more performant, albeit more complex

mbjarland11:01:02

what does the !delim do?