Fork me on GitHub

I have a question about greedy parsing and ambiguous grammars, I have the following grammar:

    "layout   = (align | delim | repeat)+
     repeat   = <'{'> (align | delim)+ <'}'>
     delim    = (fill | padding)+
     fill     = 'F'
     padding  = #'[^\\[\\]{}fF]*'
     align    = <'['> ('L' | 'C' | 'R' | 'V') <']'>"
    :string-ci true))
where the only relevant pieces are the layout = (align | delim | repeat)+ and delim = (fill | padding)+ pieces. Assume we get a ‘fill’ followed by a ‘padding’, the parser can here choose between [:delim [:fill "F"]] [:delim [:padding "xxx"]] and [:delim [:fill "F"] [:padding "xxx"]]. This is not really instaparse specific, but rather to do with bnf’s and disambiguating repeating elements. I would like to force the second interpretation where the delim = (fill | padding)+ is greedy and collects all contiguous delim elements into a list before proceeding. Any ideas much appreciated


@mbjarland you could change (align | delim)+ to delim? (align delim?)*, that would force the parser to alternate between delim and align, effectively making the delim rule greedy


and this is why I love the clojure community, thank you for the fast reply! : ) are we talking about the first rule layout = ...? In that case I would have to cook up a repeating pattern with 3 elements


so what I’m doing here is defining a column layout


Actually I was talking about the repeat rule. But you made a good point that we'd also have to apply a similar solution to the layout rule to get a similar effect


where the repeat says “if the user comes in with more columns than we have defined, use the repeating group to fill out the layout”


Column layout? Not sure what you mean


never mind, that is really application related and not related to the grammar


thought actually explaining what the thing does might clarify, but I think it just muddles the waters even more : )


I also thought about the / ordered parsing syntax, but I can not really see how to apply that to these repeating patterns


layout = delim? ((align | repeat) delim?)*
repeat = <'{'> delim? (align delim?)* <'}'>


Since align and repeat both require nonzero characters to be present in order for the rule to succeed, wedging them in the repeat rule enforces some regularity in the repetition and thus unambiguates the parsing


that works, I tried it with insta/parses and my standard examples and it comes out with a single interpretation


I will have to meditate on the exact mechanics


Yeah, ordered choice doesn't really help here unless delim was competing with some other rule and you wanted to establish the priority. But in this case delim is just competing with itself, in a way.


exactly, just a question on what level in the bnf the repetition happens


Fortunately that's easily solved by restructuring the combinators, no fancy PEG combinators necessary


this is actually a higher level pattern, I will make sure to grok this properly for the next time I run into an ambiguous repetition


thanks a ton!


@mbjarland I just thought of another way you could have solved it: ignore the above changes I proposed and instead change delim rule to:

delim = (fill | padding)+ !delim


Pretty sure my first solution would be more performant, albeit more complex


what does the !delim do?