Hi all,
Not sure if my question pertains to context-free grammars, BNF notation, or Instaparse, but I am trying to grok something about the following (simplified) grammar:
"
expression = literal | symbol
number = #'[0-9]+'
string = <'\"'> #'[^\"]*' <'\"'>
symbol = #'[a-zA-Z0-9-]*[?]?'"
Why does running this grammar on "8" result in ([:expression [:symbol "8"]])? Does the ordering in the expression rule have no effect? Are we forced to use a negative lookahead in the symbol rule in order to fix this?
"
expression = literal | symbol
number = #'[0-9]+'
string = <'\"'> #'[^\"]*' <'\"'>
symbol = !number #'[a-zA-Z0-9-]*[?]?'"
Running this now on "8" results in ([:expression [:number "8"]]), as I expected earlier.
Would love to hear the proper explanation and get educated on this. Thank you!
https://github.com/engelberg/instaparse?tab=readme-ov-file#ambiguous-grammars
Your grammar is ambiguous, at least in part because the pipe operator doesn't specify precedence. You can use insta/parses to get a list of possible ways of parsing your expression.
Fortunately, Instaparse provides an ordered choice operator, /, as one of its PEG extensions (https://github.com/engelberg/instaparse?tab=readme-ov-file#peg-extensions), which can help you resolve this ambiguity.
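To make both suggestions concrete, here is a minimal sketch, assuming Instaparse is on the classpath. The grammar is a reduced, hypothetical version of the one in the question: the names ambiguous and ordered are made up for illustration, and expression refers to number directly because the simplified grammar does not define literal.

```clojure
(require '[instaparse.core :as insta])

;; Unordered choice (|): number and symbol can both consume "8",
;; so the grammar is ambiguous for that input.
(def ambiguous
  (insta/parser
    "expression = number | symbol
     number = #'[0-9]+'
     symbol = #'[a-zA-Z0-9-]*[?]?'"))

;; insta/parses returns a lazy seq of all successful parses,
;; so both the :number and the :symbol reading should show up here.
(insta/parses ambiguous "8")

;; Ordered choice (/): the first alternative is preferred
;; when more than one alternative can match.
(def ordered
  (insta/parser
    "expression = number / symbol
     number = #'[0-9]+'
     symbol = #'[a-zA-Z0-9-]*[?]?'"))

;; insta/parse returns a single parse; with / this should be
;; the :number reading for "8".
(insta/parse ordered "8")
```

The only change between the two grammars is | versus / in the expression rule, so the symbol rule itself can stay as it was.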
Fantastic. I'll dive into it. Thank you!
This kind of thing is why programming languages often restrict the first character of identifiers more than subsequent characters.
Right, if a symbol cannot start with a number, then we remove the ambiguity, right?
Yeah, negative lookahead is one way to do that. Another way is to define a symbol-start and a symbol-part, so that a symbol is a symbol-start followed by zero or more symbol-parts.
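A rough sketch of the symbol-start / symbol-part idea, again assuming Instaparse is available. The character classes are assumptions for illustration (letters and hyphen may start a symbol, digits may only follow), and the optional trailing ? from the original symbol rule is left out to keep the example short.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical grammar: a symbol can no longer begin with a digit,
;; so "8" can only be read as a number.
(def split-symbol
  (insta/parser
    "expression = number | symbol
     number = #'[0-9]+'
     symbol = symbol-start symbol-part*
     <symbol-start> = #'[a-zA-Z-]'
     <symbol-part> = #'[a-zA-Z0-9-]'"))

;; Only the number rule can match a leading digit.
(insta/parse split-symbol "8")

;; A leading letter followed by letters or digits is a symbol.
;; The angle brackets hide the helper tags, so the characters
;; appear as separate strings under :symbol.
(insta/parse split-symbol "foo8")
```

If a single string per symbol is preferred, the same restriction can be written as one regex, for example #'[a-zA-Z-][a-zA-Z0-9-]*', at the cost of not naming the two parts.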
Right. Thanks a lot!