instaparse

danielsz 2025-07-28T08:23:30.356529Z

Hi all, I'm not sure whether my question is about context-free grammars in general or about Instaparse's BNF notation, but I am trying to grok something about the following (simplified) grammar:

    " = expression+
     expression = literal | symbol
     <literal> = number | string
     number = #'[0-9]+'
     string = <'\"'> #'[^\"]*' <'\"'>
     symbol = #'[a-zA-Z0-9-]*[?]?'"

Why does running this grammar on "8" result in ([:expression [:symbol "8"]])? Does the ordering of the alternatives in the expression rule have no effect? Are we forced to use a negative lookahead in the symbol rule to fix this?

    " = expression+
     expression = literal | symbol
     <literal> = number | string
     number = #'[0-9]+'
     string = <'\"'> #'[^\"]*' <'\"'>
     symbol = !number #'[a-zA-Z0-9-]*[?]?'"

Running this on "8" now results in ([:expression [:number "8"]]), which is what I had expected in the first place. I would love to hear the proper explanation and get educated on this. Thank you!
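A quick way to see the ambiguity is to test the two terminal regexes from the question on their own. The sketch below is plain Python with the standard `re` module, not Instaparse, and the helper `matches` is a hypothetical name. Because the whole input is the single token "8", it assumes a full regex match is the relevant test. The `(?![0-9]+)` lookahead is only a rough regex analogue of Instaparse's `!number`.

```python
import re

# Terminal regexes copied from the question's grammar.
NUMBER = r"[0-9]+"
SYMBOL = r"[a-zA-Z0-9-]*[?]?"
# Rough analogue of `symbol = !number #'...'`: refuse to start where digits start.
SYMBOL_NOT_NUMBER = r"(?![0-9]+)" + SYMBOL


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern consumes the entire input."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    candidates = [
        ("number", NUMBER),
        ("symbol", SYMBOL),
        ("symbol with !number", SYMBOL_NOT_NUMBER),
    ]
    for name, pattern in candidates:
        print(f"{name}: {matches(pattern, '8')}")
```

Running it reports True for number and for the plain symbol regex, and False for the lookahead version. Because both rules accept "8", the original grammar has two valid parse trees for that input.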

respatialized 2025-07-28T13:38:39.414459Z

https://github.com/engelberg/instaparse?tab=readme-ov-file#ambiguous-grammars
Your grammar is ambiguous, at least in part because the pipe operator (|) doesn't give either alternative precedence over the other. You can use insta/parses to get a list of all the possible parses of your input. Fortunately, Instaparse provides an ordered choice operator, /, as one of its PEG extensions (https://github.com/engelberg/instaparse?tab=readme-ov-file#peg-extensions), which can help you resolve this ambiguity.
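The difference between | and / can be modeled with a toy chooser over the alternatives of expression, with literal inlined. This is a Python sketch under simplifying assumptions, not Instaparse's actual parser. `all_parses` stands in, loosely, for what insta/parses reports, and `ordered_choice` stands in for /, which prefers earlier alternatives. Both function names are hypothetical. For simplicity, the string pattern includes the quotes that the grammar hides with <'\"'>.

```python
import re
from typing import List, Optional

# Alternatives of `expression`, with `literal` inlined, in the question's order.
ALTERNATIVES = [
    ("number", r"[0-9]+"),
    ("string", r'"[^"]*"'),
    ("symbol", r"[a-zA-Z0-9-]*[?]?"),
]


def all_parses(text: str) -> List[str]:
    """Unordered choice (`|`): every alternative that consumes the whole input."""
    return [name for name, pattern in ALTERNATIVES if re.fullmatch(pattern, text)]


def ordered_choice(text: str) -> Optional[str]:
    """Ordered choice (`/`): the first alternative, in order, that consumes the input."""
    for name, pattern in ALTERNATIVES:
        if re.fullmatch(pattern, text):
            return name
    return None


if __name__ == "__main__":
    for token in ["8", "foo?", '"hi"']:
        print(token, all_parses(token), ordered_choice(token))
```

With number listed first, "8" yields both number and symbol under the unordered model, but only number under the ordered one. In Instaparse itself, the corresponding change would presumably be to write expression = literal / symbol.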

danielsz 2025-07-28T13:40:27.721589Z

Fantastic. I'll dive into it. Thank you!

2025-07-28T15:50:52.433949Z

This kind of thing is why programming languages often restrict the first character of an identifier more than the characters that follow it.

danielsz 2025-07-28T15:53:52.100629Z

Right, so if a symbol cannot start with a digit, that removes the ambiguity?

2025-07-28T16:09:58.134729Z

Yeah, negative lookahead is one way to do that. Another way is to define a symbol-start rule and a symbol-part rule, and then define a symbol as a symbol-start followed by zero or more symbol-parts.
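The symbol-start / symbol-part idea can be sketched as a regex that differs from the question's only in its first character class. The sketch is Python with the standard `re` module. The names SYMBOL_START, SYMBOL_PART and classify are hypothetical, and using `[a-zA-Z-]` as the start class (the question's class minus digits) is an assumption. In Instaparse this would correspond to something like separate symbol-start and symbol-part rules, with symbol defined as symbol-start followed by symbol-part*.

```python
import re
from typing import List

# Hypothetical split of the question's character class: a symbol may not
# start with a digit, but digits are still allowed after the first character.
SYMBOL_START = r"[a-zA-Z-]"
SYMBOL_PART = r"[a-zA-Z0-9-]"
# symbol = symbol-start followed by zero or more symbol-part, plus the optional '?'.
SYMBOL = SYMBOL_START + SYMBOL_PART + "*" + r"[?]?"
NUMBER = r"[0-9]+"


def classify(token: str) -> List[str]:
    """Return the names of all token kinds whose regex consumes the whole token."""
    kinds = [("number", NUMBER), ("symbol", SYMBOL)]
    return [name for name, pattern in kinds if re.fullmatch(pattern, token)]


if __name__ == "__main__":
    for token in ["8", "foo?", "a1", "1a"]:
        print(token, classify(token))
```

One side effect is worth noting. Unlike the question's `[a-zA-Z0-9-]*[?]?`, this pattern no longer matches the empty string, because a symbol now needs at least one start character.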

danielsz 2025-07-28T16:12:54.763879Z

Right. Thanks a lot!