instaparse

Giles Alexander 2023-11-09T18:25:05.881699Z

Hi, I’m having a weird issue with Instaparse. I’ve written a grammar using EBNF. When parsing moderately long (~200 lines) documents using that grammar, if the document has an error then Instaparse will go into an infinite loop, but only if the document uses DOS line endings. Huh? If the document uses UNIX line endings, then Instaparse reports the error. EOL is part of the syntax of the grammar, and I have defined a terminal ('\r\n' | '\n' | '\r' | #"$"). There’s got to be something wrong with my grammar, and I want to fix that. But it seems odd that I’m able to drop Instaparse into an infinite loop just by changing the line endings. Does anyone have any ideas where I should start looking to produce a simpler test case? Thanks 🙏

aengelberg 2023-11-09T18:27:43.919459Z

So it loops infinitely if the input has \r\n?

Giles Alexander 2023-11-09T18:28:36.516899Z

Not exactly. It loops infinitely only if the document has \r\n line endings and also contains a parse error.

aengelberg 2023-11-09T18:30:25.224239Z

I think #"$" might be dicey because it detects the end of a line without consuming it (it parses zero characters), meaning you could parse infinitely many empty lines in a row.
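Here is a minimal Java sketch of the zero-width problem. On the JVM, instaparse's #"..." terminals are java.util.regex patterns. The class and helper names are hypothetical. The loop only imitates a naive "repeat the terminal while it matches" and is not instaparse's actual parsing machinery. The sketch tries a terminal at a fixed index using `Matcher.region` plus `lookingAt()` and reports how many characters the match consumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthEol {
    /**
     * Tries `regex` anchored at `start` (like testing a terminal at the
     * current position) and returns the number of characters consumed,
     * or -1 if it does not match there.
     */
    static int consumedAt(String regex, String input, int start) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.region(start, input.length()); // only look at input[start..]
        return m.lookingAt() ? m.end() - start : -1;
    }

    /**
     * Hypothetical naive "EOL+" loop: keep applying the terminal while it
     * matches. The iteration cap is needed because a zero-width match
     * never moves the position forward.
     */
    static int repetitionsBeforeGuard(String regex, String input, int start, int maxIterations) {
        int pos = start;
        int count = 0;
        while (count < maxIterations) {
            int n = consumedAt(regex, input, pos);
            if (n < 0) break; // terminal no longer matches
            pos += n;         // n == 0 means pos never changes
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String doc = "foo\r\n";
        // "$" matches at index 3 (just before the trailing \r\n) but consumes nothing.
        System.out.println(consumedAt("$", doc, 3));                      // 0
        // The literal alternative really consumes the two-character line ending.
        System.out.println(consumedAt("\r\n", doc, 3));                    // 2
        // Repeating "$" only stops because of the cap: 1000 "matches", no progress.
        System.out.println(repetitionsBeforeGuard("$", doc, 3, 1000));    // 1000
        // Repeating "\r\n" stops after one match because the input is used up.
        System.out.println(repetitionsBeforeGuard("\r\n", doc, 3, 1000)); // 1
    }
}
```

The cap is the only thing that stops the first loop. A terminal that succeeds without consuming input can be applied at the same position indefinitely, which is the "infinite empty lines" risk.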

Giles Alexander 2023-11-09T18:30:58.153509Z

Ahhh… I’m trying to treat the end of input the same as an end of line.

aengelberg 2023-11-09T18:31:04.053699Z

Also, $ detects the end of a line, not the end of the file

Giles Alexander 2023-11-09T18:31:47.242449Z

\z instead?

aengelberg 2023-11-09T18:32:16.785719Z

Yeah, that seems closer to what you want
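You can check directly where each anchor matches. The minimal Java sketch below uses java.util.regex semantics, which is what instaparse regexes use on the JVM; `positions` is a hypothetical helper. It lists every index at which `$`, `(?m)$` and `\z` match in a two-line document with DOS line endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorPositions {
    /** Returns every index at which the (zero-width) pattern `regex` matches. */
    static List<Integer> positions(String regex, String input) {
        List<Integer> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) { // find() steps past zero-length matches, so this terminates
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        String dos = "ab\r\ncd\r\n"; // two lines, DOS line endings, length 8
        // Default mode: includes 6 (before the final \r\n) and 8 (end of input),
        // but not 2 (the end of the first line).
        System.out.println("$     " + positions("$", dos));
        // MULTILINE mode: also includes 2, before the first \r\n.
        System.out.println("(?m)$ " + positions("(?m)$", dos));
        // \z: only the absolute end of input, index 8.
        System.out.println("\\z    " + positions("\\z", dos));
    }
}
```

Per the java.util.regex documentation, `$` without the MULTILINE flag matches at the end of input and just before a final line terminator. With `(?m)` it matches before every line terminator, and `\z` matches only at the very end. All three match zero characters, so swapping `$` for `\z` changes where the terminal can match but not the fact that it consumes nothing.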

Giles Alexander 2023-11-09T18:32:40.508709Z

Thanks! I’ll give it a try, and see if I can produce something that reproduces the infinite loop with and without that change.

aengelberg 2023-11-09T18:33:07.531729Z

I'd still be a little concerned about the potential for matching infinitely many EOFs.

Giles Alexander 2023-11-09T18:34:17.441709Z

OK. I see what you mean. I’ll have a think about a different way of expressing this

aengelberg 2023-11-09T18:35:05.102679Z

It's possible you don't need to explicitly match on EOF, because instaparse will only consider a parse valid if it consumes the whole string
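One way to drop the explicit end-of-input alternative can be sketched with a plain Java regex standing in for the grammar. This is an analogy only: the `DOC` pattern and class name are hypothetical, and instaparse parses with its own engine rather than a single regex. Every line-ending alternative consumes at least one character. `Matcher.matches()` only succeeds when the entire input is consumed, much like instaparse's whole-string requirement, so the last line needs no EOF token. `lookingAt()` is included for contrast because it accepts a matching prefix and ignores the rest.

```java
import java.util.regex.Pattern;

public class WholeInputLines {
    // Hypothetical document "grammar": lines of lowercase letters separated by
    // \r\n, \n or \r, with an optional trailing line ending. No EOF alternative.
    // (?>...) is an atomic group: once "\r\n" matches it is never re-split into
    // "\r" + empty line + "\n", which would make backtracking explode on bad input.
    private static final Pattern DOC =
        Pattern.compile("[a-z]*(?:(?>\r\n|\n|\r)[a-z]*)*");

    /** True only if the whole input is consumed. */
    static boolean accepts(String input) {
        return DOC.matcher(input).matches();
    }

    /** Prefix match, for contrast: succeeds even if unparsed input remains. */
    static boolean acceptsPrefix(String input) {
        return DOC.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(accepts("foo\r\nbar\r\n"));        // true
        System.out.println(accepts("foo\nbar"));              // true
        System.out.println(accepts("foo\r\nBAD\r\nbar\r\n")); // false
        System.out.println(acceptsPrefix("foo\r\nBAD"));      // true
    }
}
```

A malformed line makes the whole-input check fail instead of being absorbed by a zero-width terminal, and the optional trailing line ending covers documents that do or don't end with a newline.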
