#instaparse
2020-10-02
jeremys 13:10:39

@hiredman Hi, I'm not sure what to make of your answer. I could equally say that even with a grammar that offers alternatives, I could still choose not to use a parsing library and write a reader by hand. I'm using instaparse because it makes building a parser much easier than doing it without one, and it also makes the grammar easier to evolve. Maybe the answer to my question is that my grammar can be expressed another way, rather than telling the parser not to backtrack. Still, if that functionality existed, I'd like to know about it! Cheers.

manutter51 13:10:10

What was your reason for wanting to error out instead of backtracking? Are you trying to raise an alert about invalid expressions in what you’re parsing, or are you trying to debug your grammar?

jeremys 16:10:50

@manutter51 hi! I want to raise an alert about invalid code. I'm working on something like https://docs.racket-lang.org/pollen/ that lets you write code in the middle of text. My problem shows up in specific situations. For instance, the entry rule of my grammar looks like this:

doc = (plain-text | embedded)*
If I write a pollen expression like this:
plain-text ◊str["some string"] plain-text
it is parsed as:
[:doc 
 "plain-text " 
 [:tag 
  [:tag-name "str"] 
  [:tag-clj-arg "[" " " "\"aaa\"" "]"]] 
 " plain-text"]
Now, if I make a mistake balancing the quotes:
plain-text ◊str["some string""] plain-text
then, because of the way my grammar works, I get:
[:doc 
 "plain-text " 
 [:tag 
  [:tag-name "str"]] 
 "[ \"some string\"\"]  plain-text"]
From the parser's point of view there is no error here. The ["some string""] expression, which serves as the arguments to the str function, can't be parsed as valid Clojure code. However, the parser can fall back to the plain-text rule, and it did just that. In this case I'd rather it didn't.
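One possible workaround, sketched here as an assumption (it is not an instaparse feature), is to post-check the parse tree for exactly this shape: a `:tag` with no `:tag-clj-arg` child immediately followed by a plain-text string that starts with `[`. The helper names `tag-without-args?` and `suspicious-fallbacks` are hypothetical, and the sketch only inspects the direct children of `:doc`, matching the trees above.

```clojure
(require '[clojure.string :as str])

(defn tag-without-args?
  "True when node is a [:tag ...] vector with no [:tag-clj-arg ...] child."
  [node]
  (and (vector? node)
       (= :tag (first node))
       (not-any? #(and (vector? %) (= :tag-clj-arg (first %)))
                 (rest node))))

(defn suspicious-fallbacks
  "Hypothetical helper: returns the plain-text strings that directly follow
  an argument-less :tag and start with \"[\", i.e. places where malformed
  arguments were probably swallowed by the plain-text rule.
  Only the direct children of [:doc ...] are inspected."
  [tree]
  (->> (rest tree)        ; children of the [:doc ...] node
       (partition 2 1)    ; sliding window of [previous next] pairs
       (keep (fn [[prev nxt]]
               (when (and (tag-without-args? prev)
                          (string? nxt)
                          (str/starts-with? nxt "["))
                 nxt)))))

;; Usage on the malformed tree above:
(suspicious-fallbacks
  [:doc "plain-text "
        [:tag [:tag-name "str"]]
        "[ \"some string\"\"]  plain-text"])
;; => ("[ \"some string\"\"]  plain-text")
```

A caveat of this heuristic: it would also flag a tag that is legitimately followed by a literal `[` in the text, and nested tags would need a recursive walk.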

manutter51 16:10:43

Perhaps you could define plain-text so that it's not allowed to contain an unescaped ◊ character?

jeremys 16:10:53

It is, actually 🙂. That's how the grammar recognizes that there is a "tag-fn" there (in Pollen's jargon), or embedded code in general, and so we rightly get the [:tag [:tag-name "str"]] part. What happens is that the arguments to the function are optional, so if the text that follows the function's name is a malformed argument list, the parser can fall back to plain text. It may be that the parser can't be made to throw in that case, or that I can't gerrymander my grammar into doing what I want. It would be cool if I could, though.
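For the "gerrymander the grammar" route, one option to try (a sketch, not tested against the real Pollen grammar) is instaparse's negative lookahead `!`: if a plain-text chunk may not start with `[`, the parser can no longer fall back to plain text right after a tag name, so malformed arguments make the whole parse fail instead. The grammar below is a hypothetical, heavily simplified stand-in: `pollen-doc`, `str-lit` and all rule bodies are assumptions, and `tag-clj-arg` only accepts string literals and whitespace rather than real Clojure forms.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical, simplified Pollen-like grammar (not the original one).
;; plain-text starts with a negative lookahead !'[' so that a chunk of
;; plain text can never begin with "[". After ◊str, malformed args can
;; therefore no longer be re-read as plain text.
;; tag-clj-arg is a toy: string literals and whitespace only.
(def pollen-doc
  (insta/parser
    "doc         = (plain-text | tag)*
     plain-text  = !'[' #'[^◊]+'
     tag         = <'◊'> tag-name tag-clj-arg?
     tag-name    = #'[a-zA-Z][a-zA-Z0-9-]*'
     tag-clj-arg = '[' (str-lit | #'\\s+')* ']'
     str-lit     = #'\"[^\"]*\"'"))

;; Well-formed input parses.
(insta/failure? (pollen-doc "plain-text ◊str[\"some string\"] plain-text"))
;; => false

;; Unbalanced quotes now yield an instaparse Failure instead of a
;; silent fallback to plain text.
(insta/failure? (pollen-doc "plain-text ◊str[\"some string\"\"] plain-text"))
;; => true
```

The trade-off is that a literal `[` is now rejected wherever a new plain-text chunk would begin (right after a tag, or at the very start of a document), so such text would need some escape mechanism.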