
Hey, is anyone here? :)


I was trying to make this EBNF grammar work with instaparse, but so far it hasn't worked out.


Here's what I got:

user=> (def parser (insta/parser "/Users/borkdude/Downloads/bash.ebnf"))
user=> (parser "foo")
Parse error at line 1, column 1:


Hi, when you do not specify a starting rule, instaparse uses the top (first) rule of the grammar as the starting point. In your case that is the number rule. This should work:

(parser "foo" :start :command)


(NB: surrounding a rule with angle brackets makes it hidden. Since all commands are hidden, you will probably only get an empty list on a successful parse.)
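A minimal sketch of the start-rule behaviour, assuming instaparse is on the classpath and required as insta. The grammar is a hypothetical toy (not the bash grammar) in which number happens to be the first rule, as in the grammar above:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical toy grammar: `number` comes first, so it is the default start rule.
(def toy
  (insta/parser
    "number  = digit+
     command = word
     word    = letter+
     digit   = #'[0-9]'
     letter  = #'[a-z]'"))

;; Without :start, instaparse tries to read "foo" as a `number` and fails.
(insta/failure? (toy "foo"))    ; => true

;; With an explicit start rule the same input parses.
(toy "foo" :start :command)
;; => [:command [:word [:letter "f"] [:letter "o"] [:letter "o"]]]
```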


user=> (parser "foo" :start :word)
("f" "o" "o")


btw, it wasn't my choice to use angle brackets; I just copied them from the original EBNF


oh I see, hidden means you don't get it back in the structure, but directly?


why does this succeed if I have set :partial to false:

user=> (parser "foo" :start :word :partial false)


That was my hunch, which is why I thought a heads-up was in order :) Yes, there is a good example of hiding here. As mentioned, hiding is usually used for whitespace and other tokens you do not care about in the final output, but if you hide the top rule, everything disappears.
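A small sketch of the two kinds of hiding, using a hypothetical toy grammar. Angle brackets around a reference on the right-hand side drop that content from the output, while angle brackets around a rule name on the left-hand side drop only its tag, so its children are spliced into the parent (or returned as a plain sequence when it is the start rule). A grammar that wraps every right-hand-side reference in angle brackets therefore returns very little.

```clojure
(require '[instaparse.core :as insta])

;; <ws> on the right-hand side: whitespace is matched but left out of the tree.
(def words
  (insta/parser
    "sentence = word (<ws> word)*
     word     = #'[a-z]+'
     ws       = #'\\s+'"))

(words "echo hi")
;; => [:sentence [:word "echo"] [:word "hi"]]

;; <sentence> on the left-hand side: only the :sentence tag is hidden,
;; so the start rule's children come back as a sequence.
(def words-hidden-tag
  (insta/parser
    "<sentence> = word (<ws> word)*
     word       = #'[a-z]+'
     ws         = #'\\s+'"))

(words-hidden-tag "echo hi")
;; => ([:word "echo"] [:word "hi"])
```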


ah I see, it was because of the hiding again:

user=> (parser "foo" :start :word :partial false)
[:word [:word [:word [:letter "f"]] [:letter "o"]] [:letter "o"]]


:partial allows a partially complete/successful parse to succeed, embedding the failure node in the AST at the point where the output…
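For reference, instaparse's documentation describes returning a tree with the failure embedded in it as total parse mode (:total true), while :partial true asks for parses that cover only a prefix of the input. A sketch contrasting the two on a hypothetical toy grammar; the exact shape of the embedded failure node is deliberately not shown:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical toy grammar (not the bash grammar).
(def word-parser
  (insta/parser
    "word   = letter+
     letter = #'[a-z]'"))

;; Default: the whole input must match, so the trailing digit causes a failure.
(insta/failure? (word-parser "foo1"))    ; => true

;; :total true returns a tree anyway, with the failure embedded in it;
;; insta/failure? still reports that the parse did not fully succeed.
(def total-result (word-parser "foo1" :total true))
(insta/failure? total-result)            ; => true

;; :partial true (here with insta/parses) yields parses of prefixes of the
;; input, such as "f", "fo" and "foo".
(insta/parses word-parser "foo1" :partial true)
```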


It seems the original EBNF works a bit differently from instaparse, e.g.:

<for_command> ::=  'for' <word> <newline_list> 'do' <compound_list> 'done'
            |  'for' <word> <newline_list> '{' <compound_list> '}'
            |  'for' <word> ';' <newline_list> 'do' <compound_list> 'done'
            |  'for' <word> ';' <newline_list> '{' <compound_list> '}'
            |  'for' <word> <newline_list> 'in' <word_list> <list_terminator>
                   <newline_list> 'do' <compound_list> 'done'
            |  'for' <word> <newline_list> 'in' <word_list> <list_terminator>
                   <newline_list> '{' <compound_list> '}'


seems to assume that the tokens are automatically separated by whitespace


if I have to rewrite the grammar anyway I'm more inclined to hand-roll my own parser


Yes, I think most yacc/bison grammars assume a separate lexer that splits the input into tokens and skips the whitespace between them. Instaparse can add this with its auto-whitespace feature, which has worked well for me.
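A sketch of auto-whitespace on a heavily simplified, hypothetical for loop (the real bash grammar is far larger). The rule itself mentions no whitespace; :auto-whitespace :standard allows optional whitespace between tokens and keeps it out of the output:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical, simplified rule: tokens are listed back to back, as in the bash EBNF.
(def for-command
  (insta/parser
    "for_command = 'for' word 'in' word+ ';' 'do' word+ ';' 'done'
     word        = #'[a-z0-9]+'"
    :auto-whitespace :standard))

(for-command "for x in a b ; do echo x ; done")
;; => [:for_command "for" [:word "x"] "in" [:word "a"] [:word "b"] ";"
;;     "do" [:word "echo"] [:word "x"] ";" "done"]
```

Without the :auto-whitespace option, the same grammar would reject this input at the first space.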


I don't know what you are using this parser for, but in my experience a proper grammar-based parser is more maintainable and flexible in the long run. Of course, for small use cases it can be a lot to get into and learn.


this parser should parse bash syntax


but bash is not such a big language


I just have some problems getting this to work with instaparse so far


it's not very important, just a fun project


Then I guess it comes down to which approach you find most fun :) I think instaparse is quite amazing once you grok it, but again, I understand it can be a hassle to get into. On the other hand, hand-written parsers can also be painful to get right.


There isn’t really a single, widely followed EBNF syntax specification or RFC, so every “EBNF grammar” you’ll find in the wild has a slightly different flavor of the syntax. Sometimes this is because a certain parser library chose a unique metasyntax, and sometimes because the grammar is meant to serve as documentation rather than to be compiled and executed.


Instaparse attempts to support most of the different flavors, which is why, for example, you can write an optional element as either x? or [x]
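For example, these two hypothetical grammars differ only in how the optional name is written, and instaparse produces the same trees for both:

```clojure
(require '[instaparse.core :as insta])

;; Optional element written with a trailing ?
(def with-question-mark
  (insta/parser
    "greeting = 'hi' name?
     name     = ' ' #'[a-z]+'"))

;; The same optional element written with square brackets.
(def with-brackets
  (insta/parser
    "greeting = 'hi' [name]
     name     = ' ' #'[a-z]+'"))

(with-question-mark "hi bob")  ; => [:greeting "hi" [:name " " "bob"]]
(with-brackets "hi bob")       ; => [:greeting "hi" [:name " " "bob"]]
(with-brackets "hi")           ; => [:greeting "hi"]
```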


But sometimes a grammar or a different parser library will make a particularly unusual syntax choice, like using angle brackets in rule names


Or a grammar will make an implicit logical assumption that Instaparse has no way to act upon, like whitespace being parsed between tokens


The angle brackets are particularly unfortunate since Instaparse chose to use angle brackets for an instaparse-specific feature (hiding data from the output parse tree)


ABNF, on the other hand, is a much more tightly specified metasyntax (it is defined in RFC 5234), so copy-pasting ABNF grammars into instaparse (using :input-format :abnf) tends to be safer
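A minimal sketch of the ABNF input format with a hypothetical two-rule grammar. It uses RFC 5234 notation (double-quoted strings, %x value ranges and n* repetition), and each rule starts in the first column because indented lines are continuation lines in ABNF:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical grammar; rules are joined with explicit newlines so that
;; no rule is indented (indentation would mark a continuation line in ABNF).
(def abnf-greeting
  (insta/parser
    (str "greeting = \"hi\" \" \" name\n"
         "name = 1*%x61-7A\n")
    :input-format :abnf))

(first (abnf-greeting "hi bob"))          ; => :greeting
(insta/failure? (abnf-greeting "hibob"))  ; => true
```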