#instaparse
2021-06-29
borkdude 08:06:48

Hey, is anyone here? :)

borkdude 08:06:09

I was trying to make this EBNF grammar work with instaparse: https://github.com/cbeust/kash/blob/master/src/main/resources/bash.ebnf but so far it hasn't worked out.

borkdude 08:06:31

Here's what I got: https://gist.github.com/borkdude/98c5d9e2bf598b227e8e643e4271e61e

user=> (def parser (insta/parser "/Users/borkdude/Downloads/bash.ebnf"))
#'user/parser
user=> (parser "foo")
Parse error at line 1, column 1:
foo
^
Expected:
#"[0-9]"

Sigve 09:06:28

Hi, when you don't specify a start rule, instaparse uses the first rule in the grammar as the starting point. In your case that is the number rule. https://github.com/engelberg/instaparse#parsing-from-another-start-rule This should work:

(parser "foo" :start :command)
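A minimal sketch of the default-start-rule behaviour, using a hypothetical three-rule grammar (not the actual bash.ebnf) in which number happens to be listed first:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical cut-down grammar: `number` is the first rule, so instaparse
;; treats it as the start rule unless told otherwise.
(def parser
  (insta/parser
    "number  = #'[0-9]+'
     word    = #'[a-z]+'
     command = word"))

;; Without :start, "foo" is parsed as a number and fails.
(insta/failure? (parser "foo"))
;; => true

;; Naming the start rule explicitly parses the input as a command.
(parser "foo" :start :command)
;; => [:command [:word "foo"]]
```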

Sigve 09:06:35

(NB: surrounding a rule name with angle brackets makes it hidden. Since all the commands are hidden, you will probably only get an empty list on a successful parse.)

borkdude 10:06:19

user=> (parser "foo" :start :word)
("f" "o" "o")

borkdude 10:06:22

btw, it wasn't my choice to use angle brackets; I just copied them from the original EBNF

borkdude 10:06:06

oh I see, hidden means you don't get the rule back as a node in the structure, just its content directly?

borkdude 10:06:27

why does this succeed if I have set :partial to false:

user=> (parser "foo" :start :word :partial false)
[:word]

Sigve 10:06:36

That was my hunch, which is why I thought a heads-up was in order :) Yes, there is a good example of hiding here: https://github.com/engelberg/instaparse#hiding-content. As mentioned, it is usually used for hiding whitespace and other tokens you don't care about in the final output, but if you hide the top rule, everything disappears.
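To make the effect of hiding concrete, here is a small sketch comparing the same hypothetical two-rule grammar with and without angle brackets on the left-hand side. The tags disappear but the matched strings remain, which matches the ("f" "o" "o") output above. The :unhide parse option from the instaparse README brings hidden tags back, which may help when debugging a grammar copied from elsewhere:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical grammar with visible rules.
(def visible
  (insta/parser
    "command = word
     word    = #'[a-z]+'"))

;; The same grammar with both rules hidden via <...> on the left-hand side.
(def hidden
  (insta/parser
    "<command> = word
     <word>    = #'[a-z]+'"))

(visible "foo")               ;; => [:command [:word "foo"]]
(hidden "foo")                ;; => ("foo"), tags gone, content kept
(hidden "foo" :unhide :tags)  ;; => [:command [:word "foo"]]
```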

borkdude 10:06:55

ah I see, it was because of the hiding again:

user=> (parser "foo" :start :word :partial false)
[:word [:word [:word [:letter "f"]] [:letter "o"]] [:letter "o"]]
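The nested :word nodes come from a left-recursive rule. Assuming the gist defines it roughly as word = letter | word letter (an assumption; the gist is not quoted here), the + repetition operator describes the same strings with a flat tree, and hiding letter flattens it further. A sketch:

```clojure
(require '[instaparse.core :as insta])

;; Left-recursive form (assumed to resemble the rule in the gist).
;; Instaparse accepts left recursion, but each step adds a nesting level.
(def nested
  (insta/parser
    "word   = letter | word letter
     letter = #'[a-z]'"))

;; Same language expressed with the + repetition operator.
(def flat
  (insta/parser
    "word   = letter+
     letter = #'[a-z]'"))

;; Hiding `letter` splices its content into the parent node.
(def flat-hidden
  (insta/parser
    "word     = letter+
     <letter> = #'[a-z]'"))

(nested "foo")       ;; => [:word [:word [:word [:letter "f"]] [:letter "o"]] [:letter "o"]]
(flat "foo")         ;; => [:word [:letter "f"] [:letter "o"] [:letter "o"]]
(flat-hidden "foo")  ;; => [:word "f" "o" "o"]
```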

Sigve 10:06:00

:partial allows a partially complete/successful parse to succeed, embedding the failure node in the AST at the point where the output…
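For comparison, the instaparse README describes embedding a failure node as the job of the :total option, while :partial true returns parses that consume only a prefix of the input (and :partial false is simply the default). A small sketch with a hypothetical one-rule grammar:

```clojure
(require '[instaparse.core :as insta])

(def as-only (insta/parser "S = 'a'+"))

;; Default: the whole input must match, so the trailing "b" is an error.
(insta/failure? (as-only "aab"))
;; => true

;; :total true returns a tree with the unparsed rest wrapped in a failure node.
(as-only "aab" :total true)
;; => [:S "a" "a" [:instaparse/failure "b"]]

;; :partial true allows parses that stop before the end of the input,
;; e.g. [:S "a"] and [:S "a" "a"] here.
(insta/parses as-only "aab" :partial true)
```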

borkdude 10:06:17

It seems the original EBNF works a bit differently from instaparse, e.g.:

<for_command> ::=  'for' <word> <newline_list> 'do' <compound_list> 'done'
            |  'for' <word> <newline_list> '{' <compound_list> '}'
            |  'for' <word> ';' <newline_list> 'do' <compound_list> 'done'
            |  'for' <word> ';' <newline_list> '{' <compound_list> '}'
            |  'for' <word> <newline_list> 'in' <word_list> <list_terminator>
                   <newline_list> 'do' <compound_list> 'done'
            |  'for' <word> <newline_list> 'in' <word_list> <list_terminator>
                   <newline_list> '{' <compound_list> '}'

borkdude 10:06:33

seems to assume that the tokens are automatically separated by whitespace
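Instaparse, by contrast, only matches whitespace that the grammar mentions. One way to adapt a rule like for_command is to put a hidden whitespace rule between tokens by hand. The sketch below uses a hypothetical, heavily simplified for-loop header, not the real bash.ebnf rules:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical simplified `for NAME in WORD...` header.
;; <ws> on the right-hand side hides the whitespace node entirely.
(def for-head
  (insta/parser
    "for_head = 'for' <ws> name <ws> 'in' (<ws> name)+
     ws       = #'\\s+'
     name     = #'[a-z]+'"))

(for-head "for x in a b")
;; => [:for_head "for" [:name "x"] "in" [:name "a"] [:name "b"]]
```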

borkdude 10:06:42

if I have to rewrite the grammar anyway, I'm more inclined to hand-roll my own parser

Sigve 10:06:01

Yes, I think that for most yacc/bison parsers, tokens are separated by whitespace by default. Instaparse supports adding this through the auto-whitespace feature, which has worked well for me: https://github.com/Engelberg/instaparse/blob/master/docs/ExperimentalFeatures.md#auto-whitespace
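A sketch of the same hypothetical for-loop header with the whitespace left implicit, relying on the :auto-whitespace :standard option described in the linked ExperimentalFeatures document:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical simplified header; no whitespace appears in the grammar.
;; :auto-whitespace :standard lets instaparse skip whitespace around tokens.
(def for-head-auto
  (insta/parser
    "for_head = 'for' name 'in' name+
     name     = #'[a-z]+'"
    :auto-whitespace :standard))

(for-head-auto "for x in a b")
;; => [:for_head "for" [:name "x"] "in" [:name "a"] [:name "b"]]
```

As we read the docs, the inserted whitespace is optional rather than required, so a shell grammar may still need explicit handling where keywords must be separated from words (e.g. "forx").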

Sigve 10:06:16

I don't know what you are using this parser for, but in my experience a proper grammar-based parser is more maintainable and flexible in the long run. Of course, for small use cases it can be a lot to get into and learn.

borkdude 10:06:34

this parser should parse bash syntax

borkdude 10:06:42

but bash is not such a big language

borkdude 10:06:15

I just have some problems getting this to work with instaparse so far

borkdude 10:06:47

it's not very important, just a fun project

Sigve 10:06:31

Then I guess it comes down to which approach you find most fun :) I think instaparse is quite amazing once you grok it, but again I understand it can be a hassle to get into. On the other hand, handwritten parsers can also be painful to get right.

aengelberg 19:06:09

There isn’t really a single EBNF syntax specification or RFC, so every “EBNF grammar” you’ll find in the wild will have a slightly different flavor of the syntax. Sometimes that is because a certain parser library chose a unique metasyntax, and sometimes it is because the grammar is meant to serve as documentation rather than to be compiled and executed.

aengelberg 19:06:38

Instaparse attempts to support most of the different flavors, which is why you can use either x? or [x] syntax, for example.
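A quick sketch of two flavors instaparse accepts for the same hypothetical grammar: ::= with [optional] and {repeated} on one side, = with ? and * on the other. Both should produce the same tree:

```clojure
(require '[instaparse.core :as insta])

;; Bracket style: ::= rule separator, [x] for optional, {x} for zero or more.
(def bracket-style
  (insta/parser "list ::= 'a' ['b'] {'c'}"))

;; Suffix style: = rule separator, x? for optional, x* for zero or more.
(def suffix-style
  (insta/parser "list = 'a' 'b'? 'c'*"))

(bracket-style "abcc")  ;; => [:list "a" "b" "c" "c"]
(suffix-style "abcc")   ;; => [:list "a" "b" "c" "c"]
```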

aengelberg 19:06:03

But sometimes a grammar or a different parser library will make a particularly unusual syntax choice, like using angle brackets in rule names

aengelberg 19:06:52

Or a grammar will make an implicit logical assumption that Instaparse has no way to act upon, like whitespace being parsed between tokens

aengelberg 19:06:52

The angle brackets are particularly unfortunate since Instaparse chose to use angle brackets for an instaparse-specific feature (hiding data from the output parse tree)
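Since bash.ebnf wraps every rule name in angle brackets purely as notation, one possible workaround is to strip them before handing the text to instaparse. The helper below is hypothetical (strip-rule-brackets is not part of instaparse) and assumes rule names consist only of letters, digits and underscores, so quoted tokens such as '<' or '<<' are left untouched:

```clojure
(require '[clojure.string :as str])

(defn strip-rule-brackets
  "Hypothetical helper: rewrites <rule_name> to rule_name so instaparse
  does not read the brackets as its hiding syntax. Quoted tokens like '<'
  are not matched because the brackets must enclose an identifier."
  [grammar-text]
  (str/replace grammar-text #"<([A-Za-z_][A-Za-z0-9_]*)>" "$1"))

(strip-rule-brackets "<for_command> ::= 'for' <word> 'do' <compound_list> 'done'")
;; => "for_command ::= 'for' word 'do' compound_list 'done'"
```

The result could then be passed to insta/parser, likely together with :auto-whitespace, although other differences in the grammar may still need manual fixes.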

aengelberg 20:06:36

ABNF, on the other hand, seems to be a much more regulated metasyntax, so copying and pasting ABNF grammars into instaparse (using :input-format :abnf) tends to be safer.
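A minimal sketch of the ABNF input mode with a hypothetical two-rule grammar. To be safe, each rule line is joined with "\n" so that every rule name starts in the first column, as RFC 5234 expects (indented lines are continuations there):

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical ABNF grammar: "hi", one or more spaces, then lowercase letters.
(def greeting
  (insta/parser
    (str "greeting = \"hi\" 1*%x20 name\n"
         "name = 1*%x61-7A\n")
    :input-format :abnf))

(insta/failure? (greeting "hi bob"))  ;; => false
(insta/failure? (greeting "hibob"))   ;; => true, at least one space is required
```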