#instaparse
2016-02-08
wongiseng20:02:02

Hi, a very basic question, probably not specific to instaparse. Given this basic example: "S = N | (N ('+' N)+); N = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';" if I want to enforce that not all N are 0s, should this be done in the grammar definition or by adding some logic when processing the parsed result? I suspect the latter, but in case anyone knows a way to enforce this restriction directly in the grammar, I'd like to know. TIA

aengelberg21:02:12

Hi @wongiseng, I saw your question on gitter as well. Instaparse's job is to turn strings into meaningful data; any validation you want to do on that data probably should happen after the parse.

aengelberg21:02:11

The only real way to have more sophisticated validation on an input is to use lookahead and negative lookahead.

aengelberg21:02:25

Well, those are the only ways to do sophisticated validation within instaparse.

aengelberg21:02:17

In this particular example you could use negative lookahead, e.g. S = !('0'*) (N | (N ('+' N)+));

socksy21:02:20

this works, but it's ambiguous:

(def minimum-one-not-zero
  (insta/parser
    "EXP = N | S;
    S = (ZN '+')* N ('+' ZN)*;
    N =  '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';
    ZN = '0' | N;"))

aengelberg21:02:35

^ That would work as well

aengelberg21:02:02

The advantage to writing your own validation after the parse is that when the input is wrong, you can write your own error message to say whatever you want instead of instaparse's failure message which might not be as readable.

socksy21:02:34

^definitely

Parse error at line 1, column 4:
0+0
   ^
Expected:
"+"

aengelberg21:02:57

oops, my negative lookahead approach definitely wouldn't work because I totally didn't see the pluses in the input

aengelberg21:02:07

Maybe

S = &(#".*[1-9]") (N | (N ('+' N)+));

aengelberg21:02:25

e.g. "make sure there's some nonzero number somewhere, then parse as usual"

aengelberg21:02:09

that's lookahead not negative lookahead
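A minimal sketch of the lookahead idea, assuming instaparse is on the classpath and required as `insta`. The name `not-all-zero` is hypothetical, and the regex is written in instaparse's single-quoted form to avoid escaping inside the Clojure string:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical parser name. The lookahead &(...) consumes no input: it only
;; checks that a nonzero digit appears somewhere ahead, then the original
;; rule for S parses the input as usual.
(def not-all-zero
  (insta/parser
    "S = &(#'.*[1-9]') (N | (N ('+' N)+));
     N = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';"))

(not-all-zero "0+1")                  ; parses, at least one term is nonzero
(insta/failure? (not-all-zero "0+0")) ; the lookahead rejects all-zero input
```

The failure for all-zero input still comes with instaparse's own message, so this only moves the check into the grammar; it does not make the error friendlier.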

socksy21:02:53

if errors aren't important, and the fact that you might get the "wrong" evaluation (e.g. "1+0+1" could be [:EXP [:S [:N "1"] "+" [:ZN "0"] "+" [:ZN [:N "1"]]]] or [:EXP [:S [:ZN [:N "1"]] "+" [:ZN "0"] "+" [:N "1"]]]) is also unimportant (e.g. you eval N and ZN the same), then you should be fine with the ambiguous grammar

socksy21:02:32

(instaparse gives you the former)

aengelberg21:02:04

@socksy how about

S = N ('+' N)* | (N '+')* '0' ('+' ZN)*;

aengelberg21:02:27

I'm just writing these off the top of my head, not evaluating them to be sure. I think that would be unambiguous though

aengelberg21:02:43

hmm, that's definitely wrong :)

aengelberg21:02:54

not sure where that came from

aengelberg21:02:59

Using lookahead would likely be the easiest path, since the grammar would be unambiguous and easy to understand

wongiseng21:02:54

Cool, thanks for the explanations. I'll play a bit with lookahead, but eventually I guess I'll validate after the parse
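Validating after the parse could look roughly like the sketch below. It assumes the grammar from the original question and instaparse's default hiccup-style output (e.g. `[:S [:N "0"] "+" [:N "1"]]`); the helper names `digits` and `validate` and the error text are hypothetical:

```clojure
;; Hypothetical helper: collect the digit strings from the [:N "d"] children
;; of a hiccup-style tree such as [:S [:N "0"] "+" [:N "1"]].
(defn digits [tree]
  (for [node (rest tree)
        :when (and (vector? node) (= :N (first node)))]
    (second node)))

;; Hypothetical validator: reject trees in which every term is "0" and
;; return a custom, readable message instead of a parser failure.
(defn validate [tree]
  (if (every? #{"0"} (digits tree))
    {:ok? false :error "At least one term must be non-zero."}
    {:ok? true :tree tree}))

(validate [:S [:N "0"] "+" [:N "0"]]) ; :ok? is false, with the custom message
(validate [:S [:N "0"] "+" [:N "1"]]) ; :ok? is true
```

This keeps the grammar as simple as the original and puts the wording of the error entirely under your control.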

wongiseng21:02:40

The negative lookaheads make the grammar hard to digest for me

wongiseng23:02:16

My actual problem was an OR expression that must have at least one positive term