#instaparse
2016-11-04
be9 15:11:11

Hi, I need to parse strings like “some text with spaces XXX 12345678 98765 43 222 11”. There are 3 parts: “some text with spaces”, “XXX 12345678”, and “98765 43 222 11”. The last part is required, but the “XXX 12345678” part is optional, and a naive greedy regex will treat it as part of the text. How can I prevent this with Instaparse?

seylerius 15:11:47

@be9 Can you describe the requirements your text needs to meet?

seylerius 16:11:04

Or give a few more specific examples?

be9 16:11:11

@seylerius OK, let’s simplify even more. Two examples: “John Doe AGE 50” and “Dohn Joe”. An input string contains a name and might also contain this age part. I want to parse those eventually to {:name "John Doe" :age 50} and {:name "Dohn Joe"}.

be9 16:11:55

The first one should not be {:name "John Doe AGE 50"} 🙂

seylerius 16:11:18

Okay. This is a problem I've run into before.

be9 16:11:32

Names can be long and contain digits too

be9 16:11:52

“John Doe AGE 50 AGE 50” would preferably be parsed as {:name "John Doe AGE 50" :age 50}

seylerius 16:11:07

Basically, what you need is a name token, an age token, and then a name+age token. You then parse for this: "name-age / name"

seylerius 16:11:30

The slash allows you to express a preference for one over the other.

seylerius 16:11:04

Basically, you're saying "if this string can match an age too, do that, otherwise it's just a name"
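
A minimal sketch of the "name-age / name" idea as an Instaparse grammar, for illustration only. The rule names (person, name-age, name, word, age) and the parse-person helper are assumptions made for this example; they are not code from this conversation or from the organum rebuild. The name is built from space-separated words rather than one greedy regex such as #'.+', because Instaparse does not backtrack inside a regex match, so a greedy name would swallow the " AGE 50" suffix before the name-age alternative could use it.

```clojure
(require '[clojure.string :as str]
         '[instaparse.core :as insta])

;; Hypothetical grammar: prefer the name+age reading, fall back to a bare name.
;; Angle brackets hide the " AGE " separator, the spaces, and the word tag
;; from the resulting parse tree.
(def person-parser
  (insta/parser
    "person   = name-age / name
     name-age = name <' AGE '> age
     name     = word (<' '> word)*
     <word>   = #'[^ ]+'
     age      = #'[0-9]+'"))

(defn parse-person
  "Parses s into a map with :name and, when present, :age.
  If the parse fails, the Instaparse failure object is returned as-is."
  [s]
  (insta/transform
    {:person   identity                       ; unwrap the root node
     :name-age merge                          ; combine the {:name ..} and {:age ..} maps
     :name     (fn [& words] {:name (str/join " " words)})
     :age      (fn [digits] {:age (Long/parseLong digits)})} ; JVM Clojure assumed
    (person-parser s)))

(parse-person "John Doe AGE 50")
(parse-person "Dohn Joe")
(parse-person "John Doe AGE 50 AGE 50")
```

Under these assumptions the three calls should give {:name "John Doe" :age 50}, {:name "Dohn Joe"}, and {:name "John Doe AGE 50" :age 50}: the parser has to consume the whole string, so only the final " AGE 50" can serve as the age, and the slash makes that reading win over the plain-name one. Note that joining the words with a single space collapses nothing here only because the grammar accepts exactly one space between words.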

seylerius 16:11:33

I do this a lot in my rebuild of organum, if you want to take a look at the repo.

be9 16:11:54

oh, the slash. I see, thanks!

seylerius 16:11:26

Yep. The slash is for preferential parsing.

be9 16:11:52

👍 @seylerius, I guess that’s it 🙂