#instaparse
2017-12-14
mbjarland 16:12:28

I'm playing around with instaparse and, for kicks and giggles, I wrote a parser to parse some log files I have lying around

mbjarland 16:12:50

is there a way to define a fixed-width "anything goes" string in instaparse?

mbjarland 16:12:33

i.e. if I just want to gobble up a few characters into a tree node and don't care about the content there, is that possible?

aengelberg 16:12:36

Fixed width? Maybe #'.{N}'?

mbjarland 16:12:09

right, yes regex does the job but is probably not very performant for just "take substring of 10 from where you are"

mbjarland 16:12:30

ok, so regex is the way to go for this in instaparse?

aengelberg 16:12:01

I think regex is the most performant way to grab a not-static set of characters
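
A minimal sketch of the regex approach suggested above, assuming a hypothetical log line whose first 10 characters form a fixed-width field. The rule names `line`, `field`, and `rest` and the sample input are illustrative only and do not come from the conversation:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical layout: a 10-character field followed by the rest of the line.
;; The width 10 stands in for N in #'.{N}'.
(def fixed-width-parser
  (insta/parser
    "line  = field rest
     field = #'.{10}'
     rest  = #'.+'"))

(fixed-width-parser "2017-12-14 some message")
;; => [:line [:field "2017-12-14"] [:rest " some message"]]
```

The regex does not inspect the content of the field; it only consumes 10 characters, whatever they are (except line terminators, which `.` does not match by default).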

mbjarland 16:12:08

: ) well I should probably mention that I think instaparse is excellent and by far the best parser lib I've run across....so my intent was not to come here and critique it

aengelberg 16:12:00

Thanks! And no worries, I was just answering your question from the perspective of what instaparse actually supports

mbjarland 16:12:23

that being said...if I parse 2G of log files (without instaparse) and compare the simplest regex match with (subs line 10 20), regex performance doesn't exactly shine

aengelberg 16:12:32

But I see your point that if it theoretically supported a dedicated "substring" combinator, that would be faster
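
For reference, the two approaches being compared might look roughly like this in plain Clojure. The sample line and the exact regex are assumptions; the conversation only mentions `(subs line 10 20)` and "the simplest regex match":

```clojure
;; Hypothetical sample line; characters 10..19 are the field of interest.
(def line "0123456789abcdefghijrest of line")

;; Direct substring: no pattern matching involved.
(def via-subs (subs line 10 20))

;; Regex equivalent: skip 10 characters, then capture the next 10.
(def via-regex (second (re-find #"^.{10}(.{10})" line)))

;; Both yield "abcdefghij".

;; A rough timing comparison; results depend on the JVM and on warm-up.
(time (dotimes [_ 100000] (subs line 10 20)))
(time (dotimes [_ 100000] (re-find #"^.{10}(.{10})" line)))
```

`time` gives only a coarse picture; a benchmarking library would be needed for numbers worth trusting.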

mbjarland 16:12:59

anyway, figured I would ask, but regex does indeed do the job and perhaps what I'm doing with this parser is a bit of an edge case

aengelberg 16:12:25

Maybe we should support "custom combinators" so people like you with special use cases can write their own more performant specialized versions

mbjarland 16:12:42

that would be awesome

mbjarland 16:12:47

you would have to add some kind of extension point to the instaparse bnf syntax I guess

aengelberg 16:12:05

Maybe, or we don't allow extensions to the EBNF syntax and just let people make custom combinators for the combinator syntax

mbjarland 16:12:49

ah, ok, hadn't grokked the combinators syntax until now
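
For readers who have not seen the combinator syntax either, here is a small sketch of a grammar built as a Clojure map with the functions in `instaparse.combinators`, instead of an EBNF string. It uses only the existing built-in combinators; the custom combinators discussed above are an idea, not a current feature. The rule names and the sample input are illustrative:

```clojure
(require '[instaparse.core :as insta]
         '[instaparse.combinators :as c])

;; The grammar is a map from rule name to combinator.
(def grammar
  {:line  (c/cat (c/nt :field) (c/nt :rest))
   :field (c/regexp #".{10}")   ; fixed-width field, width assumed to be 10
   :rest  (c/regexp #".+")})

;; A map grammar needs an explicit start rule.
(def combinator-parser (insta/parser grammar :start :line))

(combinator-parser "2017-12-14 some message")
;; => [:line [:field "2017-12-14"] [:rest " some message"]]
```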

mbjarland 16:12:23

right now I'm considering writing my own mini language for this log parsing, I could use instaparse to parse that language and then do custom, optimized parsing based on the format specification tree coming out from instaparse...so still useful

mbjarland 17:12:35

hmm, how come I need to double escape the not-inclusive rule in the following grammar:

(def my-p 
  (instaparse.core/parser 
    "spec = (field-spec <' '?>)+
     field-spec = <'['>name ' '* <':'> ' '* (width | not-inclusive | not-exclusive | rest)<']'>
     name = #'[^:]+'
     width = <'{'> #'\\d+' <'}'>
     not-inclusive = <'\\\\'> #'.'
     not-exclusive = <'/'> #'.'
     rest = '*'    
    "))

aengelberg 17:12:25

you mean the '\\\\'?

mbjarland 17:12:41

shouldn't two have been enough?

aengelberg 17:12:09

because 1) you need to tell Clojure that you aren't escaping a character within a string, and 2) you need to tell Instaparse that you aren't escaping a character within a string combinator

mbjarland 17:12:34

ok, missed point 2 there
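
The two escaping layers can be seen in a stripped-down sketch. The rule name `S` and the sample input are made up for illustration:

```clojure
(require '[instaparse.core :as insta])

;; Layer 1 (Clojure reader): four backslashes in source become two characters.
(count "\\\\")
;; => 2

;; Layer 2 (instaparse string literal): the two remaining backslashes
;; are read as one escaped, literal backslash.
(def backslash-parser
  (insta/parser "S = <'\\\\'> #'.'"))

;; "\\x" is the two-character input: a backslash followed by x.
(backslash-parser "\\x")
;; => [:S "x"]
```

With only two backslashes in the source, instaparse would receive a single backslash followed by the closing quote and would treat that as an escaped quote, so the string literal would not end where intended.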