#instaparse
2016-12-31
gfredericks03:12:27

instaparse requires keywords for the names of the whatchamacallits?

gfredericks03:12:55

I think I might be using instaparse in a weird enough way for that to be a very mild problem

gfredericks03:12:13

because I have to gensym the names and so it's a memory leak

seylerius04:12:01

@gfredericks It outputs either hiccup or enlive notation, so yes it probably would want keywords in reverse.

aengelberg09:12:28

@gfredericks:

;; (keyword 0) returns nil for a number, so go through str first
(def all-keywords-ever (map (comp keyword str) (range)))

;; each time you dynamically create a parser
(let [my-syms ...
      kws (zipmap my-syms all-keywords-ever)]
  ...)

aengelberg09:12:40

That might be a way to conserve on keywords

aengelberg09:12:21

Or do a string replace in the grammar to substitute non terminals with reusable symbols, then postwalk the resulting tree to convert back

gfredericks14:12:50

I'm using the combinators, so it shouldn't be too hard to do something like that if I decide this matters
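
A minimal sketch of the keyword-reuse idea discussed above, using only clojure.core, clojure.set and clojure.walk. The helper names (`reusable-keywords`, `name-mapping`, `restore-names`) and the sample rule names are hypothetical, and the tree is a hand-written hiccup-style vector rather than real parser output:

```clojure
(require '[clojure.set :as set]
         '[clojure.walk :as walk])

;; A lazy pool of keywords :0, :1, :2, ... shared by every parser,
;; so repeated parser creation does not intern fresh keywords.
(def reusable-keywords (map (comp keyword str) (range)))

(defn name-mapping
  "Maps each gensym'd rule name to a keyword from the shared pool."
  [rule-names]
  (zipmap rule-names reusable-keywords))

(defn restore-names
  "Swaps pooled keywords in a hiccup-style tree back to the original names."
  [mapping tree]
  (walk/postwalk-replace (set/map-invert mapping) tree))

(let [mapping (name-mapping ['expr_1 'term_2])]
  (restore-names mapping [(keyword "0") "a" [(keyword "1") "b"]]))
;; => [expr_1 "a" [term_2 "b"]]
```

The pool only realizes as many keywords as the largest grammar needs, which is the point of sharing it.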

zmaril19:12:39

@gfredericks @aengelberg if we can actually get generating from grammars going I'd still be really stoked

zmaril19:12:36

I've been working on https://github.com/zmaril/instaparse-c the past few weeks and am getting within spitting distance of doing some fun stuff.

zmaril19:12:09

It can basically parse C at this point and I'm working on finishing the macro preprocessor now.

zmaril19:12:17

The goal is to get the output into datascript and queryable. But a side product of this is that if you have something that can generate strings from grammars then we already have something that can produce c programs (sans macros).

gfredericks19:12:30

@zmaril do you or anybody know if all instaparse grammars are implemented using the combinators?

gfredericks19:12:39

s/grammars/parser/

zmaril19:12:55

Yes they should be

zmaril19:12:37

My understanding is that the ebnf notation that everybody uses is actually parsed by a parser expressed in the combinators that transforms the output into combinators

gfredericks20:12:11

I just glanced at the combinator list -- I think only the lookaheads are problematic, but that's probably a big deal for sophisticated parsers

gfredericks20:12:29

so...oh well.

zmaril20:12:56

how does one express negation in generators now?

gfredericks20:12:09

you could implement them with gen/such-that but the generator would fail if the lookahead condition is unlikely to pass by chance

gfredericks20:12:37

I have no idea how that would play out IRL

zmaril20:12:43

That should be fine then. For the parsers I write lookahead is typically used to implement reserved keywords.
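
A rough illustration of the reserved-keyword case with test.check, assuming a made-up reserved-word set and an identifier rule of "one or more letters". Neither comes from anyone's actual grammar; the point is only how `gen/such-that` would stand in for a negative lookahead:

```clojure
(require '[clojure.test.check.generators :as gen])

;; Hypothetical reserved words that an identifier must not equal.
(def reserved #{"if" "else" "while"})

;; Identifier = one or more alphabetic characters, joined into a string.
(def raw-identifier
  (gen/fmap #(apply str %) (gen/not-empty (gen/vector gen/char-alpha))))

;; such-that plays the role of the negative lookahead: generate a candidate,
;; then discard it if it is reserved. The third argument caps the retries,
;; after which such-that throws instead of looping forever.
(def identifier-gen
  (gen/such-that #(not (contains? reserved %)) raw-identifier 100))

(gen/sample identifier-gen 5)
```

Random letter strings rarely collide with a reserved word, so the filter almost never rejects here. A lookahead that most candidates fail would exhaust the retries, which is the failure mode mentioned above.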

zmaril20:12:03

I've never used positive lookahead actually now that I think about it

gfredericks20:12:18

when I made the regex→string generator I just decided not to support look[ahead|behind] for the same reason

zmaril20:12:38

It's one of those things that is academic to me at this point

zmaril20:12:05

I'm pretty sure that 99% of the time gen/such-that would be fine

gfredericks20:12:29

it might not be too hard to throw together a PoC

gfredericks20:12:42

in fact that would potentially be useful for what I'm working on right now

zmaril20:12:39

yeah, I think that would fit really well and mirror what spec is doing

zmaril20:12:52

I've been using spec/conform the same way I use instaparse and it works really well

zmaril20:12:12

So I imagine we could use generators the same way spec does and it would work well (fingers crossed)

gfredericks20:12:32

😂 I just realized that it would require using string-from-regex from test.chuck to support regexes in the grammars, and string-from-regex uses instaparse to parse the regex.

zmaril20:12:10

that was the thing that was holding me up actually

zmaril20:12:14

was that I didn't want to mess with regexes

aengelberg20:12:30

just catching up

aengelberg20:12:30

After I wrote "instagenerate" I realized going the generator route (as opposed to core.logic) would probably be easier, despite the lookahead such-that problem

aengelberg20:12:41

But what do you want to do about hide-tags?

zmaril20:12:10

I think I have an idea, h/o

zmaril20:12:44

well, hmmm what is the problem you see with hide-tags?

aengelberg20:12:14

It depends on what you expect the "input" to the generator to be

aengelberg20:12:24

a parse tree still?

gfredericks20:12:34

it'd be the combinator

gfredericks20:12:46

it would generate totally random parsable things

gfredericks20:12:53

not based on some partial input

aengelberg20:12:24

ok, in that case I don't really have a problem with hide tags despite just waking up

zmaril20:12:40

I think if we got something going that just took a grammar and gave back random strings, that would be a good first step

aengelberg20:12:50

part of why I did core.logic in instagenerate is @zmaril's initial request to go from partial input -> parseable strings, so I felt the need to put in the sophistication of logic programming as a general solver for all cases

zmaril20:12:15

oh, if we want to do partial input, we can provide skeletons with places to start generating from

zmaril20:12:41

then we just walk the skeleton and generate random strings at the indicated places

zmaril20:12:04

still not fully general but better

zmaril20:12:19

and then we could restrict the grammar inside the combinator somehow

aengelberg20:12:48

(def p (insta/parser "
S = A B A | B A B
<A> = ('a' <'c'> 'b')+
<B> = ('b' 'a')+
"))

(generate p [:S "a" "b" "b" "a" "a" "b"])
=> ("acbbaacb")

aengelberg20:12:35

seems hard to performantly solve generally

zmaril20:12:34

who said anything about performance

aengelberg20:12:39

🙂 fair enough

aengelberg20:12:00

but a generator approach using such-that may never complete on a large enough grammar

zmaril20:12:31

cross that bridge when we get there

zmaril20:12:48

computers are like really fast

zmaril20:12:20

this is more of a what's possible idea than a production thing

aengelberg20:12:00

let me know if I can help out in whichever path you decide to try out

gfredericks20:12:51

yeah generators aren't generally for production stuff

gfredericks20:12:20

I want a combinator that doesn't match anything

gfredericks20:12:41

I thought maybe (combo/alt) but that returns ε

zmaril20:12:11

(gen/such-that (constantly false)) or something?

gfredericks20:12:18

a combinator, not a generator

zmaril20:12:22

oh right sorry

gfredericks20:12:38

I guess I can do negative lookahead with epsilon?

zmaril20:12:52

or a really unlikely string?

zmaril20:12:27

like (string "THISWILLNEVERBEMATCHEDHOPEFULLY")

zmaril20:12:30

we're not fancy here

gfredericks20:12:38

(string (str (java.util.UUID/randomUUID)))

zmaril20:12:56

that works!

gfredericks20:12:24

I have an alternate thing in my codebase that could be called a parser, but instaparse also has something by that name so I called it a parsifier instead

gfredericks20:12:32

and it's hard to remember that word because it could also have been parsinator

zmaril20:12:17

(defn enlive-output->datascript-datums [m]
  (if-not (map? m)
    {:type :value :value m}
    (as-> m $
      (assoc $ :meta (meta m))
      (assoc $ :db/id (d/tempid :mcc))
      (transform [:content ALL] enlive-output->datascript-datums $))))
This will take enlive output and make it so you can query it from datascript

gfredericks20:12:24

does instaparse use its own regex engine?

gfredericks20:12:40

I just got a misparse where the thing matches the regex but instaparse disagrees

zmaril20:12:42

depends on java if I recall

gfredericks20:12:52

and reordering a disjunction in the regex fixes it

gfredericks20:12:10

this is the instaparse-cljs thing in particular, but still on the jvm

zmaril20:12:16

check if instaparse passes any flags in

zmaril20:12:18

"0/2" parses

zmaril20:12:44

can you add in some parens to the second part to clarify your intent

gfredericks20:12:59

"0/2" is not supposed to parse o_O

gfredericks21:12:34

I see that's my fault though

aengelberg21:12:41

I second !epsilon as the "don't parse"

aengelberg22:12:29

also instaparse fails on infinite loop grammars, so this might work

never-succeed = never-succeed
(then use never-succeed wherever)
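
For reference, two of the "never match" options from this thread written with the combinators. This is a sketch, not a recommendation; the parser names are made up, and the self-referential `never-succeed` rule is left out because it was only suggested as something that might work:

```clojure
(require '[instaparse.core :as insta]
         '[instaparse.combinators :as c])

;; Negative lookahead on epsilon: epsilon always matches,
;; so its negation should never match.
(def never-via-neg
  (insta/parser {:S (c/neg c/Epsilon)} :start :S))

;; The "really unlikely string" approach: a random UUID as a literal.
(def never-via-uuid
  (insta/parser {:S (c/string (str (java.util.UUID/randomUUID)))} :start :S))

;; Both should report a failure on any ordinary input, including "".
(insta/failure? (never-via-neg ""))
(insta/failure? (never-via-neg "abc"))
(insta/failure? (never-via-uuid "abc"))
```

The UUID version can in principle match if the input happens to be that exact UUID; the lookahead version has no such gap.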

gfredericks22:12:58

@aengelberg do you think the current behavior of (combo/alt) is bad/weird?

gfredericks22:12:44

my hunch is that According To Math it should either throw or not match anything

aengelberg22:12:01

yeah I agree with your instinct. Not really sure what the thinking was in that design.

gfredericks22:12:17

my argument is that because (combo/alt p) probably does not match ε, neither should (combo/alt)

aengelberg22:12:23

Maybe since "don't parse anything" isn't really a common use case

gfredericks22:12:33

you shouldn't parse more things by removing an arg from combo/alt

gfredericks22:12:03

yeah I always end up finding the uncommon use cases

gfredericks22:12:25

for a while every time I tried to use CLJS I ended up creating a jira ticket

aengelberg22:12:55

#gobigorgohome

aengelberg22:12:09

I think I know why your parser is failing

aengelberg22:12:48

The regex for the denominator, when given "25" as input, may arbitrarily decide to match either "2" or "25"

aengelberg22:12:04

In instaparse, whatever the regex decides is the one and only possible parse

aengelberg22:12:53

user=> (re-matches #"[2-9]|[1-9][0-9]+" "25")
"25"
user=> (re-seq #"[2-9]|[1-9][0-9]+" "25")
("2" "5")
user=> (re-find #"[2-9]|[1-9][0-9]+" "25")
"2"

gfredericks22:12:33

oh it's about re-matches vs re-find?

gfredericks22:12:47

oh I think I see

aengelberg22:12:04

you could instead do #"[2-9]" | #"[1-9][0-9]+"

aengelberg22:12:22

If you move logic from regexes into instaparse, you get flexibility at the cost of speed

gfredericks22:12:10

so the fact that I fixed it by rearranging the regex is sort of an implementation detail I guess?

aengelberg22:12:02

Yes, so I would call rearranging the regex an improper solution

aengelberg22:12:25

but #"[2-9]" | #"[1-9][0-9]+" is proper
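
To make the difference concrete, here is a small sketch with a hypothetical fraction grammar. The rule names `frac` and `denom` and the numerator regex are assumptions, since the real grammar was not posted. The first parser keeps the alternation inside one regex, the second moves it into instaparse:

```clojure
(require '[instaparse.core :as insta])

;; Alternation inside the regex: the regex engine commits to "2" for the
;; input "25", and instaparse never gets to try "25".
(def one-regex
  (insta/parser
    "frac  = #'[0-9]+' '/' denom
     denom = #'[2-9]|[1-9][0-9]+'"))

;; Alternation in the grammar: instaparse can explore both branches.
(def split-regex
  (insta/parser
    "frac  = #'[0-9]+' '/' denom
     denom = #'[2-9]' | #'[1-9][0-9]+'"))

(insta/failure? (one-regex "1/25"))
;; => true

(split-regex "1/25")
;; => [:frac "1" "/" [:denom "25"]]
```

Reordering the alternatives inside the single regex would also make "1/25" parse, but only because of how the regex engine happens to pick a match.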

gfredericks22:12:10

okay fine I'll switch it 😛