#instaparse
2016-07-04
turbopape 01:07:46

What's the status of cljs support?

turbopape 01:07:49

Is it still living as a fork?

aengelberg 01:07:56

The only cljs support still lives in lbradstreet/instaparse-cljs.

aengelberg 01:07:57

But I'm currently rewriting instaparse-cljs into a form that we'd be willing to accept back upstream, now that cljsee exists.

aengelberg 07:07:31

@seylerius: Here's a grammar that parses exponents like you were asking:

boot.user=> (def p (insta/parser "
<S> = ows (exponent ows)+
<exponent> = token <'^'> super
super = token | <'{'> token <'}'>
<token> = #'[^\\s\\^{}]+'
<ows> = <#'\\s*'>
"))
#'boot.user/p
boot.user=> (p "foo^2 x^{x+1}")
("foo" [:super "2"] "x" [:super "x+1"])

This parser is pretty naive about the range of possible inputs, since I'm not totally sure myself what that range is in your use case.
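If the parsed exponents are meant to be rendered, `insta/transform` can rewrite each `:super` node. A minimal sketch, assuming HTML `<sup>` tags are the target output (that format and the `exponents->html` name are hypothetical, not from the discussion) and that instaparse is on the classpath:

```clojure
(ns example.exponents
  (:require [instaparse.core :as insta]))

;; The grammar from the message above; backslashes are doubled because
;; the grammar lives inside a Clojure string.
(def p
  (insta/parser
    "<S> = ows (exponent ows)+
     <exponent> = token <'^'> super
     super = token | <'{'> token <'}'>
     <token> = #'[^\\s\\^{}]+'
     <ows> = <#'\\s*'>"))

;; Hypothetical renderer. The root rule S is hidden, so the parse result is a
;; plain sequence; insta/transform maps over it and rewrites each :super node.
(defn exponents->html [s]
  (->> (p s)
       (insta/transform {:super (fn [x] (str "<sup>" x "</sup>"))})
       (apply str)))

(exponents->html "foo^2 x^{x+1}")
;; => "foo<sup>2</sup>x<sup>x+1</sup>"
```

Because `<ows>` is hidden, the whitespace between terms is dropped; it would have to stay visible if the output needs it.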

seylerius 16:07:13

Another question: * / + = & ~ can appear on their own without being markup tokens. How would you represent that? Current parser: http://sprunge.us/GNDe

seylerius 16:07:58

@aengelberg: What I have will do for the moment, but it's part of the spec I'd like to meet eventually.

Andy 17:07:26

Hi, we recently switched from plain regexes to instaparse for parsing user input. The code looks way better. However, there are two corner cases where I'm not sure what the idiomatic approach would be: 1) parsing a certain domain of inputs should result in a no-op. Our current solution is:

"sentence = define / explain / help / catchall
<<skipped definitions>>
catchall = #'(.|[\n\r])*'"
with the intention of simply ignoring the last alternative during transformation: :catchall (fn [_] nil). Now I wonder whether there is another way to catch this case and ignore it without using exceptions. 2) `'(.|[\n\r])*'` contains |, which on the JVM leads to recursion and can result in a stack overflow; in fact, it happened to us once. Is there a better way to write catchall so that it accepts anything, including \n and \r?

aengelberg 17:07:05

@happy.lisper for catchall you could do #'[\s\S]*'
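A minimal sketch of that catchall inside a full grammar, assuming toy `define`/`help` rules in place of the skipped definitions (the rule bodies and the `interpret` function are hypothetical). `[\s\S]` matches any character, newlines included, without the `(.|[\n\r])` alternation that Andy found could recurse deeply on the JVM; the `:catchall` transform then turns unrecognized input into nil:

```clojure
(ns example.catchall
  (:require [instaparse.core :as insta]))

;; Hypothetical toy rules standing in for the skipped definitions.
;; Ordered choice (/) prefers define and help; catchall takes everything else.
(def sentence
  (insta/parser
    "sentence = define / help / catchall
     define   = <'define '> word
     help     = <'help'>
     word     = #'\\w+'
     catchall = #'[\\s\\S]*'"))

(defn interpret [input]
  (insta/transform
    {:sentence identity                      ; unwrap the single child
     :define   (fn [w] {:cmd :define :word w})
     :help     (fn [] {:cmd :help})
     :word     identity
     :catchall (constantly nil)}             ; unrecognized input becomes nil
    (sentence input)))

(interpret "define foo")      ;; => {:cmd :define, :word "foo"}
(interpret "what is\nthis?")  ;; => nil
```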

aengelberg 17:07:16

So your use case is: "Parse the entire string as a define, an explain, or a help, but if that doesn't work, return nil"?

aengelberg 17:07:43

Because if so, you could just run the parse and a transform, then check (insta/failure? result).

Andy 17:07:52

yes, where nil is just a signal to ignore the input.

aengelberg 17:07:54

(def p (insta/parser ...))
(let [result (p input-string)
      ;; insta/transform takes the transform map first, then the parse result
      transformed (insta/transform {...} result)]
  (when-not (insta/failure? transformed)
    transformed))

aengelberg 17:07:12

Note that insta/transform is specifically designed to pass failures through unchanged.
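A runnable version of that pattern with no catchall rule at all: unrecognized input simply fails to parse, the Failure passes through `insta/transform` untouched, and a single `insta/failure?` check turns it into nil. The grammar and the `parse-command` name are hypothetical stand-ins for Andy's rules:

```clojure
(ns example.parse-or-nil
  (:require [instaparse.core :as insta]))

;; Hypothetical grammar standing in for the define/explain/help rules.
;; There is deliberately no catchall: anything else fails to parse.
(def command
  (insta/parser
    "sentence = define / help
     define   = <'define '> word
     help     = <'help'>
     word     = #'\\w+'"))

(defn parse-command
  "Returns the transformed parse of input, or nil if it does not parse."
  [input]
  (let [result      (command input)
        transformed (insta/transform
                      {:sentence identity
                       :define   (fn [w] {:cmd :define :word w})
                       :help     (fn [] {:cmd :help})
                       :word     identity}
                      result)]
    ;; insta/transform returns a Failure unchanged, so one check suffices.
    (when-not (insta/failure? transformed)
      transformed)))

(parse-command "define foo")     ;; => {:cmd :define, :word "foo"}
(parse-command "what is this?")  ;; => nil
```

Compared with a catchall rule, this keeps the grammar limited to what it actually accepts and moves the "ignore everything else" decision into ordinary Clojure code.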

Andy 17:07:13

Let me consider that 🙂.

aengelberg 17:07:50

@seylerius: Given an input ~a ~b, how do you know the two ~'s are meant as individual characters, as opposed to delimiting a code string "a " followed by "b"?

seylerius 17:07:06

@aengelberg: If I'm reading this correctly, the characters touching the inside of the tokens need to be alphanumeric, or at least non-whitespace.

aengelberg 17:07:43

so *a b c* shouldn't be allowed?

aengelberg 17:07:24

the current grammar that I suggested would allow that. Just trying to get a sense of the range of inputs so I can help design a parser accordingly.

seylerius 17:07:24

*foo* *bar* ➡️ [:b "foo" "bar"]
foo* bar* ➡️ "foo* bar*"

aengelberg 17:07:50

for the first example do you mean [:b "foo"] [:b "bar"]?

aengelberg 17:07:16

is there a guarantee that *a**b* won't happen?

seylerius 17:07:46

@aengelberg: Yes. And guarantee? No. Ambiguity in the spec we can lock to an interpretation? Yes.

seylerius 17:07:17

We basically get to decide if that's a pair of bold characters or a flat string we'll leave be.

seylerius 17:07:28

It would likely only happen as a typo.

seylerius 17:07:41

(Or a stupid user)

seylerius 17:07:07

@aengelberg: I'm basically upgrading organum. Sample org file: http://sprunge.us/KBbL

aengelberg 17:07:01

hmm, thinking through how to enforce alphanumeric chars on the insides of tokens.

aengelberg 17:07:22

doing a "lookbehind" on the last * is nontrivial.
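One possible way to sidestep the lookbehind, sketched under assumptions (the rule names, the `bold-markup` function, and the restriction to `*` markup are all hypothetical): match each bold span with a single regex terminal, so the regex itself can require a non-whitespace character right after the opening `*` and right before the closing `*`. Ordered choice makes the bold reading win whenever it is possible, and a lone `*` falls back to plain text:

```clojure
(ns example.bold
  (:require [instaparse.core :as insta]))

;; Hypothetical inline grammar covering only *bold* spans.
;; b:     '*', a non-space non-'*' char, optionally more text that ends in a
;;        non-space non-'*' char, then '*'. So "*a *" is not bold.
;; plain: a run of non-'*' text, or a lone '*' that did not start a bold span.
(def inline
  (insta/parser
    "<text>  = (b / plain)*
     b       = #'\\*[^\\s*](?:[^*]*[^\\s*])?\\*'
     <plain> = #'[^*]+' | '*'"))

(defn bold-markup [s]
  (insta/transform
    ;; strip the surrounding asterisks from each bold span
    {:b (fn [span] [:b (subs span 1 (dec (count span)))])}
    (inline s)))

(bold-markup "*foo* *bar*")  ;; => ([:b "foo"] " " [:b "bar"])
(bold-markup "*a *")         ;; => ("*" "a " "*")
```

Concatenating the plain pieces of a non-bold input, e.g. `(apply str (bold-markup "foo* bar*"))`, gives back the flat string `"foo* bar*"`.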

seylerius 18:07:16

What if I stripped leading and trailing whitespace before parsing, and modified the base string rule to start and end with an alphanumeric character? Would that be easier?

seylerius 18:07:37

But, no, that wouldn't quite work.

seylerius 18:07:29

@aengelberg: Will the parser ignore escaped tokens, like \*?

seylerius 18:07:48

Ach. Clojure doesn't like \* in a string.

seylerius 18:07:43

@aengelberg: Is there any way to mark tokens to not be parsed?

Andy 18:07:35

would angle brackets <> to hide parsed elements work?

aengelberg 18:07:29

@seylerius you'd have to do \\* if inside a Clojure string

aengelberg 18:07:54

the goal is to avoid parsing *a * as [:b "a "]

seylerius 18:07:34

@aengelberg: Anything special I have to do to mark that? I just tried parsing \\*foo\\* and got ("\\" [:b "foo\\"])

aengelberg 18:07:22

instaparse doesn't handle backslashes in any special way beyond what your grammar defines.

seylerius 18:07:42

Okay. How do you define a simple backslash replacement in this type of grammar, then?

aengelberg 18:07:59

Maybe replace <string> with:

<string> = '\\\\*' | #'[^*/_+=~^_\\\\]+'
user> (inline-markup "a\\* b")
("a" "\\*" " b")

aengelberg 18:07:17

Pretty messy, I know. (four backslashes 🙄)

aengelberg 18:07:03

I don't know if this solves your problem, though; you don't want to escape the *'s in every heading like ** My Subsection, do you?

aengelberg 18:07:13

sorry if I'm a bit unhelpful; phasing in and out of AFK

seylerius 18:07:38

I'm thinking I'm just going to tell users that if they want a plain * they have to escape it.
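Under that convention, a minimal sketch of a grammar where `\*` is an escaped literal asterisk and an unmatched, unescaped `*` is simply a parse error (the rule names and the `render` function are hypothetical, and only `*` markup is handled). Hiding the backslash with angle brackets means the transform only has to unwrap `:esc`:

```clojure
(ns example.escapes
  (:require [instaparse.core :as insta]))

;; Hypothetical inline grammar: \* is an escaped asterisk, *...* is bold.
;; Each backslash is escaped twice: once for the Clojure string and once for
;; instaparse's own string/regex syntax, hence the four backslashes.
(def inline-markup
  (insta/parser
    "<text>   = (esc / b / string)*
     esc      = <'\\\\'> '*'
     b        = <'*'> string <'*'>
     <string> = #'[^*\\\\]+'"))

(defn render [s]
  ;; unwrap escapes so that \* comes out as a plain "*"
  (insta/transform {:esc identity} (inline-markup s)))

(render "a\\* b")     ;; => ("a" "*" " b")
(render "\\*foo\\*")  ;; => ("*" "foo" "*")
(render "*foo*")      ;; => ([:b "foo"])
```

In this sketch an input such as `"a * b"` returns an instaparse Failure, which matches the "escape it or it's an error" rule but would need a friendlier error message in practice.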

seylerius 18:07:23

Headlines are already handled by the time this stage of parsing is invoked, so those won't be an issue.

seylerius 18:07:21

And your special case of *a**b* is apparently already converted to ([:b "a"] [:b "b"])

seylerius 20:07:06

@aengelberg: Separate (earlier-stage) parser: is it possible (other than by having separate rules for #'^* ', #'^** ', #'^*** ', etc.) to easily produce h1, h2, h3, etc.?

seylerius 20:07:25

Actually, yeah. Just don't hide the token, and I can put that through a counter after the fact.
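Keeping the stars visible and counting them in a transform gives the heading level without one rule per depth. A minimal sketch, assuming a headline is a run of stars, one space, then the title (the `headline` grammar and the `->heading` function are hypothetical):

```clojure
(ns example.headings
  (:require [instaparse.core :as insta]))

;; Hypothetical headline grammar: the stars stay visible as a :stars node.
(def headline
  (insta/parser
    "headline = stars <' '> title
     stars    = #'\\*+'
     title    = #'.*'"))

(defn ->heading [s]
  (insta/transform
    {:stars    count                     ; "**" -> 2
     :title    identity
     :headline (fn [level title] [(keyword (str "h" level)) title])}
    (headline s)))

(->heading "** My Subsection")  ;; => [:h2 "My Subsection"]
```

Children are transformed before their parent, so by the time `:headline` runs, `:stars` has already become a number.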