This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-07-04
The only cljs support still lives in lbradstreet/instaparse-cljs
But I'm currently in the process of rewriting instaparse-cljs into a form that we'd be willing to accept back into upstream, now that cljsee exists
@seylerius: Here's a grammar that parses exponents like you were asking:
boot.user=> (def p (insta/parser "
<S> = ows (exponent ows)+
<exponent> = token <'^'> super
super = token | <'{'> token <'}'>
<token> = #'[^\\s\\^{}]+'
<ows> = <#'\\s*'>
"))
#'boot.user/p
boot.user=> (p "foo^2 x^{x+1}")
("foo" [:super "2"] "x" [:super "x+1"])
This parser is pretty naive about the range of possible inputs, since I'm not totally sure myself what that range of inputs is in your use case.
Another question: `*`, `/`, `+`, `=` & `~` can appear in singles without being tokens. How would you represent that? Current parser: http://sprunge.us/GNDe
@aengelberg: What I have will do for the moment, but it's a part of the spec I'd like to meet eventually.
Hi, we recently switched from parsing user input with plain regexes to instaparse. The code looks way better. However, there are two corner cases where I am not sure what the idiomatic way would be: 1) parsing a certain domain of inputs should result in a no-op. Our current solution is:
"sentence = define / explain / help / catchall
<<skipped definitions>>
catchall = #'(.|[\n\r])*'"
with the intention of just ignoring the last part during transformation: `:catchall (fn [_] nil)`
Now I wonder if there is another way to catch this case and ignore it without using exceptions.
2) `'(.|[\n\r])*'` uses `|`, which on the JVM leads to recursion and might result in a stack overflow. In fact, it happened to us once. Is there a better way to write `catchall` that would account for anything, including `\n` and `\r`?
@happy.lisper for catchall you could do #'[\s\S]*'
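The difference between the catch-all patterns can be checked at a plain Clojure REPL, independent of instaparse. This is only a sketch: the input string is made up, and the `(?s)` variant is an extra option that was not mentioned in the conversation.

```clojure
;; Hypothetical multi-line user input, for illustration only.
(def input "first line\nsecond line\r\nthird line")

;; On the JVM, `.` does not match line terminators by default,
;; so a bare `.*` cannot cover the whole input.
(def dot-only (re-matches #".*" input))

;; `[\s\S]` means "whitespace or non-whitespace", i.e. any character,
;; including \n and \r, and it needs no `|` alternation.
(def any-char (re-matches #"[\s\S]*" input))

;; The DOTALL flag `(?s)` is another way to let `.` match line terminators.
(def dotall (re-matches #"(?s).*" input))
```

Here `dot-only` is nil, while `any-char` and `dotall` both return the full input string.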
So your use case is: "Parse the entire string as a `define`, an `explain`, or a `help`, but if that doesn't work then return nil"?
Because you could just run the parse and a transform, then check (insta/failure? result)
(def p (insta/parser ...))
(let [result (p input-string)
      ;; insta/transform takes the transform map first, then the parse result
      transformed (insta/transform {...} result)]
  (when-not (insta/failure? transformed)
    transformed))
Note that insta/transform is specifically designed to pass through failures
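A filled-in version of that pattern might look like the following. It is a minimal sketch that assumes instaparse is on the classpath; the grammar, the `sentence-parser` and `parse-or-nil` names, and the transform map are all made up for illustration and are not the grammar from the question.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical two-command grammar, standing in for the real one.
(def sentence-parser
  (insta/parser
    "sentence = define | help
     define = <'define '> word
     help = 'help'
     word = #'[a-z]+'"))

(defn parse-or-nil
  "Parses input and applies the transform; returns nil when the parse fails."
  [input]
  (let [result      (sentence-parser input)
        ;; A failed parse is passed through insta/transform untouched.
        transformed (insta/transform {:word identity} result)]
    (when-not (insta/failure? transformed)
      transformed)))

(parse-or-nil "define foo") ; a parse tree with the :word node unwrapped
(parse-or-nil "what?")      ; nil, because nothing in the grammar matches
```

With this shape no `catchall` rule is needed: unrecognized input simply becomes nil without any exception handling.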
@seylerius: Given an input `~a ~b`, how do you know the `a` and `b` are to be parsed as individual `~`'s, as opposed to a code string of "a " followed by "b"?
@aengelberg: If I'm reading this correctly, the characters touching the inside of the tokens need to be alphanumeric, or at least non-whitespace.
so `*a b c*` shouldn't be allowed?
the current grammar that I suggested would allow that. Just trying to get a sense of the range of inputs so I can help design a parser accordingly
@aengelberg: that make sense?
for the first example do you mean `[:b "foo"] [:b "bar"]`?
is there a guarantee that `*a**b*` won't happen?
@aengelberg: Yes. And guarantee? No. Ambiguity in the spec we can lock to an interpretation? Yes.
We basically get to decide if that's a pair of bold characters or a flat string we'll leave be.
@aengelberg: I'm basically upgrading organum. Sample org file: http://sprunge.us/KBbL
hmm, thinking through how to enforce alphanumeric chars on the insides of tokens.
doing a "lookbehind" on the last `*` is nontrivial.
What if I stripped leading and trailing whitespace before parsing, and modified the base string rule to start and end alphanumeric? Would that be easier?
@aengelberg: Will the parser ignore escaped tokens, like `\*`?
@aengelberg: Is there any way to mark tokens to not be parsed?
@seylerius you'd have to do `\\*` if inside a Clojure string
the goal is to avoid parsing `*a *` as `[:b "a "]`
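One way to express "the characters touching the inside of the markers must be non-whitespace" without a lookbehind is a character class at both ends of the span. The sketch below uses a plain Clojure regex rather than the grammar under discussion. The `bold-re` and `bold-contents` names are hypothetical, and using the same character classes in the grammar's string rule is an untested assumption.

```clojure
;; A bold span: `*`, then one character that is neither `*` nor whitespace,
;; optionally followed by any non-`*` run ending in another such character,
;; then the closing `*`.
(def bold-re #"\*([^*\s](?:[^*]*[^*\s])?)\*")

(defn bold-contents
  "Returns the inner text of every bold span found in s."
  [s]
  (map second (re-seq bold-re s)))

(bold-contents "*a *")     ; no spans: the inner text would end in a space
(bold-contents "*a b c*")  ; one span, inner whitespace is fine
(bold-contents "*a**b*")   ; two adjacent spans
```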
@aengelberg: Anything special I have to do to mark that? I just tried parsing `\\*foo\\*` and got `("\\" [:b "foo\\"])`
instaparse doesn't automatically handle backslashes in any special way besides what has been defined in your grammar.
Okay. How do you define a simple backslash replacement in this type of grammar, then?
Maybe replace `<string>` with:
<string> = '\\\\*' | #'[^*/_+=~^_\\\\]+'
user> (inline-markup "a\\* b")
("a" "\\*" " b")
Pretty messy, I know. (four backslashes :face_with_rolling_eyes:)
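The four backslashes come from two layers of escaping: the Clojure reader halves them, and then the grammar's own string-literal syntax halves them again. A small REPL sketch of the first layer, using only core functions (the var names are made up for illustration):

```clojure
;; The string terminal as typed in Clojure source: four backslashes.
(def grammar-fragment "'\\\\*'")

;; After the Clojure reader, only two backslashes remain. This is the text
;; the grammar parser sees, where `\\` in turn stands for one literal backslash.
(println grammar-fragment) ; prints '\\*'

;; The input from the example above, as typed in Clojure source.
(def input "a\\* b")

;; After the reader it holds a single backslash: a \ * space b
(count input)
```

So the terminal ends up matching the two-character sequence backslash-star, which is exactly what the input contains.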
I don't know if this solves your problem though; you don't want to escape `*`'s in every `** My Subsection` text, do you?
sorry if I'm a bit unhelpful; phasing in and out of AFK
I'm thinking I'm just going to tell users that if they want a plain `*` they have to escape it.
Headlines are already handled by the time this stage of parsing is invoked, so those won't be an issue.
And your special case of `*a**b*` is apparently already readily converted to `([:b "a"] [:b "b"])`
@aengelberg: Separate (earlier stage) parser: Is it possible (other than by having respective rules for `#'^* '`, `#'^** '`, `#'^*** '`, etc) to easily produce `h1`, `h2`, `h3`, etc?
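One possible approach, not an answer given in this conversation, is to match the whole run of stars with a single rule and derive the heading level from its length afterwards. The sketch below shows the idea with a plain Clojure regex. The `headline` function is hypothetical, and in an instaparse setup the counting step would presumably live in a transform function instead.

```clojure
(defn headline
  "Turns an org-style headline such as \"** Title\" into [:h2 \"Title\"].
  Returns nil for lines that do not start with stars and a space."
  [line]
  (when-let [[_ stars title] (re-matches #"(\*+) (.*)" line)]
    ;; The heading level is simply the number of leading stars.
    [(keyword (str "h" (count stars))) title]))

(headline "* Top level")
(headline "** My Subsection")
(headline "plain text")
```

This avoids writing one rule per level, at the cost of doing the level computation outside the grammar.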