2016-08-30
I am trying to write a simple grammar that parses comments like `/* some text */`. Is there a way in instaparse to say "any character"?
e.g.
"comment = '/*' .* '*/'"
@andrei Instaparse doesn't have a special character for that, but you can use regular expressions to cover any character
e.g. comment = '/*' #'[\\s\\S]'* '*/'
(`#"[\s\S]"` is my personal favorite way to match any character in a regex)
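The reason to prefer `[\s\S]` over `.` for "any character" can be checked directly. This is a small Python sketch that uses the `re` module as a stand-in for the Java regexes instaparse relies on. For these two constructs the default behaviour is the same in both engines: `.` does not match `\n` unless a DOTALL-style flag is set. The helper name `matches_newline` is made up for illustration.

```python
import re

# Single-character patterns compared on a newline. In both Python's re and
# Java regex, `.` does not match "\n" unless a DOTALL-style flag is set,
# while the class [\s\S] (whitespace OR non-whitespace) matches every character.
DOT = re.compile(r".")
ANY_CHAR = re.compile(r"[\s\S]")


def matches_newline(pattern):
    """Return True if the single-character pattern matches a newline."""
    return pattern.fullmatch("\n") is not None


if __name__ == "__main__":
    print(matches_newline(DOT))       # False
    print(matches_newline(ANY_CHAR))  # True
    body = " line one\n line two "
    # [\s\S]* can consume a multi-line comment body; .* cannot.
    print(re.fullmatch(r"[\s\S]*", body) is not None)  # True
    print(re.fullmatch(r".*", body) is not None)       # False
```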
@andrei: Yeah, you'll want something like this:
"comment = <'/*'> #'.*' <'*/'>"
My version hides the comment tokens, though @aengelberg's regexp might be more appropriate.
@aengelberg @seylerius thank you for the suggestions. I think I got a bit misled by the source code, https://github.com/Engelberg/instaparse/blob/master/src/instaparse/abnf.clj#L19-L40; I thought there were some defaults in instaparse,
but now, reading through the docstrings, I see these are only for parsing the grammar itself: https://github.com/Engelberg/instaparse/blob/master/src/instaparse/abnf.clj#L2
a couple of things I see in @seylerius's solution:
1) `.` in a regex doesn't match newlines
2) `.*` will greedily match past the `*/` and won't be able to parse the end of a comment
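Point 2 can be sketched in Python. This is a toy model, not instaparse itself: it assumes, as the point above implies, that a regex terminal hands back the one match the regex engine finds at that position (greedy `.*` takes everything to the end of the line) and that the parser never retries a shorter match. The function `parse_comment` is hypothetical.

```python
import re


def parse_comment(text, body_regex):
    """Toy model of the grammar  '/*' #'<body_regex>' '*/'.

    Assumption: the regex terminal returns the single match the regex engine
    finds at that position and never retries a shorter one. Returns the body
    string on success, or None if the sequence cannot be completed.
    """
    if not text.startswith("/*"):
        return None
    # Try the body regex anchored just after the opening '/*'.
    m = re.compile(body_regex).match(text, 2)
    if m is None:
        return None
    # The closing '*/' must come right after the body and end the input.
    if text[m.end():] != "*/":
        return None
    return m.group()


if __name__ == "__main__":
    print(repr(parse_comment("/* foo */", r".*")))     # None: .* also ate the '*/'
    print(repr(parse_comment("/* foo */", r"[^*]*")))  # ' foo '
```

With `.*` the body swallows the closing `*/`, so nothing is left for the `'*/'` token. A class like `[^*]*` happens to work for `/* foo */` but breaks as soon as the body contains a `*`, which is why a lookahead comes up later in the thread.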
@andrei Sorry for the misleading code. Those constants are available, but only in the ABNF format.
EBNF is the default
@andrei A point to keep in mind with @aengelberg's solution is that you'll need to condense the individual characters of the output.
@seylerius @aengelberg is there a way in instaparse to specify that matches should be grouped together, so that one doesn't need to condense them?
yeah, thanks for clarifying that @seylerius
You'll get output like `[:comment "f" "o" "o" " " "b" "a" "r"]` from input like `/*foo bar*/`
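A sketch of the condensing step, using nested Python lists to stand in for instaparse's hiccup-style trees. On the Clojure side, `instaparse.core/transform` with a map such as `{:comment str}` plays this role; the Python `transform` below is a simplified, hypothetical re-implementation for illustration only, not instaparse's code.

```python
def transform(tree, handlers):
    """Hypothetical, simplified analogue of instaparse's transform.

    Trees are nested lists of the form [tag, child, ...]. Children are
    transformed first; if `handlers` has an entry for the node's tag, the node
    is replaced by handlers[tag](*children), otherwise it is rebuilt as-is.
    """
    if not isinstance(tree, list):
        return tree  # leaf string
    tag, *children = tree
    children = [transform(child, handlers) for child in children]
    if tag in handlers:
        return handlers[tag](*children)
    return [tag, *children]


# Join the single-character children of :comment, like {:comment str} in Clojure.
CONDENSE = {":comment": lambda *chars: "".join(chars)}

if __name__ == "__main__":
    tree = [":comment", "f", "o", "o", " ", "b", "a", "r"]
    print(transform(tree, CONDENSE))  # foo bar
```

Because children are rewritten bottom-up, nested `:comment` nodes anywhere in a larger tree are condensed in the same pass.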
@andrei The official specification for ABNF is stricter and more specific than EBNF, and it dictates that those constants are available. EBNF is more of an ambiguous mashup of a variety of standards we were able to find on the internet.
So there are no constants in EBNF, since none of the EBNF resources we found seemed to indicate such
And remember to wrap your comment tokens in `<>` like I did, so you don't save the markup itself.
Sadly, there is no direct way in the grammar to concatenate the strings.
I am using something like this for strings:
<string> = dqoute #'([^"\\]|\\.)*' dqoute
<dqoute> = <'\"'>
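The body regex of the `string` rule can be tried out on its own. This Python sketch assumes each `\\` in the grammar stands for one regex-level backslash, so the pattern reads: any character other than `"` or `\`, or a backslash followed by any character. The inner group is made non-capturing here so that group 1 is the whole body. Since the body comes back as a single match, a regex like this sidesteps the per-character condensing discussed above. `read_quoted` is a hypothetical helper.

```python
import re

# The string rule as one Python regex: an opening quote, then any run of
# (a character other than " or \) or (a backslash followed by any character),
# then the closing quote. Group 1 is the body without the quotes.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def read_quoted(text):
    """Hypothetical helper: return the body of a quoted string at the start of text."""
    m = QUOTED.match(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(read_quoted('"Hello!";'))          # Hello!
    print(read_quoted(r'"say \"hi\"" = x'))  # say \"hi\"
```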
It depends on the size of the file. Creating all those individual strings is probably going to be the bottleneck, rather than concatenating them later.
I must admit I was led astray by the question of which is more efficient, regexps or transforms, although I think it's a very premature optimisation.
A regex is a sensible solution if you can get it right 🙂
My first thought is to do a negative lookahead for `*/` as part of the regex
So, more regexp magic for me to look into. To give a bit more context, I am playing around with parsing localizable strings:
/* This is a comment */
"hello" = "Hello!";
/* This is another comment */
"click_button" = "Click";
/* Title bar, prints the number of selected products (The translation should be short due to the limit of 100 characters for the title of the mobile app) */
"bar_print_$_selected_products" = "You Selected %@ Products”;
@aengelberg @seylerius thank you for your help; so far I have enjoyed using instaparse. It's cool that I can use some things I learned in college to do some useful things
although I must say that I need to re-learn things about parsers and defining grammars
@seylerius I meant a regex negative lookahead, i.e. `#".*(?!=/\*)"` or something
@andrei glad you're having fun! feel free to ask here if you have any more questions
@aengelberg: That's what I thought. It winds up eating the end token in the `.*` and passes the negative lookahead anyway. I was fighting that with the headline parser in organum over the weekend.
oh, I guess the regex would pass, saying "here's a sequence of characters (including `/*`), and look, there is not a `/*` *after* these characters!"
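Python's `re` reproduces the behaviour described here. As typed above, `(?!=/\*)` would be a lookahead for the literal text `=/*`, so the sketch uses `(?!\*/)` for the closing delimiter instead. Either way, the lookahead is only tested at the position where the greedy `.*` stopped, so it constrains what follows the match and says nothing about what the match contains.

```python
import re

# Greedy .* followed by a negative lookahead for the closing delimiter.
NAIVE = re.compile(r".*(?!\*/)")

if __name__ == "__main__":
    m = NAIVE.match(" foo */ rest")
    # .* runs to the end of the line; at that point nothing follows, so the
    # lookahead succeeds and the "*/" ends up inside the match.
    print(repr(m.group()))  # ' foo */ rest'
```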
so maybe #"((?!/\*).)*"
that would generate a bunch of match groups though due to the ()
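One way to address both concerns, sketched in Python (whose `re` treats these constructs the same way as Java regex): use a non-capturing group `(?:...)`, swap `.` for `[\s\S]` so multi-line comments work, and look ahead for the closing `*/` as suggested earlier rather than `/*`. The lookahead is checked before each character is consumed, so the match stops right before the first `*/`; this pattern is often called a tempered greedy token. Note that a quantified capturing group keeps only its last iteration in both Python and Java, so the capturing form yields one extra group rather than many, but `(?:...)` avoids even that. The helper `comment_body` is hypothetical.

```python
import re

# Tempered greedy token: before consuming each character, check that "*/"
# does not start here. [\s\S] instead of . lets the body span lines.
CAPTURING = re.compile(r"((?!\*/)[\s\S])*")
NON_CAPTURING = re.compile(r"(?:(?!\*/)[\s\S])*")


def comment_body(text, pattern=NON_CAPTURING):
    """Hypothetical helper: return the body of a /* ... */ comment at the start
    of text, or None if it has no opening '/*' or no closing '*/'."""
    if not text.startswith("/*"):
        return None
    m = pattern.match(text, 2)  # always matches, possibly the empty string
    if not text.startswith("*/", m.end()):
        return None
    return m.group()


if __name__ == "__main__":
    print(repr(comment_body("/* a\nb */ x /* c */")))  # ' a\nb '
    # A quantified capturing group keeps only its last iteration...
    print(CAPTURING.match(" abc ").groups())      # (' ',)
    # ...and the non-capturing version has no groups at all.
    print(NON_CAPTURING.match(" abc ").groups())  # ()
```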
Ach. I need to drive back to the store; I'm done with this client. Check in with y'all in about ten.