This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-05-24
I'm trying to parse a rule like this: (parser "EOL ::= [#xD#xA]+"), but it blows up with a parse error:
EOL ::= [#xD#xA]+
^
Expected one of:
!
&
ε
eps
EPSILON
epsilon
Epsilon
<
(
{
[
#"#\"[^\"\\]*(?:\\.[^\"\\]*)*\"(?x) #Double-quoted regexp"
#"#'[^'\\]*(?:\\.[^'\\]*)*'(?x) #Single-quoted regexp"
#"\"[^\"\\]*(?:\\.[^\"\\]*)*\"(?x) #Double-quoted string"
#"'[^'\\]*(?:\\.[^'\\]*)*'(?x) #Single-quoted string"
(*
#"[^, \r\t\n<>(){}\[\]+*?:=|'"#&!;./]+(?x) #Non-terminal"
I'm going off of this EBNF syntax: https://www.w3.org/TR/REC-xml/#sec-notation
"#xN - where N is a hexadecimal integer, the expression matches the character whose number (code point) in ISO/IEC 10646 is N. The number of leading zeros in the #xN form is insignificant."
Do I need to translate that syntax into some other representation? Is there one in particular that I should choose?
instaparse uses Clojure's regex syntax, so it expects # to be the start of a regex. Maybe you need \ to escape it (which would have to be written \\ in a string literal)
These are the code points for CR and LF, I believe. Maybe I need to translate those into the Clojure equivalents?
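A quick REPL check supports that guess: #xD and #xA are the hexadecimal code points 13 and 10, i.e. carriage return and line feed. This minimal sketch decodes the hex digits with Integer/parseInt and compares the result against Clojure's named character literals; the names cr and lf are just illustrative.

```clojure
;; #xD and #xA are hex code points; decode the hex digits and compare
;; the resulting characters with Clojure's named character literals.
(def cr (char (Integer/parseInt "D" 16)))   ; 0xD = 13
(def lf (char (Integer/parseInt "A" 16)))   ; 0xA = 10

(= cr \return)   ;; => true
(= lf \newline)  ;; => true
```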
ah, yes. Well, even if # didn't throw the above error, the #xN syntax they use for matching characters by code point isn't something instaparse supports
yeah, the problem is that #xN is a pseudo-syntax that the XML specification defines for its own grammar notation, to spell out exactly which character code points are meant. Instaparse doesn’t know how to interpret it as part of a grammar.
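If a grammar contains many of these, one option is to rewrite the #xN tokens mechanically into Java's \x{N} regex escape before handing the expression to instaparse as a regex terminal. The helper below, xml-hex->regex-escape, is hypothetical (not part of instaparse) and is only a sketch: it touches the #xN tokens and passes brackets, ranges, and repetition operators through unchanged, so it does not translate the rest of the XML EBNF notation.

```clojure
(require '[clojure.string :as str])

(defn xml-hex->regex-escape
  "Hypothetical helper: rewrite each XML-spec #xN token in s as the
  Java regex escape \\x{N}. Everything else in s is left unchanged."
  [s]
  (str/replace s #"#x([0-9A-Fa-f]+)"
               ;; With a function replacement, clojure.string/replace quotes
               ;; the returned string, so the backslash is inserted literally.
               (fn [[_ hex]] (str "\\x{" hex "}"))))

;; Returns the characters [\x{D}\x{A}]+ (printed as "[\\x{D}\\x{A}]+").
(xml-hex->regex-escape "[#xD#xA]+")

;; The rewritten text compiles as an ordinary Java regex.
(re-matches (re-pattern (xml-hex->regex-escape "[#xD#xA]+")) "\r\n")
```

The output could then, presumably, go inside an instaparse regex terminal such as #'...' in a grammar file; inside a Clojure string literal the backslashes would need to be doubled.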
I think this should work in instaparse:
EOL ::= "\u000D" | "\u000A"
actually, this might not work if you’re slurping the grammar from a file and passing it to instaparse: the \u000A escape is a Clojure reader feature, not an instaparse feature
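The difference between the two cases shows up in what string instaparse actually receives. In source code, the Clojure reader turns \u000D inside a string literal into one character; text slurped from a file skips the reader, so the six characters arrive as written. This sketch simulates the slurped case with an escaped backslash and does not check whether instaparse itself would then interpret the escape.

```clojure
;; In source code, the Clojure reader turns the six characters \u000D
;; inside a string literal into a single carriage-return character.
(def from-literal "\u000D")

;; slurp does not run the Clojure reader, so a grammar file containing
;; \u000D yields those six characters verbatim. "\\u000D" simulates that:
;; an escaped backslash followed by u000D.
(def as-slurped "\\u000D")

(count from-literal)  ;; => 1
(count as-slurped)    ;; => 6
```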
Java regexes also support referring to characters by code point, which means you can use the Instaparse regex feature as well:
EOL ::= #"[\\x0D\\x0A]"
And switching the double quotes to single quotes makes it a little less messy: (grammar/parser "EOL ::= #'[\\x0D\\x0A]'")
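To sanity-check the pattern outside instaparse, the same Java regex can be compiled directly with re-pattern. This minimal sketch adds the + from the original [#xD#xA]+ rule so that a CR LF pair matches as a single unit; instaparse itself is not involved here.

```clojure
;; "[\\x0D\\x0A]+" in a Clojure string is the Java regex [\x0D\x0A]+,
;; which matches one or more CR or LF characters by code point.
(def eol (re-pattern "[\\x0D\\x0A]+"))

(re-matches eol "\r\n") ;; => "\r\n"
(re-matches eol "\n")   ;; => "\n"
(re-matches eol "x")    ;; => nil
```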
no problem