#instaparse
2020-02-21
mmeix09:02:29

New to Instaparse and wrapping my head around grammars: how would I write a grammar that handles nested tag pairs, as in XML: "<p><span>text</span></p>" => [:p [:span text]]?

aengelberg00:02:56

It’s possible to parse XML hierarchies into Clojure data; however, I don’t think you can enforce that the tags must match.

aengelberg00:02:36

You can enforce that manually with your own custom logic after the fact, just not as part of the parser.
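A rough sketch of that "check after the fact" idea, written in Python rather than Instaparse so it stays self-contained. It is not Instaparse's API: the names `TOKEN`, `parse`, and `mismatches` are made up for illustration. The parse step records the opening and closing tag names independently, much as a generic grammar rule would, and a separate pass then reports pairs that differ.

```python
import re

# One token is an opening tag, a closing tag, or a run of text.
# Characters that fit none of these (e.g. a stray "<") are silently skipped.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def parse(source):
    """Build [open_name, close_name, child, ...] nodes without comparing names."""
    root = ["root", "root"]
    stack = [root]
    for closing, name, text in TOKEN.findall(source):
        if text:
            stack[-1].append(text)
        elif not closing:
            node = [name, None]  # closing name is filled in later
            stack[-1].append(node)
            stack.append(node)
        else:
            if len(stack) == 1:
                raise ValueError("closing tag without an open element")
            stack.pop()[1] = name  # accept whatever closing name appears
    if len(stack) != 1:
        raise ValueError("unclosed element")
    return root[2:]


def mismatches(nodes):
    """Custom post-parse check: collect (open, close) pairs that differ."""
    bad = []
    for node in nodes:
        if isinstance(node, list):
            open_name, close_name, *children = node
            if open_name != close_name:
                bad.append((open_name, close_name))
            bad.extend(mismatches(children))
    return bad
```

For example, `mismatches(parse("<p><span>text</span></div>"))` should report the `p`/`div` pair, while well-formed input yields an empty list.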

mmeix17:02:00

So I would just trust that the tags are properly matched, paired, and nested.

mmeix17:02:46

and treat each closing tag as closing the most recently opened element

manutter5112:02:44

caveat: I haven’t had my coffee yet, but the basic idea is that you say something like “a BLOCK element is a P or a DIV or a TABLE (etc), an INLINE element is a SPAN or a B or TEXT (etc),” and then say “a P element is the literal string ‘<P>’ or ‘<p>’ followed by zero or more INLINE elements, followed by the literal string ‘</P>’ or ‘</p>’.” And similarly with SPAN.
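To make the rule-per-element idea concrete, here is a minimal sketch as a hand-written recursive-descent parser in Python. It is not an Instaparse grammar, and every function name is hypothetical. Only P (as the BLOCK element) and SPAN and text (as INLINE elements) are covered; DIV, TABLE, B and the rest are left out. Each rule mirrors the wording above: a literal opening tag in either case, zero or more inline children, then a literal closing tag.

```python
import re

TEXT = re.compile(r"[^<]+")  # TEXT: one or more characters other than "<"


def literal(src, pos, *options):
    """Return the position just past the first option found at pos, else None."""
    for option in options:
        if src.startswith(option, pos):
            return pos + len(option)
    return None


def parse_text(src, pos):
    match = TEXT.match(src, pos)
    return (match.group(), match.end()) if match else None


def parse_element(src, pos, name, child):
    """ELEMENT := '<NAME>' or '<name>', zero or more children, '</NAME>' or '</name>'."""
    pos = literal(src, pos, f"<{name.upper()}>", f"<{name}>")
    if pos is None:
        return None
    node = [name]
    while (result := child(src, pos)) is not None:
        node.append(result[0])
        pos = result[1]
    pos = literal(src, pos, f"</{name.upper()}>", f"</{name}>")
    return None if pos is None else (node, pos)


def parse_inline(src, pos):
    # INLINE := SPAN or TEXT
    return parse_element(src, pos, "span", parse_inline) or parse_text(src, pos)


def parse_block(src, pos):
    # BLOCK := P (other block elements omitted in this sketch)
    return parse_element(src, pos, "p", parse_inline)


def parse(src):
    result = parse_block(src, 0)
    if result is None or result[1] != len(src):
        raise ValueError("input does not match the grammar")
    return result[0]
```

With these rules, `parse("<p><span>text</span></p>")` gives `["p", ["span", "text"]]`, the Python analogue of `[:p [:span text]]`. Because every element has its own rule, this small fixed tag set also rejects crossed tags such as `<p><span>text</p></span>`; that only works when the tag names are enumerated in advance.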

mmeix13:02:36

ah! thanks ... that should start it

manutter5113:02:52

The other caveat is that Instaparse is incredibly fun to work with and may be addictive. 😉

mmeix14:02:29

Confirmed! 😁