There are three types of input data: [a] sequence of chars (standard parsing) [b] sequence of tokens (basically parsing, but after the lexing stage) [c] TREE of tokens (what we have) I can see how [a] == [b]; but I don't see how one define a generalization that covers both [b] and [c]. I'm not saying one does not exist, just that I'm not seeing it. Can you clarify on this ?
@hiredman: ^
a tree is a recursion of sequences, so just take the parsing over a sequence and apply it recursively
CFG is a language that defines patterns over sequences of tokens. What defines patterns over trees ?
if you make tokens terminals or another cfg
because the tokens are complex and also have a structure to be parsed
As far as I can see Cfg[Token] and Cfg[Cfg[Token]] both define patterns over sequences; can you point me at a research paper / book ? I need a more detailed exposition.
no,I've never seen this written up
but like, if you write out a grammar, say in bnf, everything is a terminal or a rule, and the terminals are the tokens
if instead of putting in, like, literal 'if' in the grammar as the token, you say, the terminal is actually a member of this set of sequences and describe the set as a cfg
I think the core issue here is Vec<Vec<T>> does not represent the full space of Tree<T>; so this idea of Cfg[Cfg[Token]] seems to not recognize Trees.
Can you give an example value that satisfies Tree<T> but not Vec<Vec<T>>?
a tree Tree<T> = Leaf<T> | Node<List<T>>
(to make up some terrible type notation)
errr
Tree<T> = Leaf<T> | Node<List<Tree<T>>>
So you walk the list in the node parsing it, where each token type is itself tree to be matched against some other parser
@afoltzm: All leaves in a Vec<T> have depth 1. All leaves in a Vec<Vec<T>> have depth 2.
@hiredman: Here I think is the problem Cfg[T] pattern language over Vec<T> Cfg[Cfg[T]] is a pattern language over Vec<Vec<T>>
that depends on exactly what the internals of CFG are
Vec<Vec<T>> is not a tree, so it is not what we have
edn/clojure data is a tree
so you have a recursive data type like Tree<T> = Leaf<T> | Node<List<Tree<T>>>
and one way to "parse" such a thing is to recursively parse everywhere the datatype does
so you can imagine a parsing that walks through List<String> building up a parse tree, and a fundamental operation of such a parser is doing an equality comparison of the strings in the list with the strings that are the terminals of the grammar
now say, what if you grammar is defined in some dsl/library that has some built in "functions" like "is-if-string?" that runs true when given the string "if"
so you go to your grammar and replace the if terminal with a "call" to is-if-string?, so instead of the parser doing an equality check it calls the predicate on the token
now change from string tokens to arbitrary data structures, and use a predicate like three-element-vector-all-0?
now imagine you have a higher order function 'member' that takes a set, and returns a predicate that returns true if its argument is in the set
what is a cfg? it is a description of a (possibly infinite) set that contains the strings of your language