Fork me on GitHub
#clojure-spec
<
2021-11-04
>
qqq02:11:19

There are three types of input data: [a] sequence of chars (standard parsing) [b] sequence of tokens (basically parsing, but after the lexing stage) [c] TREE of tokens (what we have) I can see how [a] == [b]; but I don't see how one define a generalization that covers both [b] and [c]. I'm not saying one does not exist, just that I'm not seeing it. Can you clarify on this ?

hiredman02:11:58

a tree is a recursion of sequences, so just take the parsing over a sequence and apply it recursively

qqq02:11:48

CFG is a language that defines patterns over sequences of tokens. What defines patterns over trees ?

hiredman02:11:29

if you make tokens terminals or another cfg

hiredman02:11:56

because the tokens are complex and also have a structure to be parsed

qqq03:11:15

As far as I can see Cfg[Token] and Cfg[Cfg[Token]] both define patterns over sequences; can you point me at a research paper / book ? I need a more detailed exposition.

hiredman03:11:10

no,I've never seen this written up

hiredman03:11:57

but like, if you write out a grammar, say in bnf, everything is a terminal or a rule, and the terminals are the tokens

hiredman03:11:01

if instead of putting in, like, literal 'if' in the grammar as the token, you say, the terminal is actually a member of this set of sequences and describe the set as a cfg

qqq03:11:53

I think the core issue here is Vec<Vec<T>> does not represent the full space of Tree<T>; so this idea of Cfg[Cfg[Token]] seems to not recognize Trees.

respatialized16:11:42

Can you give an example value that satisfies Tree<T> but not Vec<Vec<T>>?

hiredman04:11:54

a tree Tree<T> = Leaf<T> | Node<List<T>>

hiredman04:11:10

(to make up some terrible type notation)

hiredman04:11:02

Tree<T> = Leaf<T> | Node<List<Tree<T>>>

hiredman04:11:49

So you walk the list in the node parsing it, where each token type is itself tree to be matched against some other parser

qqq23:11:03

@afoltzm: All leaves in a Vec<T> have depth 1. All leaves in a Vec<Vec<T>> have depth 2.

qqq23:11:12

@hiredman: Here I think is the problem Cfg[T] pattern language over Vec<T> Cfg[Cfg[T]] is a pattern language over Vec<Vec<T>>

hiredman23:11:59

that depends on exactly what the internals of CFG are

hiredman23:11:30

Vec<Vec<T>> is not a tree, so it is not what we have

hiredman23:11:39

edn/clojure data is a tree

hiredman23:11:52

so you have a recursive data type like Tree<T> = Leaf<T> | Node<List<Tree<T>>>

hiredman23:11:31

and one way to "parse" such a thing is to recursively parse everywhere the datatype does

hiredman23:11:37

so you can imagine a parsing that walks through List<String> building up a parse tree, and a fundamental operation of such a parser is doing an equality comparison of the strings in the list with the strings that are the terminals of the grammar

hiredman23:11:26

now say, what if you grammar is defined in some dsl/library that has some built in "functions" like "is-if-string?" that runs true when given the string "if"

hiredman23:11:12

so you go to your grammar and replace the if terminal with a "call" to is-if-string?, so instead of the parser doing an equality check it calls the predicate on the token

hiredman23:11:18

now change from string tokens to arbitrary data structures, and use a predicate like three-element-vector-all-0?

hiredman23:11:01

now imagine you have a higher order function 'member' that takes a set, and returns a predicate that returns true if its argument is in the set

hiredman23:11:58

what is a cfg? it is a description of a (possibly infinite) set that contains the strings of your language