clojure-spec

qqq 2021-11-04T02:54:19.039400Z

There are three types of input data: [a] sequence of chars (standard parsing) [b] sequence of tokens (basically parsing, but after the lexing stage) [c] TREE of tokens (what we have) I can see how [a] == [b]; but I don't see how one define a generalization that covers both [b] and [c]. I'm not saying one does not exist, just that I'm not seeing it. Can you clarify on this ?

qqq 2021-11-04T02:54:23.039600Z

@hiredman: ^

2021-11-04T02:55:58.040800Z

a tree is a recursion of sequences, so just take the parsing over a sequence and apply it recursively

qqq 2021-11-04T02:57:48.041400Z

CFG is a language that defines patterns over sequences of tokens. What defines patterns over trees ?

2021-11-04T02:58:29.042200Z

if you make tokens terminals or another cfg

2021-11-04T02:58:56.042600Z

because the tokens are complex and also have a structure to be parsed

qqq 2021-11-04T03:00:15.043600Z

As far as I can see Cfg[Token] and Cfg[Cfg[Token]] both define patterns over sequences; can you point me at a research paper / book ? I need a more detailed exposition.

2021-11-04T03:01:10.044300Z

no,I've never seen this written up

2021-11-04T03:01:57.045100Z

but like, if you write out a grammar, say in bnf, everything is a terminal or a rule, and the terminals are the tokens

2021-11-04T03:03:01.046200Z

if instead of putting in, like, literal 'if' in the grammar as the token, you say, the terminal is actually a member of this set of sequences and describe the set as a cfg

qqq 2021-11-04T03:11:53.047300Z

I think the core issue here is Vec<Vec<T>> does not represent the full space of Tree<T>; so this idea of Cfg[Cfg[Token]] seems to not recognize Trees.

respatialized 2021-11-04T16:20:42.053300Z

Can you give an example value that satisfies Tree<T> but not Vec<Vec<T>>?

2021-11-04T04:12:54.049100Z

a tree Tree<T> = Leaf<T> | Node<List<T>>

2021-11-04T04:13:10.049400Z

(to make up some terrible type notation)

2021-11-04T04:13:40.050100Z

errr

2021-11-04T04:14:02.050500Z

Tree<T> = Leaf<T> | Node<List<Tree<T>>>

2021-11-04T04:14:49.051300Z

So you walk the list in the node parsing it, where each token type is itself tree to be matched against some other parser

qqq 2021-11-04T23:48:03.054100Z

@afoltzm: All leaves in a Vec<T> have depth 1. All leaves in a Vec<Vec<T>> have depth 2.

qqq 2021-11-04T23:49:12.055300Z

@hiredman: Here I think is the problem Cfg[T] pattern language over Vec<T> Cfg[Cfg[T]] is a pattern language over Vec<Vec<T>>

2021-11-04T23:50:59.056Z

that depends on exactly what the internals of CFG are

2021-11-04T23:51:30.056400Z

Vec<Vec<T>> is not a tree, so it is not what we have

2021-11-04T23:51:39.056700Z

edn/clojure data is a tree

2021-11-04T23:51:52.057Z

so you have a recursive data type like Tree<T> = Leaf<T> | Node<List<Tree<T>>>

2021-11-04T23:52:31.057600Z

and one way to "parse" such a thing is to recursively parse everywhere the datatype does

2021-11-04T23:54:37.059200Z

so you can imagine a parsing that walks through List<String> building up a parse tree, and a fundamental operation of such a parser is doing an equality comparison of the strings in the list with the strings that are the terminals of the grammar

2021-11-04T23:55:26.060100Z

now say, what if you grammar is defined in some dsl/library that has some built in "functions" like "is-if-string?" that runs true when given the string "if"

2021-11-04T23:56:12.061Z

so you go to your grammar and replace the if terminal with a "call" to is-if-string?, so instead of the parser doing an equality check it calls the predicate on the token

2021-11-04T23:57:18.062Z

now change from string tokens to arbitrary data structures, and use a predicate like three-element-vector-all-0?

2021-11-04T23:59:01.063600Z

now imagine you have a higher order function 'member' that takes a set, and returns a predicate that returns true if its argument is in the set

2021-11-04T23:59:58.064800Z

what is a cfg? it is a description of a (possibly infinite) set that contains the strings of your language