2024-04-20 instaparse | Clojure Slack Archive

misha 2024-04-20T11:54:29.031559Z

I have some follow-up to the post above, grounded in my use case: "given some Clojure code AST, how do I plug in custom macro syntax as declaratively as possible?"

First, I chose the AST to be a flat vec of nodes, where a node-id is its index in this vec; this way any node can refer to any other node.

Next, it seems that describing the base and custom grammars (at least well enough for an IDE's "go to definition" to work) boils down to:
• enumerating the node ids which introduce a new local/global sym
• mapping out where each node-id gets its scope from (some "scope-parent-node-id")

There should be no limit on where within the entire AST a node gets its scope from, but it is contained within a macro expr, which is a subtree of arbitrary depth; hence the need for an as-declarative-as-possible DSL describing "locals" and "scope inheritance". Destructuring does not cut it, because the number and sequence of nodes varies, e.g. (let [a b] a) and (let [a b c d] c a), or even (let [a ^:a ^:b ^:c ^:d e] a). Something like spec for sequences (regex ops) seems to fit, but it is global. So I wrote an inline spec-like regex which accepts AST predicates, does minimal validation (mismatch errors), and maps node ids to the syms used in the pattern:

;; form
"(let* [a :a b :b c :c] a c)"

;; pattern of triplets: op name pred, each optionally followed by a [nested] pattern
;; ops:
;; 1, 2, 3, ... N: exact count
;; * ? +: 0-or-more, 0-or-1, 1-or-more
;; no branches (yet?); as an analogue of s/alt, just write 2 or more 'unrolled' patterns for the same grammar.
;; groups are allowed, as an analogue of s/cat
[1 bindings node-vec?
 [* pairs :group
  [1 sym node-sym?
   1 expr node-any?]]
 * bodies node-any?]

Applied to the AST of the form above, this pattern returns:

{bindings {2 {pairs {-1 {sym [3] expr [4]}
                     -2 {sym [5] expr [6]}
                     -3 {sym [7] expr [8]}}}}
 bodies   [9 10]}
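A minimal sketch of how the flat AST and the matcher above might fit together, under stated assumptions; all names here (`flatten-ast`, `match-pattern`, `example-preds`, `example-pattern`) are hypothetical, not the author's code. `flatten-ast` numbers nodes in pre-order, so a node's id is its index and the ids line up with the ones used above; only lists and vectors count as collections, and reader metadata like `^:a` does not become separate nodes. The pattern is quoted data whose predicate symbols are looked up in a `preds` map, matching is greedy with no backtracking, groups draw fake ids from a counter starting at -1, and a mismatch just returns nil instead of the error messages mentioned in the post.

```clojure
;; Hypothetical sketch: a greedy, non-backtracking matcher for the pattern DSL.
;; Nodes live in a flat vector; each has :id (its index), :form, :parent, :children.
(defn flatten-ast [form]
  (letfn [(walk [nodes form parent]
            (let [id    (count nodes)
                  nodes (conj nodes {:id id :form form :parent parent :children []})]
              (if (or (seq? form) (vector? form))
                (reduce (fn [acc child] ; the child's id will be (count acc)
                          (update-in (walk acc child id) [id :children] conj (count acc)))
                        nodes form)
                nodes)))]
    (walk [] form nil)))

(defn- bounds [op] ; min/max repetitions: N exact, * 0+, ? 0-1, + 1+
  (case op
    * [0 Long/MAX_VALUE]
    ? [0 1]
    + [1 Long/MAX_VALUE]
    [op op]))

(defn- parse-steps [pattern] ; flat triplets (+ optional nested vector) -> step maps
  (loop [p (seq pattern), steps []]
    (if-not p
      steps
      (let [[op nm pred & more] p
            nested (when (vector? (first more)) (first more))]
        (recur (if nested (next more) more)
               (conj steps {:op op :name nm :pred pred :nested nested}))))))

(declare match-seq)

(defn- match-one ; one repetition of step: [value rest-ids] or nil
  [ast preds step ids next-fake-id]
  (if (= :group (:pred step))
    ;; group (s/cat analogue): its pattern consumes siblings; it gets a fake id
    (when-let [[res more] (match-seq ast preds (:nested step) ids next-fake-id)]
      (when (< (count more) (count ids)) ; must make progress
        [{(next-fake-id) res} more]))
    (when-let [id (first ids)]
      (when ((get preds (:pred step)) (ast id))
        (if-let [nested (:nested step)]
          ;; a nested pattern must consume all children of this node
          (when-let [[res more] (match-seq ast preds nested (:children (ast id)) next-fake-id)]
            (when (empty? more) [{id res} (rest ids)]))
          [id (rest ids)])))))

(defn- match-step ; greedy repetition within the op's bounds
  [ast preds step ids next-fake-id]
  (let [[lo hi] (bounds (:op step))]
    (loop [vs [], ids ids]
      (if-let [[v more] (when (< (count vs) hi)
                          (match-one ast preds step ids next-fake-id))]
        (recur (conj vs v) more)
        (when (>= (count vs) lo)
          ;; nested/group labels -> {id sub-match}; leaf labels -> always a vector
          [(if (:nested step) (into {} vs) vs) ids])))))

(defn- match-seq ; all steps in order: [label-map rest-ids] or nil
  [ast preds pattern ids next-fake-id]
  (reduce (fn [[acc ids] step]
            (if-let [[v more] (match-step ast preds step ids next-fake-id)]
              [(assoc acc (:name step) v) more]
              (reduced nil)))
          [{} ids]
          (parse-steps pattern)))

(defn match-pattern ; matches the children of list node form-id, skipping its head
  [ast preds pattern form-id]
  (let [counter      (atom 0)
        next-fake-id #(swap! counter dec)]
    (when-let [[res more] (match-seq ast preds pattern
                                     (rest (:children (ast form-id))) next-fake-id)]
      (when (empty? more) res))))

(def example-preds {'node-vec? #(vector? (:form %))
                    'node-sym? #(symbol? (:form %))
                    'node-any? (constantly true)})

(def example-pattern '[1 bindings node-vec?
                       [* pairs :group
                        [1 sym node-sym?
                         1 expr node-any?]]
                       * bodies node-any?])
```

With this sketch, `(match-pattern (flatten-ast (read-string "(let* [a :a b :b c :c] a c)")) example-preds example-pattern 0)` returns the same label map as above. Because repetition is greedy, a pattern such as `* xs node-any? 1 y node-sym?` would never match here; supporting it would need backtracking.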

And to declare the locals and the scope map:

{:locals sym
 :scope  {[bodies =] bindings   ;; each bodies node takes scope from bindings node
          bindings   pairs
          [pairs >]  sym        ;; each pairs node takes scope from its sym node; pairs are processed left to right.
          sym        :in
          expr       :in}}

;; where = > < are directives for threading scope through children: in parallel (the default, as in Clojure itself), left-to-right, right-to-left.
;; and :in is the scope of the grammar node's parent (the parent of the let* s-expr in this example).
;; for pairs and syms, scope enters the first sym, then goes to the first pairs node, then that pairs node overwrites :in with itself, and this repeats,
;; so it ends up like this: :in->sym1->pairs1->sym2->pairs2->...
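A minimal sketch of one plausible reading of the = > < directives, assuming each directive only decides which sibling a child inherits its scope from. `thread-scope` and `example-scope` are hypothetical names. The glue in `example-scope` is hand-wired for this let* example rather than derived from the declarative scope map, and since the fake pairs ids do not appear in the final map listed below, each sym is chained directly to the previous sym.

```clojure
;; Hypothetical helper (not the author's code): given the incoming scope `in`
;; and sibling ids in source order, return {id parent-scope-id} for a directive.
(defn thread-scope
  [directive in ids]
  (case directive
    = (zipmap ids (repeat in))          ; parallel: every child sees `in`
    > (zipmap ids (cons in ids))        ; left-to-right: each sees its left neighbour
    < (let [rev (reverse ids)]          ; right-to-left: each sees its right neighbour
        (zipmap rev (cons in rev)))))

;; Rebuilding the let* example by hand: syms are threaded left to right, each
;; expr sees the same scope as the sym in its pair, the bindings vector sees
;; the last sym, and the bodies see the bindings vector in parallel.
(def example-scope
  (let [syms      [3 5 7]
        exprs     [4 6 8]
        sym-scope (thread-scope '> :in syms)]
    (merge sym-scope
           (zipmap exprs (map sym-scope syms))
           {2 (peek syms)}
           (thread-scope '= 2 [9 10]))))
```

`example-scope` evaluates to the same {id parent-scope-id} map as the calc-scope2 output listed below.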

Here :locals sym will just be the set of all sym ids, #{3 5 7}, and the scope map comes out as:

;; (calc-scope2 binds scope) -> {id parent-scope-id}:
{3  :in
 4  :in
 5  3
 6  3
 7  5
 8  5
 2  7
 9  2
 10 2}

These entries are then simply assoc'ed onto the relevant AST nodes. So to declare a new grammar, all you need is:
• some shared AST-node predicates
• an s-expr pattern with node labels
• a map (DAG) of scope inheritance using those labels
• locals/globals declared using those labels

One general observation: it helps to produce a more regular tree, which tames the number of node combinations you have to handle in postprocessing. Here I made 2 such "simplifications":
• groups (which are fake collections, like s/cat) get fake (negative) ids, so that the scope calculator can process them as true collection nodes, which cut the code almost in half.
• the pattern match returns [] for both cardinality-one (? 1) and cardinality-many (* + N) patterns: this is why the match is {sym [3] expr [4]} instead of {sym 3 expr 4}, and it reduced the amount of code drastically.
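To illustrate why a regular match result helps, here is a minimal sketch with two hypothetical helpers (not from the post). `label-ids` resolves a declaration like :locals sym with a single recursive walk, because every label holds either a vector of ids or a {node-id sub-match} map. `assoc-scopes` stores a {id parent-scope-id} map onto a flat AST vector; the :scope key name is an assumption.

```clojure
;; Hypothetical helpers, not from the post. Every label in a match result holds
;; either a vector of ids or a {node-id sub-match} map, so one walk covers both.
(defn label-ids
  "Collects every node id bound to `label` anywhere in a match result."
  [match label]
  (reduce-kv
   (fn [acc k v]
     (let [acc (if (= k label) (into acc (if (map? v) (keys v) v)) acc)]
       ;; descend into the sub-matches of nested/group labels
       (reduce #(into %1 (label-ids %2 label)) acc (when (map? v) (vals v)))))
   #{}
   match))

(defn assoc-scopes
  "Stores each node's scope parent under :scope in a flat AST vector,
  skipping fake (negative) group ids, which have no node of their own."
  [ast scope]
  (reduce-kv (fn [ast id parent]
               (if (neg? id) ast (assoc-in ast [id :scope] parent)))
             ast
             scope))

;; The match result for the let* example, as quoted data.
(def example-match
  '{bindings {2 {pairs {-1 {sym [3] expr [4]}
                        -2 {sym [5] expr [6]}
                        -3 {sym [7] expr [8]}}}}
    bodies   [9 10]})
```

Here `(label-ids example-match 'sym)` gives #{3 5 7}, the locals set from the example, while a group label such as `pairs` yields only the fake ids #{-1 -2 -3}.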
