Fork me on GitHub
#specter
<
2018-05-18
>
sophiago22:05:22

Question: I have a simple transform expression with a recursive navigator very similar to the one in the example that I'm using in a macro to navigate to every symbol in an AST and replace parts of certain ones (unfortunately converting to and from strings and using regex since I only need to replace part of these symbols). I'd also like it to jump one level up in the AST from each match, or more precisely group of matches with the same substring, and wrap it in a list at that point. Can anyone recommend a starting point for thinking about defining a relative path with specter like this? I would ideally like to avoid the perf overhead of two separate transforms.

sophiago22:05:24

Oh, and this also ignores the issue of wanting to group together multiple occurrences of the same substring. So the symbol added in the second transform should only ever be one, but the predicate would have to be modified to match on the collection above all other collections that contain that exact substring.

nathanmarz22:05:45

@sophiago that first one can be simplified to (setval [TREE NAME #"p[1-9]"] "p" (macroexpand x))

👍 1
sophiago22:05:46

@nathanmarz ah, thanks. I wrote this very late last night and was thinking I should have been able to do something like that

sophiago22:05:24

I just posted an example of what that transform does with what I hope is a slightly better description of what I'd like to add to it

sophiago22:05:37

(I also omitted mentioning I'm using a fork of the compiler where the result would actually be valid...i.e. nested function literals where % shadow one another if necessary)

nathanmarz22:05:16

not understanding what you're trying to do with that second transform

sophiago22:05:44

Let me provide a more succinct and precise example

sophiago22:05:46

There's one place in the macroexpansion above where the bound variable occurs twice: (*'(long (. Math pow 10 (first p__13958#))) (second p__13958#))

sophiago22:05:43

So, ideally in the same transform, I'd like that to become: (fn* [p__13958#] (*'(long (. Math pow 10 (first p__13958#))) (second p__13958#))

sophiago23:05:41

The cases where the bound variables only appear once are simpler, although I'm still unsure how to handle them in one pass with Specter

nathanmarz23:05:06

so you want to collect every unique symbol starting with "p" and wrap the expression with (fn* [<collected p's>] ...)?

sophiago23:05:56

Yes, but only wrap the deepest subexpression where the symbols are exactly the same. But the numerals will never repeat in the input, so saying exact substrings is good enough.

nathanmarz23:05:50

for wrapping once you identify the subexpression you can do it like:

(transform (collect TREE symbol? (selected? NAME #"p[1-9]"))
  (fn [psyms expr]
    `(fn* [[email protected](set psyms)] ~expr)
    )
  data)

nathanmarz23:05:05

assuming TREE goes to every leaf

sophiago23:05:22

TREE goes to every coll currently

nathanmarz23:05:35

as for "deepest subexpression where the symbols are exactly the same" that sounds like a more involved algorithm

nathanmarz23:05:55

the same "p" symbol could exist deeply nested across multiple branches

sophiago23:05:02

Right, that's much more involved than simple one list up.

sophiago23:05:01

I can make some assumptions based on what would constitute valid input. For example, how px_foo means foo will always be the same for every x. Similarly, for each x: they'll be grouped together as far as depth in the entire AST (not sure whether that makes sense).

sophiago23:05:27

So both those in the example I pulled out started with p3. There would never be a p3 at a higher level of nesting or below the level where there's a p4. I'm struggling for language to describe that. Like the "x" in "px" is monotonic with AST depth?

nathanmarz23:05:36

you mean the depth of the root of the subexpression containing all instances of pn is monotonic with n?

sophiago23:05:37

So when you mentioned multiple branches, I can actually ignore that. It's only a matter of variance in the number of lists you'd need to jump up for the symbols to be in the same one.

sophiago23:05:27

"depth of the root of the subexpression containing all instances of pn is monotonic with n" => I believe this is exactly what I'm stating

sophiago23:05:57

Just some confusion over your use of the term "root"

nathanmarz23:05:06

the difference between:

[:a [1 [2]] [1 [2]]]

[:a [1 1] [[2] [2]]]

nathanmarz23:05:24

in the first, the root for both is [:a [1 [2]] [1 [2]]]

nathanmarz23:05:43

in the second, the root for "1" is [1 1] and for "2" [[2] [2]]

nathanmarz23:05:49

is that what you're getting at?

sophiago23:05:05

Yes. You can assume it's like your second example.

sophiago23:05:16

To backtrack a bit (no pun intended) in (setval [TREE NAME #"p[1-9]"] "p" (macroexpand x)) will "p" be made into a symbol or do I need to add that?

sophiago23:05:41

I hate treating symbols as strings, but seems unavoidable here 😕

sophiago23:05:05

To be clear, the transformation at that level is p3__13962# => p__13962#

nathanmarz23:05:56

it replaces the substring within the symbol

nathanmarz23:05:12

behind the scenes NAME is extracting the string for the symbol's name, manipulating it, and then reconstructing the symbol

sophiago23:05:58

Oh, that's awesome. Such a common pain point.

nathanmarz23:05:50

as for the subexpression identification part, that seems more involved

sophiago23:05:56

Also, in your second example, shouldn't [[email protected](set psyms)] be either [email protected](into [] (set psyms)) or [email protected](into [] (distinct psyms))?

sophiago23:05:30

into or vec, I'd have to look

nathanmarz23:05:45

[email protected] within [] uniques and puts all the symbols into a vector

sophiago23:05:15

Oh, I assumed you'd end up with a set inside a vector

nathanmarz23:05:28

that would be if you did ~psyms

sophiago23:05:14

I don't want to belabor that point, but you're saying psyms would evaluate to a set? My assumption was it would be like: [(set ["foo" "foo"])] => [#{"foo"}]

sophiago23:05:21

Also, I think depending on how your navigator in the second example works I may just be able to just apply another transform on TREE to expr to replace the values and that's all I need.

nathanmarz23:05:55

oops meant to write ~(set psyms)

sophiago23:05:23

Oh, okay. That makes more sense

sophiago23:05:04

I'm thinking through what a navigator for even this example would need to look like. It is a bit more involved than I initially assumed

sophiago23:05:24

For it to work top-down in one traversal, it would have to backtrack after it either reaches the next x in depth or the bottom of the tree. That's the only way to make sure it adds in each symbol in a fn* [..] at the correct level.

nathanmarz23:05:00

I've done similar things with dags before

sophiago23:05:29

This all comes down to me not understanding Specter well enough to hack it out at this level of complexity

nathanmarz23:05:57

in that algorithm each node has an id, and I look at the list of node ids to the root from everything I'm trying to find the lowest common root for

sophiago23:05:58

I can't tell whether that means it would work for my purposes as stands or not

nathanmarz23:05:43

you could do it with multiple passes by annotating a generated id metadata to all subexpressions, then find the path of ids to each instance of "p" symbols

nathanmarz23:05:04

then with the paths its easy to identify which subexpressions to target in another pass

sophiago23:05:26

Oh, the problem with your example is when to stop recursing on expr because if I replace all the way down I'll screw up the next match

sophiago23:05:19

It does seem like multiple passes might be a good start. Then I can refactor to combine them. It's much easier if I can just have the navigator stop at the next px it sees.

sophiago23:05:31

Another way to break it down would be to apply the second transform before the first and use just two passes. Then it can add in the bindings at common roots and eliminate the numerals in symbols in the next pass.

sophiago23:05:49

That seems to make the most sense from how I'm grokking it now