Fork me on GitHub
#specter
<
2017-04-24
>
sophiago00:04:30

@nathanmarz thanks for getting back to me so fast! This is the library I'm working with: https://github.com/Sophia-Gold/Madhava-v2. It's still pretty rough in terms of functionality I need to build out, but the core of it is computing a large number of partial derivatives all at once and storing them in an atomic hash-map. Since the number of partials grows exponentially by order and are stored with keys that match them to the corresponding variables at each order this means the maps end up heavily nested. The Github README describes how to use it, but I'll paste a snippet of the function that creates the map (which is quite small) below and describe some operations on it I'm having trouble getting done with core functions.

sophiago00:04:35

I should also mention I did try using data.int-map last night and the increase in speed was barely noticeable, presumably because when generating the map I'm only associng data into it and not calling update. Let me know if you think int-map makes more sense with some of the operations on it I'm thinking of.

sophiago00:04:14

So for starters, the reason why I thought of this project with regards to specter is because I initially tried to implement some functions that would combine two maps by applying a given binary operation to all the values with matching keys (meaning all their keys matching down to the deepest level of nesting) and then either throw out the disjoint key-value pairs or merge them as is. I tried implementing that with postwalk, but found it too messy to match for just the values (all vectors) since it returns everything in the map.

sophiago00:04:42

More realistically for calculating gradients I'd need to use my linear-transform function in this library as the binary function yet instead of between two different maps, to values store in adjacent keys with some type of weighting scheme. I don't have that fully specified yet, so can set it aside, but I think specter could greatly help. I'm thinking something like a zipper except instead of starting from the head, which would be ridiculously slow, I'd want to go straight to the relevant key and then create the zipper right there.

sophiago00:04:47

That's a bit abstract, but a simpler example are the operations I referred to last night. I was trying to see if I could use partial differentiation to encode laws for cellular automata so started playing around with extremely large maps (often verging on too large to print) generated from one very high-dimensional term and going up to many orders like so: (diff [[0 0 0 1 0 0 1 1 0 1 1 1 1 1]] diff-map 7). That's the map I was trying to search inside of using some (to see if the initial term was repeated) and realized it failed searching for any values due to the level of nesting. Similarly, when using (into (sorted-map) ...) how it would sort to everything but the final level of nesting, filter to eliminate >50% of it that consisted of empty vectors, and vals to try and visualize the entire thing with the keys stripped out entirely.

sophiago00:04:06

So that gives me five operations on nested hash-maps I couldn't achieve with core. I'm not sure how involved they'd be to do with specter, but seems like they could be a very good place to start as far as learning the library.

nathanmarz04:04:33

@sophiago partial-diff can be written with specter like:

(let [pidx (peek idx)]
  (select [ALL
           (selected? (keypath pidx) #(-> % zero? not))
           (view
            (fn [expr]
              (-> expr
                  (update 0 * (get expr pidx))
                  (update pidx dec))))]
    p
    ))

nathanmarz04:04:15

maybe a little bit of an improvement

nathanmarz04:04:17

this is a lot of code for me to parse, I can't really help with what are the optimal data structures / algorithms for this purpose

nathanmarz04:04:55

if you show me subproblems you're dealing with, with inputs you have and outputs you want, I can show you how to use specter for it

sophiago04:04:00

Interesting. I'm not sure how much of performance I can wring out of that tiny function, but it helps me to understand specter.

nathanmarz04:04:31

specter's most useful when you want to change part of a data structure

nathanmarz04:04:36

(and leave the rest unchanged)

nathanmarz04:04:57

when you want to combine everything into a new data structure, that's not what specter is for

nathanmarz04:04:09

though it can be helpful for pieces of it

nathanmarz04:04:28

(`traverse` is useful when you want to combine only parts of a data structure)

sophiago04:04:20

Well, I mentioned a few tasks so I'm trying to think where's the best to start. I did have trouble figuring out how to combine two nested maps by applying a function to all values with the same key signature, but you're saying specter isn't ideal for something like that?

nathanmarz04:04:51

it really depends on the problem

sophiago04:04:11

So far everything I've mentioned is an operation on an entire map, either like above, or scrubbing the data in some way or other.

nathanmarz04:04:18

if you just had two random maps that you wanted to combine, probably not

nathanmarz04:04:38

but if you wanted to do something like merge a map in one location in a data structure with another map at some other location, specter will be useful

sophiago04:04:30

As mentioned, even just sorting and filtering deeply nested maps is not quite working out with core. How would you go about that?

sophiago04:04:46

Or searching for a value inside one?

nathanmarz04:04:53

what's a specific example of what you want to do?

nathanmarz04:04:55

input and output

sophiago04:04:48

So, as one example, (some #(= % [[120 3 3 0]]) @diff-map) returns nil even though that value is in the map.

nathanmarz04:04:45

ok, for this specter can be very helpful

nathanmarz04:04:15

you can define a recursive navigator to the "leaves" of the structure (values which are vectors)

nathanmarz04:04:10

(def LEAVES
  (recursive-path [] p
    (if-path map?
      [MAP-VALS p]
      STAY
      )))

nathanmarz04:04:48

then you can do (select-first [LEAVES #(= % [[120 3 3 0]])] @diff-map)

nathanmarz04:04:04

that will return either [[120 3 3 0]] or nil

sophiago04:04:13

Oh wow, great.

nathanmarz04:04:18

if you do select, then you'll get a sequence of every match

nathanmarz04:04:37

that's very performant too

nathanmarz04:04:07

you can continue navigation at each leaf as well

nathanmarz04:04:27

(select [LEAVES ALL ALL odd?] @diff-map) gets you call the odd numbers in those vectors

sophiago04:04:08

So I can also use select like how I want vals to work on a structure like this? To strip out keys?

nathanmarz04:04:25

don't think of it as "stripping"

sophiago04:04:34

(select [LEAVES ALL] @diff-map)?

nathanmarz04:04:35

it's just querying the data structure for a sequence of matches

nathanmarz04:04:03

but yea, that will get you all the vectors of numbers as a single sequence

sophiago04:04:51

Oh, as a sequence? I thought a big feature was returning the same kind of data structure. What if I want to perform operations on all the values and return a map with the same nesting? For example filtering?

nathanmarz04:04:03

use transform for that

nathanmarz04:04:35

(transform [LEAVES ALL ALL even?] inc @diff-map)

nathanmarz04:04:47

that gives you a new map with all the even numbers incremented

sophiago04:04:03

Ah, this is quite easy once you wrote that LEAVES navigator 🙂

nathanmarz04:04:15

(setval [LEAVES ALL AFTER-ELEM] 10) appends the value 10 to each vector of numbers

sophiago04:04:13

For just filtering would you recommend calling transform with identity as the function?

nathanmarz04:04:33

what's an example of what you want to filter?

nathanmarz04:04:51

most likely it will be something along the lines of (setval [...] NONE @diff-map)

nathanmarz04:04:23

(setval [LEAVES ALL (selected? ALL even?)] NONE @diff-map) will remove any vector of nums that has an even number within

sophiago04:04:28

I was thinking something like (transform [LEAVES ALL ALL #(not= [])] identity @diff-map)

sophiago04:04:37

err wait...one less ALL

nathanmarz04:04:48

what you wrote will be a no-op

nathanmarz04:04:00

you want to remove empty vectors?

sophiago04:04:05

Yes. I'm more interested in working with the vectors as a whole than values inside of them. So that's one example. I can see wanting to apply a function to a whole vector like you did about with inc as well, though.

nathanmarz04:04:15

(setval [LEAVES ALL #(= [] %)] NONE @diff-map) will remove empty vectors

nathanmarz04:04:44

(transform [LEAVES ALL] some-custom-fn @diff-map) runs an arbitrary function on each vec of nums

nathanmarz04:04:37

the path in transform says what you want to change in the data structure, meaning anything else will remain unchanged

nathanmarz04:04:55

so what you wrote says to transform each non empty vector with identity

sophiago04:04:59

This is really great! I don't quite understand how you defined the LEAVES navigator to begin with, but from there everything is incredibly simple.

nathanmarz04:04:13

it's pretty simple

nathanmarz04:04:26

it's just saying how to get to the vec of vecs

nathanmarz04:04:51

recursive-path lets you have the path refer to itself (using p in this case)

nathanmarz04:04:32

it's saying "if currently at a map, recurse at all the map vals, otherwise already at the target so stay navigated"

sophiago04:04:48

Right. I get it now.

sophiago04:04:22

What about merging two maps with a transformation function?

sophiago04:04:26

Let's say I have two of that map above, so the keys are equal, and I want a new map with the same structure and the values the result of a binary function applied to the old one?

sophiago04:04:55

I don't want to make it more complicated by adding a long function to multiply or divide two vectors, but you get the idea. I would also ideally like to be able to choose what to do with key signatures that don't match, but that seems more complicated.

nathanmarz04:04:08

ah, yea specter won't help with that

nathanmarz04:04:18

that sounds like a somewhat complicated merge-with

sophiago04:04:52

I can't quite remember, but that must have been the first thing I tried for that before postwalk. It works with nesting?

nathanmarz04:04:23

your call to merge-with will have to be recursive

sophiago04:04:48

Right. I think what happened was I gave up on writing a recursive merge-with 😛

nathanmarz04:04:51

i've thought a bit about extending specter to walk multiple structures at once, but I haven't personally had many use cases for it

nathanmarz04:04:25

it would likely be a massive change to how specter works and I'm not sure it can be done while maintaining near-optimal efficiency

sophiago04:04:26

It seems like it would help with that as is... I'd have to think about it a bit.

sophiago04:04:47

Yeah, I'm not taking into account efficiency of using specter for something like that.

nathanmarz05:04:12

those transformations I was showing before will be near-optimal efficiency

sophiago05:04:22

I do think being able to precisely define a path and apply transform to that will really help me with this library.

nathanmarz05:04:34

will blow away the performance of postwalk

nathanmarz05:04:43

yea, I think this should be a good start for you

sophiago05:04:04

postwalk just is really difficult to work with due to everything it returns

sophiago05:04:39

I guess the only other thing I was going to ask about was sorting and why (into (sorted-map) @diff-map) wasn't changing anything. This one came out perfectly sorted, but when they get really gigantic and hairy, they're not on the deepest level. I designed it that way in case I wanted to use something similar to pmap.

nathanmarz05:04:24

that will only sort the first level

sophiago05:04:36

Ah ok. Then that's the expected behavior.

nathanmarz05:04:12

you can define a navigator to every map and then do (transform MAP-NODES #(into (sorted-map) %) @diff-map)

nathanmarz05:04:49

(def MAP-NODES
  (recursive-path [] p
    (if-path map?
      (continue-then-stay MAP-VALS p))
    ))

sophiago05:04:11

Yes, I was just going to say. It helps to see the navigator for that as well.

nathanmarz05:04:11

anyway, it's bedtime for me, I'll be back online tomorrow if you have more questions

sophiago05:04:58

Thanks so much! This will definitely get me started. I was also just going to ask whether you find much of a speed increase using int-map with these kind of functions?

nathanmarz05:04:08

I've never used int-map so I'm not familiar with which operations it achieves speedups

nathanmarz05:04:45

I'm guessing it's faster for get and assoc, which these transformations we've been doing aren't using

sophiago05:04:55

I actually found very little increase for assoc. I think the main benefit is when you're calling update frequently since it provides its own version of that.