#beginners
2022-06-19
Jim Strieter06:06:02

I have a design question. I am working on a project in which 99% of functions must be called directly or indirectly from an event loop. Event loop looks like this:

(defn event-loop
  [s]
  (loop [s' s]
    (if (:stop-condition s')
      s'
      (recur
        (f1 (f2 (f3 ... (fn s'))))))))
This design is motivated by a quote by Alan Perlis, "It is better to have 100 functions operate on the same data structure than to have 1 function operate on 100 data structures," although I try not to be dogmatic about it. Every function that must go in the event loop requires 1 parameter, s, and returns a modified version of s. Some or all of the fn's can be bound to keywords in s, like this:
(defn event-loop
  [s]
  (loop [s' s]
    (if (:stop-condition s')
      s'
      (recur
        ((:f1 s') ((:f2 s') ((:f3 s') ... ((:fn s') s'))))))))
which makes it easy to change fn's at runtime. One caveat I have found is that, if s is a lot of nested maps like it is in my project, it is burdensome to remember the ordering of all the keys. This makes unit testing a bit of a pain because you have to manually set a lot of keys to the right value for every unit test. As a result, about 70% of the time that goes into unit tests is spent fixing null errors, getting the input s exactly right, etc. On the positive side, however, it is very explicit what s should be to get each function to work right. Another cost to this approach is that you tend to need a lot of
(assoc-in s [:path :to :whatever] (some-function (get-in s [:path :to :whatever])))
kind of stuff which gets annoying after a while. I imagine it would be worth some time to get better at using update-in. As another option, is this the sort of thing that would be smart to solve with a macro? As for how literally to take the Perlis quote: religion and software rarely go well together. It would be dumb to say, "Every function in your entire project must have exactly the same signature," so I try to only apply it insofar as it makes software better. Functions that don't need to be directly called in the event loop, for example, don't need to follow this rule. It's really only necessary for 1.) things that are called in the event loop, 2.) things that might need read access to disparate fields, and 3.) things that need to allow arbitrary composability at runtime. The application is a platform for a stock trading robot which, from its empirical nature, demands a lot of flexibility for experimentation. (Some of the optimization problems involved require composing functions at runtime in ways that aren't always predictable, hence the "accept s, return s" convention.) Here is my question: What, if anything, is a good practice in all this? @ericdlaspe expressed interest in the design but I said "I'm a noob. Let's check with the community to see if it sucks."

phronmophobic06:06:46

Not totally sure, but this seems a lot like middleware or interceptors. Even if it's not, you may be able to borrow some ideas. One technique that might help is for every function to specify which other functions it requires or expects. So if you add f, you can have it automatically include its dependencies, either before or after, in the right order.
Middleware:
• https://nrepl.org/nrepl/design/middleware.html
• Ring middleware (I couldn't find a good resource)
Interceptors:
• https://github.com/lambda-toolshed/papillon
• https://day8.github.io/re-frame/Interceptors/
• http://pedestal.io/guides/what-is-an-interceptor

🙌 1
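
As a rough illustration of the "accept s, return s" convention discussed above, here is a minimal sketch of an event loop whose steps live in the state map itself and are applied with reduce. The names run-steps, step-counter, check-stop, :steps and :ticks are hypothetical and only serve the example; they are not from the original project.

```clojure
;; Each step takes the whole state s and returns an updated s.
(defn step-counter [s]
  (update s :ticks inc))

(defn check-stop [s]
  (assoc s :stop-condition (>= (:ticks s) 3)))

;; Apply every step stored under :steps, left to right.
(defn run-steps [s]
  (reduce (fn [acc step] (step acc)) s (:steps s)))

(defn event-loop [s]
  (loop [s' s]
    (if (:stop-condition s')
      s'
      (recur (run-steps s')))))

(def final-state
  (event-loop {:ticks 0
               :stop-condition false
               :steps [step-counter check-stop]}))

(:ticks final-state) ;=> 3
```

Because :steps is ordinary data, a step can replace or reorder the vector at runtime. Note that reduce runs the first element first, whereas in a nested call such as (f1 (f2 s)) the innermost function runs first.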
phronmophobic07:06:06

> Another cost to this approach is that you tend to need a lot of
> (assoc-in s [:path :to :whatever] (some-function (get-in s [:path :to :whatever])))
> kind of stuff which gets annoying after a while. I imagine it would be worth some time to get better at using update-in. As another option, is this the sort of thing that would be smart to solve with a macro?
There are lots of libraries aimed at this problem or related problems:
• https://github.com/redplanetlabs/specter
• https://github.com/noprompt/meander
• https://github.com/lilactown/cascade
• https://clojuredocs.org/clojure.zip/zipper
• etc. etc.

👍 1
🙌 1
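
Before reaching for a library or a macro, the assoc-in/get-in pattern quoted above can usually be collapsed with the core function update-in. A small sketch, using a made-up state map and inc standing in for some-function:

```clojure
;; Hypothetical state, for illustration only.
(def s {:path {:to {:whatever 1}}})

;; The verbose form from the question.
(def via-assoc-in
  (assoc-in s [:path :to :whatever]
            (inc (get-in s [:path :to :whatever]))))

;; The same thing with update-in: the function receives the old value.
(def via-update-in
  (update-in s [:path :to :whatever] inc))

(= via-assoc-in via-update-in) ;=> true

;; Extra arguments are passed to the function after the old value.
(update-in s [:path :to :whatever] + 10) ;=> {:path {:to {:whatever 11}}}
```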
phronmophobic07:06:49

I'm pretty sure perlis got the quote from bruce lee

🙌 1
hiredman07:06:51

A deeply nested map is usually a bad idea

hiredman07:06:19

Ideally a function gets what it needs to do what it does

hiredman07:06:08

I would look for some structured way to specify which part of the map a function is interested in and only give it those parts

plexus07:06:03

It sounds like you're keeping your entire system state in a nested map (which is fine) and then have lots of functions that take this entire state map as input. That last part isn't great as you're discovering. You should find that groups of functions deal with the same pieces of data, so you should structure your state accordingly, so you can pull that out and pass only the relevant bits to each function.
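
One possible way to follow this advice while keeping the "accept s, return s" shape for the event loop is to write each function against only the sub-map it cares about and adapt it with a small wrapper. This is only a sketch: on-path, deduct-fee, and the :broker/:account keys are hypothetical names.

```clojure
;; Works on an account map only; knows nothing about the rest of the state.
(defn deduct-fee [account]
  (update account :cash - 5))

;; Adapter: turns a function on a sub-map into a function on the whole state.
(defn on-path [path f]
  (fn [s] (update-in s path f)))

(def deduct-fee-step (on-path [:broker :account] deduct-fee))

(deduct-fee-step {:broker {:account {:cash 100}}
                  :market {:last-price 42}})
;=> {:broker {:account {:cash 95}}, :market {:last-price 42}}
```

A unit test for deduct-fee then only needs {:cash 100} as input, not the full nested state.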

phronmophobic07:06:49

> It sounds like you're keeping your entire system state in a nested map (which is fine) and then have lots of functions that take this entire state map as input.
Isn't that exactly how ring middleware, nrepl middleware, pedestal interceptors, and re-frame interceptors work?

phronmophobic07:06:26

afaik, none of them worry too much about trying to filter out irrelevant bits when passing the state through

plexus08:06:31

ring/nrepl/pedestal deal with request/response, not with entire system state. Re-frame does, but most of the access is through subscriptions which do exactly what I suggest, pulling out specific pieces so you can reason locally instead of globally.

Ben Sless09:06:36

In terms of design, it looks a bit like the pathological end state of functional programming described in Out of the Tar Pit, where you pass a "god object" representing the state of the world between functions and effectively reintroduce global state as a parameter. Some things that can help manage this complexity are specs (hiredman's suggestion for a structured way of specification), grouping relevant functions together as previously suggested, namespaced keywords, and eventually refactoring to not require passing the entire state around.
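
To make the specs and namespaced-keywords suggestion concrete, here is a minimal sketch using clojure.spec.alpha (bundled with Clojure since 1.9). The keyword names :quote/symbol, :quote/price and :trader/quote are invented for the example.

```clojure
(require '[clojure.spec.alpha :as spec])

;; Namespaced keys say which part of the system a value belongs to.
(spec/def :quote/symbol string?)
(spec/def :quote/price number?)

;; The piece of state a group of functions is interested in.
(spec/def :trader/quote (spec/keys :req [:quote/symbol :quote/price]))

(spec/valid? :trader/quote {:quote/symbol "ABC" :quote/price 10.5}) ;=> true
(spec/valid? :trader/quote {:quote/symbol "ABC"})                   ;=> false

;; explain-str describes what is missing, which can help when building test inputs.
(spec/explain-str :trader/quote {:quote/symbol "ABC"})
```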

Eric13:06:58

For accessing those nested data structures, https://github.com/redplanetlabs/specter (as @phronmophobic mentioned) might be almost a de facto standard in Clojure at this point. I see it come up constantly in conversations about reading and writing deeply nested structures. What’s very cool about it is that Specter (usually) chooses the optimal functions for reading and writing to each type of collection, so you don’t have to spend a lot of time thinking about that for every step you drill down. It can also cache the paths it traverses for faster repeat access, or you can explicitly create path variables to clean up your code in places they get used repeatedly.

Jim Strieter05:06:59

@U7RJTCH6J "If the Internet says it, it must be so." - Archimedes

Jim Strieter05:06:39

@U7RJTCH6J "Isn't that exactly how ring middleware, nrepl middleware, pedestal interceptors, and re-frame interceptors work?" I don't know, I'm new to Clojure. But others have made the same comment so I'm definitely gonna read up on those.

Jim Strieter05:06:12

@Ben Sless this definitely looks a bit like the pathological end state. I try to keep it from quite turning into PES by only using the rule on things that need to go into the event loop and things that need to be cascaded in arbitrary order. But yeah, at this point I'm definitely open to better ways of doing things.

Fredy Gadotti11:06:30

What if the quote is misinterpreted? "It is better to have 100 functions operate on the same data structure than to have 1 function operate on 100 data structures" looks like it is saying that it is better to have a single structure type, not a single value, instead of having one class for Person, another class for Employee, and so on. I don't believe keeping the entire system in a single structure fits most needs.

🙌 1
Fredy Gadotti11:06:53

This https://stackoverflow.com/a/6160116/18092683 thread adds value to the conversation.

plexus12:06:38

Yeah I'm pretty sure that quote is about having a single data abstraction, not about having a single data structure value.
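
Read this way, the quote describes what plain Clojure maps already give you: the same core functions work on any map, whatever it represents. A tiny sketch with made-up person and employee maps:

```clojure
;; Two different "kinds" of thing, one data abstraction (the map).
(def person   {:name "Ada" :age 36})
(def employee {:name "Grace" :age 45 :employer "Acme"})

;; The same generic functions operate on both.
(map :name [person employee])              ;=> ("Ada" "Grace")
(update employee :age inc)                 ;=> {:name "Grace", :age 46, :employer "Acme"}
(select-keys employee [:name :employer])   ;=> {:name "Grace", :employer "Acme"}
(merge person {:employer "Acme"})          ;=> {:name "Ada", :age 36, :employer "Acme"}
```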

Jim Strieter05:06:09

The way I always read that quote is pretty much what @Fredy Gadotti said - the 1 thing is a data type. 1 abstraction. 1 map format. I should have mentioned in my original post that I wanted to be able to assume the existence of certain keys in functions that accept maps. Trying to have exactly 1 value for that data structure throughout the program would be imprudent.

mukundzare14:06:38

Hi guys, I am having some difficulty wrapping my head around symbols and vars. As I understand it, the relationship between them is: symbols -> vars -> value. So, by that logic, when I redefine a symbol which holds a function, it should redefine the value stored in the var of that symbol. However, I saw in ring handlers that we have to pass the var of the handler function using the dispatch macro #' and not the symbol itself. Here's the confusing part: during REPL development, when the handler's definition changes, the var that was holding the previous definition of the handler now holds the new value, but the symbol still points to the same var. Logically, I see it as the symbol now indirectly pointing to the new value of the handler. So why is it that the handler always holds the old value in the ring responses when given a symbol, but holds the new value when the var is passed? The visual representation of my understanding is as follows: symbol ➡️ var ➡️ value-of-handler ↘️ new-value-of-handler. So, if there is always a link between the symbol and handler, why use the var at all?

teodorlu17:06:55

What do you mean specifically when you say there's always a link between the symbol and the handler (value)? Symbols are just symbols. A namespace is a map from symbols to vars. When you re-evaluate a def or a defn, you point the var to a new value. Perhaps you're confusing symbols and values?

teodorlu17:06:44

> So, if there is always a link between the symbol and handler, why use the var at all?
Can you provide an example of using a symbol as a link to a value?

mukundzare17:06:15

I am looking at the code here:

(defrecord ParensOfTheDead []
  component/Lifecycle
  (start [this]
    (assoc this :server (start-server #'app 9009)))
  (stop [this]
    (stop-server (:server this))
    (dissoc this :server)))
Look at the line with the start-server function, which takes another handler called app. The var #'app of the symbol app, pointing to the value stored in it (defroutes....), is used instead of simply using app. app is defined so:
(defroutes app
  (GET "/" [] index)
  (resources "/"))
Why can't we use the symbol app and why do we need to use the form #'app if we want to use the redefined value of app during runtime at the REPL? The confusion is coming from my understanding as follows: When you say that symbols are just symbols, aka a type of literal, and a namespace is a map from symbols to vars, and consequently vars are pointing to values, this implies for me that symbols are pointing to values via vars, so it should not matter if I am using app or #'app in the example above because the forms resolve as:
1. When (start-server app 9009) is used, app resolves as: symbol (app) -> var (#'app) -> (defroutes....)
2. When (start-server #'app 9009) is used, the resolution happens as: var (#'app) -> (defroutes.....)

teodorlu18:06:09

If you use

(start-server #'app 9009)
, on each HTTP request, the HTTP server will use the current value of app. If you instead use
(start-server app 9009)
, you’ll evaluate app when you start the server. So if you redefine app later, that will have no effect. Does that make sense?
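
The difference can be tried at the REPL without a real HTTP server. In this sketch, fake-start-server is a hypothetical stand-in that just keeps the handler it was given and returns a function simulating an incoming request; it is not Ring's or any library's API.

```clojure
;; Hypothetical stand-in for a server: remembers the handler it was started with.
(defn fake-start-server [handler port]
  (fn [request] (handler request)))

(defn app [request] "v1")

;; Passing app: the var is dereferenced now, the server holds the "v1" function.
(def server-by-value (fake-start-server app 9009))

;; Passing #'app: the server holds the var and derefs it on every call.
(def server-by-var (fake-start-server #'app 9009))

;; Redefine the handler, as you would during REPL development.
(defn app [request] "v2")

(server-by-value {}) ;=> "v1"
(server-by-var {})   ;=> "v2"
```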

teodorlu18:06:17

The difference is when app is resolved. It works either way. But for this case:
> 1. When (start-server app 9009) is used, app resolves as:
> symbol (app) -> var (#'app) -> (defroutes....)
all resolution happens when start-server is called. Which makes for a bad REPL workflow!

✔️ 1
teodorlu18:06:57

Trying it out for yourself might be better than me explaining 🙂

didibus18:06:59

Values pointed to by symbols and vars get auto-derefed in Clojure if used as an argument to a function

didibus18:06:27

So when you call start-server using the app symbol, the function doesn't receive the symbol and it doesn't receive the Var either. Instead Clojure first gets the Var for the Symbol, then it gets the value of the Var and it passes the value to the function.

✔️ 2
didibus18:06:25

You can tell Clojure not to do this by either quoting with ' to not have the Symbol resolved, or by var-quoting with #' to not have the Var dereferenced. In that case the function will be passed the Symbol or the Var. The Symbol, though, cannot be used as-is by the function: the function needs the namespace to find the Var of a Symbol, and a Symbol used as a function just looks itself up in its argument (as a keyword does) instead of calling anything. Whereas the Var can be used as-is, and Vars used as functions deref themselves.
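
The three things a function can receive (the value, the quoted Symbol, the Var) can be compared directly. A small sketch, where app is a throwaway handler defined just for the example:

```clojure
(defn app [request] :ok)

(class 'app)   ;=> clojure.lang.Symbol
(class #'app)  ;=> clojure.lang.Var

;; #'app is reader shorthand for (var app).
(= #'app (var app))             ;=> true

;; Evaluating the bare symbol gives the value currently held by the var.
(identical? app (deref #'app))  ;=> true

;; A Var used as a function derefs itself and calls the current value.
(#'app {})        ;=> :ok

;; A Symbol used as a function only looks itself up in its argument.
('app {})         ;=> nil
('app {'app 1})   ;=> 1
```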

didibus18:06:16

So when you call: (start-server app 9009) It does what you said: symbol (app) -> var (#'app) -> defroutes ... So start-server receives the routes. When you call it like: (start-server #'app 9009) it doesn't do what you said. You're telling Clojure to pass the var as-is. So start-server receives the Var (#'app).

mukundzare08:06:28

Thanks @U3X7174KS, @U0K064KQV. I get it now. The resolution and the dereferencing chronology were the missing pieces in this puzzle. I still need to let that sink in as it feels a bit unnatural coming from other languages.

teodorlu08:06:18

Yeah, I remember finding this weird too! This problem/topic is not really present in languages that aren't designed to be used with a REPL, and Clojure was my first such experience. In static languages, you just recompile and restart. In Python, you also just restart the whole app. I've come to think of "temporal coupling" as "there's a specific order in which you have to do things". If you use vars (#'app), you can redefine app any time you want. No temporal coupling. Whereas if you don't use vars (app), you must define app before passing it to your http server. Temporal coupling.

mukundzare11:06:13

Interesting way to look at it. It resonates with what I have read while reading about this topic and my understanding of it: when a symbol is defined, it looks in the ns map to check if the symbol points to a var, if yes then it uses the value stored in the var from the time when the symbol was interned and bound. However, when using vars directly, the value stored in it will be taken literally if there is a literal or will be computed as in the case of functions or macros stored in the var. Does that make sense? This led me to become confused too though: When a symbol app does a lookup for the value via a var, and if it finds that the var exists, it simply takes the value from during the time when the symbol got bound, even if the var which the symbol points to has changed. So in the memory, there are two blocks: 1. The value cached by the symbol 2. The value in the var. Value in the cache will be used every time the symbol is called (1) The value in the var will be used every time the var is called (2). Maybe I am overthinking :D

teodorlu11:06:14

Sounds about right to me :)

plexus12:06:33

I think you might be attributing more functionality to symbols than they really have. Symbols are just symbols, they're plain values, there's no such thing as "the value cached by the symbol". The question is what happens when the compiler sees a symbol in your code. When you evaluate code the compiler will treat symbols in one of two ways. Normally it replaces the symbol with the value that the var with that name currently has. However if the symbol is in the first position in a list, then it compiles to a var lookup+function invocation. (Unless you tell the compiler to already deref the var at compile time). So functions used as functions can be redefined, because the compiled code looks up the var every time it is run, but functions passed around as values are only derefed once at compile time, and you need to opt-in to the var lookup with #' if you want to be able to redefine them.

mukundzare15:06:42

Thanks @plexus, this clarifies it even further. I think now I have a proper understanding of this concept. Your videos from lambdaisland are also helping me learn web dev in clojure. I actually was doing the parens of the dead series, where I first encountered the var usage, so I got confused and tried to learn about it somewhere else, then I watched the ring part 1 episode which explained the concept with the var usage in the handler (great episodes BTW!, thanks ;)). Nevertheless, I was still left confused about the var usage so I turned to posting here..

didibus16:06:05

Thinking of the evaluation is the best. When something is evaluated, the symbols are resolved to their namespace-mapped Vars (or constants), and then the Vars are dereferenced for their value. That's all. A function call is normally evaluated over and over again, each time the function is called, so the function and the arguments will be resolved and their Vars dereferenced every time, allowing them to pick up the latest bound values each time. But start-server is never called again. Thus it gets evaluated only once. So the running server will continue to have the values from the time it was evaluated. But a user can choose to explicitly pass a Var instead of a Function. And because Vars used as functions deref themselves and call the dereferenced value as a function, they can be used transparently where functions are used. But to pass a Var instead of what the Var points to, you need to var-quote it with #'. So now internally start-server will call the function you passed, but it'll have a Var instead, so each time it will call the Var as a function, forcing it to look up the latest value of the Var each time.

didibus16:06:20

So you can think of it as: code that is re-evaluated will pick up everything that was re-defined, but code that isn't re-evaluated will continue to use what was there beforehand. In those cases, you can explicitly write the code so it depends on a reference instead, which will deref itself to the re-defined values. One way to do that for functions is to pass an explicit Var instead. For variables you'll need to do something more complicated, like using an atom, or changing the code so it is passed a Var and explicitly derefs it.
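
For plain data rather than functions, the atom option mentioned above might look like the following sketch. settings, make-greeter and :greeting are hypothetical names used only for illustration.

```clojure
;; The long-lived code is handed a reference, not the current value.
(def settings (atom {:greeting "hello"}))

(defn make-greeter [settings-ref]
  ;; Deref on every call, so later changes are picked up.
  (fn [name] (str (:greeting @settings-ref) ", " name)))

;; Evaluated once, like start-server.
(def greet (make-greeter settings))

(def before (greet "world"))   ; "hello, world"

;; Change the data without re-evaluating make-greeter or greet.
(swap! settings assoc :greeting "hi")

(def after (greet "world"))    ; "hi, world"
```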

peterh21:06:13

core.match: is it possible to define a pattern that ensures that two local bindings are equal? Something like:

(match ['((:a) :b)]
  [([([p] :seq) q] :seq) :guard #(= p q)] true
  :else false)
Except that this doesn’t work because p and q cannot be resolved in the guard. Having to destructure the pattern in the guard to get to the actual values would defeat the whole purpose of pattern matching…

peterh00:06:16

Well, I just found this and related posts (I think they weren’t in the same thread): https://clojurians.slack.com/archives/C053AK3F9/p1615130133472200 https://clojurians.slack.com/archives/C053AK3F9/p1615131385477000 As far as I can remember, it is possible in OCaml and Haskell to access local bindings from patterns in guards, so it would be great to have this in core.match too. But maybe it would be hard to implement in the current form of the macro?