#other-languages
2021-04-23
Yehonathan Sharvit13:04:42

A question regarding the importance of data immutability in server-side Node.js. Consider a typical scenario where a Node.js app reads some data from some data sources, applies business logic, and returns data as JSON.

Yehonathan Sharvit13:04:26

What kind of issues could arise if we don’t use immutable data?

Yehonathan Sharvit13:04:35

I mean: the state is external to the app. So why is data immutability important in that specific case?

borkdude13:04:23

immutability leads to better local reasoning. if you have an immutable thing, you don't have to fear that some other part of the code will modify it from under you

Yehonathan Sharvit13:04:41

I know, but it might sound theoretical to Node.js devs. I forgot to mention that the context of this question is a talk about the value of data immutability that I am going to give at a Node.js meetup next week

Yehonathan Sharvit13:04:12

It is easier (at least for me) to articulate the value of immutability when the app has an inner state

borkdude13:04:14

Maybe it's good to read the READMEs from several immutable JS libraries like immutable.js

Yehonathan Sharvit13:04:21

E.g. in the frontend

borkdude13:04:24

because they exist for a reason

borkdude13:04:10

ah you mean nodeJS as in backend JS apps, yeah, not sure if immutable JS libs are used a lot there, interesting question

Yehonathan Sharvit13:04:24

I am gonna re-read the Immutable.js README. By the way, did you know that Immutable.js was kind of dead?

borkdude13:04:43

no, again? which lib has arisen now

borkdude13:04:41

I don't follow all that BS and hype anymore. Just use CLJS :P

lightsaber 3
borkdude13:04:22

A book written by Fogus about functional programming in JS in 2013 is now probably stale. While the book he wrote earlier in 2010 still runs with Clojure 1.11

borkdude13:04:33

But maybe you could read that book for inspiration as well

Yehonathan Sharvit13:04:02

Something else: I understood recently that all the libs that provide immutable data manipulation on top of native JS objects are efficient only with records but not with associative arrays

Yehonathan Sharvit13:04:42

@ericnormand do you have a take on the relevance of immutable data in nodejs backend side?

borkdude13:04:23

It's interesting why TypeScript became popular while it's not immutable by default, whereas Clojure has the opposite: dynamic typing + immutability

borkdude13:04:39

probably marketing though, TypeScript is pushed by M$FT

ericnormand13:04:37

I think the statelessness of HTTP helps a lot

ericnormand13:04:49

each request is handled largely independently, using its own state

ericnormand13:04:00

it reduces the amount of sharing

ericnormand13:04:09

so even if you use mutable data, it’s very local

ericnormand13:04:24

until, of course, your code grows and it still gets out of hand

ericnormand13:04:52

that’s all very general, though

ericnormand13:04:06

I don’t have experience using Node.js

ericnormand13:04:26

related: i think that’s one of the hidden values of microservices

ericnormand13:04:47

the services don’t share memory

ericnormand13:04:07

they make copies of anything that needs to be shared (by serializing and deserializing)

Yehonathan Sharvit13:04:25

Yeah. My question is not specific to nodejs

Yehonathan Sharvit13:04:33

Feel free to address the broader question

🙌 3
ericnormand13:04:06

the fact that each HTTP request is handled with very little sharing really helps

ericnormand13:04:48

so, for instance, you copy data out of the DB, you mess with it all you want, then send a copy back to the client

ericnormand13:04:03

no other request had access to that copy

ericnormand13:04:30

another thing that helps is that most apps are partitioned by user

ericnormand13:04:33

race conditions are rare because, even with millions of users, they’re all reading and writing to different rows in the database

ericnormand13:04:00

it’s a very rare case where you’ve got two windows open and quickly clicking buttons in both

ericnormand13:04:23

that has more to do with DB concurrency than in-memory data structures

ericnormand13:04:47

but from what I have seen, most web apps do not have concurrent access done right

ericnormand13:04:19

in practice, though, because I’m modifying my documents and you’re modifying your documents, there isn’t much concurrency anyway

Yehonathan Sharvit13:04:53

In a Google Docs-like scenario, there is concurrency

ericnormand13:04:07

but if I logged in on a few phones and started messing with it, I’d probably find some bugs

ericnormand13:04:23

yes, and in those cases, they are well-built

ericnormand13:04:49

the whole google doc is a concurrent data structure

ericnormand13:04:03

it’s not crud

Yehonathan Sharvit13:04:18

What other concurrent use cases do we have out there

Yehonathan Sharvit13:04:30

less large-scale than Google Docs

Yehonathan Sharvit13:04:57

let’s focus on chat rooms

Yehonathan Sharvit13:04:43

One could implement a chat room with WebSockets. So it’s a good use case for Node.js, I guess

ericnormand13:04:04

you could have the chat log in memory

Yehonathan Sharvit13:04:22

you could or you should?

ericnormand13:04:53

i’m just trying to avoid using a DB in this scenario

ericnormand13:04:02

most apps push their concurrency into the DB

Yehonathan Sharvit13:04:34

That’s why I am looking for a use case where it makes a lot of sense to have the state in memory

ericnormand13:04:07

sessions are another one, but they are partitioned by user as well

Yehonathan Sharvit13:04:28

what kind of concurrency issues would we have if we don’t use immutable data in a chat app?

orestis14:04:37

@viebel I can give you a nightmare example of lack of immutability in a Node.js app. We use mongo and mongo has queries represented as data. So you construct a query based on various request parameters and send it off to mongo to execute. There's a bunch of middleware that goes between the original query and the execution, each of which will modify the query.

orestis14:04:40

The problem with mutable data here is that during development, you can't know what's going on. Once you pass the original query off for execution, you can't reuse it to do a second execution.

orestis14:04:25

We have had dozens of subtle bugs where people assumed the query was the original one and tried to extract parameters from it, reuse it, log it -- but instead they were dealing with a mutated one.

orestis14:04:35

In fact, in a relatively big codebase, once you pass in that query to any function, all bets are off. Even if the function says that it will give you a new query back, there's no way to know unless you go in and review every step of the way.

orestis14:04:57

Which, in a nutshell, is a manifestation of the local reasoning that @borkdude mentioned.

orestis14:04:54

(add on top of all this the async nature of JS, and it can be a nightmare to figure who's mutating what)

orestis14:04:50

In the end, to debug such bugs I had to add console.log every step of the way to capture the values of the query in an immutable place (the stdout).

Yehonathan Sharvit14:04:59

Could you elaborate a bit about the bunch of middlewares that modify the query?

borkdude15:04:18

@viebel Imagine if the maps that go through ring middleware were mutable. That would be a nightmare

orestis15:04:08

Say that you get a query that says "give me all the posts". So you have a mongo query that looks naively like {} -> matches all the documents. But then the business logic kicks in and says, "all the posts for this user means all the posts in the teams they are members of". So it adds {channel_id: {$in: [x, y, z]}}. Then another middleware adds "don't show drafts unless it's your own posts" so it adds {$or: [{status: "published"}, {author_id: foo}]}... and so on.
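
A minimal sketch of the difference between a middleware step that edits the query it receives and one that returns a new query. The function names are hypothetical and nothing here uses the MongoDB driver; the query is treated as a plain object, as in the description above.

```javascript
// Hypothetical middleware steps; none of these names come from a real codebase.

// Mutating style: the step edits the object it was handed.
function restrictToTeamsMutating(query, teamIds) {
  query.channel_id = { $in: teamIds };
  return query;
}

// Non-mutating style: the step returns a new object and leaves its input alone.
function restrictToTeams(query, teamIds) {
  return { ...query, channel_id: { $in: teamIds } };
}

const original = {}; // "give me all the posts"
const mutated = restrictToTeamsMutating(original, ["x", "y", "z"]);
console.log(mutated === original);  // true: the caller's query was changed in place
console.log(Object.keys(original)); // [ 'channel_id' ]

const fresh = {};
const restricted = restrictToTeams(fresh, ["x", "y", "z"]);
console.log(Object.keys(fresh));        // []
console.log(restricted.channel_id.$in); // [ 'x', 'y', 'z' ]
```

With the second style, the caller can still log or reuse the original query after the step has run.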

orestis15:04:56

The way I write it, it sounds manageable, but in reality it's not 🙂

Yehonathan Sharvit15:04:09

I see what you mean.

orestis15:04:48

E.g. in this legacy codebase, we have a function that is named querySchema.validate. You would expect that this will, well, validate the query. But it actually mutates it.

orestis15:04:50

It's nothing that a little discipline can't fix (that's what Uncle Bob would say). But diving into a new codebase without any systemic guarantees... good luck.

Yehonathan Sharvit15:04:03

When data is immutable, you can store in a variable each step of the process and inspect it or replay it as you wish. Libraries like https://github.com/vvvvalvalval/scope-capture cannot work in a mutable environment.

Yehonathan Sharvit15:04:51

@orestis I’d like to claim that there are two approaches to embracing immutability in JavaScript: 1. Using a lib like Immutable.js => immutability at the level of the data structures 2. Using a lib like Lodash FP, Ramda or Immer => immutability at the level of the way we manipulate data

Yehonathan Sharvit15:04:19

The problem with approach #1 is that it requires non-native objects

orestis15:04:35

I'm not sure if the typesystem could help you here. Does Typescript have a concept of immutable function arguments?

borkdude15:04:59

I actually don't know TypeScript

emccue15:04:06

yes, it has readonly

orestis17:04:07

Looks like it’s not that strong of a guarantee https://basarat.gitbook.io/typescript/type-system/readonly

orestis17:04:56

And of course as all Typescript, the guarantees go away at runtime. So again the 3rd party library story isn’t covered.

Yehonathan Sharvit18:04:03

@orestis in what sense is the guarantee not that strong?

orestis07:04:03

You can find numerous ways to work around it (based on that article)

Yehonathan Sharvit15:04:43

The problem with approach #2 is that it is hard to enforce + it doesn’t scale well

Yehonathan Sharvit15:04:10

Do you think that approach #2 would have solved the problems you encountered in your Node.js app?

orestis15:04:35

No, not unless the original developers who put the system together understood the problems of mutability 😄

Yehonathan Sharvit15:04:05

I mean if you forbid object field assignment

Yehonathan Sharvit15:04:15

What could go wrong?

orestis15:04:17

How would you forbid it?

orestis15:04:28

(I'm not familiar so much with those libraries either)

Yehonathan Sharvit15:04:32

Either by convention or with Object.freeze (deep)

orestis15:04:55

Right, so back to discipline 🙂

Yehonathan Sharvit15:04:17

Yeah. But it’s much easier to catch during a PR

Yehonathan Sharvit15:04:38

I imagine one could write a linter that checks that (js-kondo @borkdude?)

orestis15:04:44

My opinion based on what I've seen in this codebase is that if things are possible, people will do it.

💯 3
orestis15:04:12

So any time you have a plain JS object, you cannot know that someone will not mutate it.

orestis15:04:35

Perhaps the current team is disciplined and consistent. What about a 3rd-party library?

Yehonathan Sharvit15:04:43

Unless you call Object.freeze

orestis15:04:01

Well they will try to modify it and then it will throw at runtime, right? Marginally better but not ideal.
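
A minimal sketch of the deep Object.freeze idea, assuming a hand-written helper (deepFreeze is hypothetical, not a built-in, and it does not handle cyclic structures). Note that the write only throws in strict mode; in non-strict code the assignment is silently ignored.

```javascript
// deepFreeze is a hypothetical helper: Object.freeze itself is shallow.
// This version does not guard against cycles.
function deepFreeze(value) {
  if (value === null || typeof value !== "object") return value;
  for (const key of Reflect.ownKeys(value)) {
    deepFreeze(value[key]); // freeze nested objects first
  }
  return Object.freeze(value);
}

// The strict-mode directive makes a write to a frozen object throw a TypeError.
function assignStrict(obj) {
  "use strict";
  obj.where.status = "draft";
}

const query = deepFreeze({ where: { status: "published" } });

let error = null;
try {
  assignStrict(query);
} catch (e) {
  error = e;
}

console.log(error instanceof TypeError); // true
console.log(query.where.status);         // published
```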

orestis15:04:40

Using immutable.js actually is a proper API contract. The moment you leave immutable.js land (e.g. to use said 3rd-party library) you know you are entering the danger zone.

orestis15:04:06

Which is the point of having this immutability baked in the language. There's no danger zone 🙂

orestis15:04:30

I need to run, thanks for giving me a soap box to vent my frustrations at this legacy codebase. Fortunately the transition to Clojure is going well 😄

Yehonathan Sharvit15:04:13

One day JavaScript will have immutability at the level of the language

Yehonathan Sharvit15:04:22

Thank you @orestis for sharing your insights

andy.fingerhut15:04:52

Defaults in a language matter. As someone mentioned above, you can be disciplined on a single project, if everyone agrees, to avoid mutability, but as team members change, the project grows, etc., it becomes very difficult to enforce over time.

andy.fingerhut15:04:23

I have worked on single-threaded large C code bases with fairly extensive data structures kept in memory between client requests, and it becomes fear-inducing to look at some code that is 5 levels deep in the function call tree, with 10 more levels beneath you, to have any kind of assurance which functions modify what, even in single-threaded code. Reasoning about correctness is very non-local -- you pretty much need to understand the whole code base in order to understand whether a change is correct (or whether the current code is correct)

Yehonathan Sharvit15:04:31

Could you get into more details about why reasoning about a local function correctness is non-local when data is mutable?

Elliot Stern00:04:58

var valid = validate(list);
foo(list);
bar(list);
var valid2 = validate(list);
// valid could be true and valid2 could be false
// it entirely depends on the implementation of foo and bar

Elliot Stern00:04:28

By contrast, if list were immutable, you know that valid2 is true iff valid is true.

Elliot Stern00:04:40

If you want to change list, it also has to be done explicitly, making reasoning about what the code is doing easier.
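
A runnable version of the snippet above, as a sketch. The bodies of validate, foo and bar are assumptions chosen for illustration; the original only names them.

```javascript
// Hypothetical implementations standing in for the functions named above.
const validate = (list) => list.every((n) => n > 0);
const foo = (list) => { list.push(-1); }; // mutates its argument
const bar = (list) => list.length;        // only reads

const list = [1, 2, 3];
const valid = validate(list);
foo(list);
bar(list);
const valid2 = validate(list);
console.log(valid, valid2); // true false

// With a frozen array, foo cannot change the list behind the caller's back.
const frozen = Object.freeze([1, 2, 3]);
const validFrozen = validate(frozen);
let threw = false;
try {
  foo(frozen); // push on a frozen array throws a TypeError
} catch (e) {
  threw = true;
}
const validFrozen2 = validate(frozen);
console.log(threw, validFrozen === validFrozen2); // true true
```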

emccue15:04:09

There are different kinds of locality

emccue15:04:51

multithreading on the jvm means that "changes from under you" produces undefined behavior

emccue15:04:30

but in a single threaded context there are still logical boundaries

emccue15:04:11

// execute is assumed to run the query object it is given
const execute_lazy = (query) => {
   return () => {
      return execute(query);
   };
}

const query_a = { select: '*', from: 'table' }
query_a['where'] = 'field > 0 && field < 100';
const results_a = execute_lazy(query_a);
console.log(results_a()); // runs before the next mutation
query_a['where'] = 'field > 100';
const results_b = execute_lazy(query_a);
console.log(results_b());

emccue15:04:18

so this would work and produce no bugs

emccue15:04:44

const query_a = { select: '*', from: 'table' }
query_a['where'] = 'field > 0 && field < 100';
const results_a = execute_lazy(query_a);
query_a['where'] = 'field > 100';
const results_b = execute_lazy(query_a);

// both closures hold the same object, so both see 'field > 100'
console.log(results_a());
console.log(results_b());

emccue15:04:50

but this would not

emccue15:04:55

anything that "stores" what it is given to refer to later is a potential boundary

emccue15:04:30

either closures or objects or wtvr
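
One way to guard such a boundary, as a sketch: copy what you are given at the moment you store it. Here execute is a hypothetical stand-in that only reports the where clause it sees, and executeLazySnapshot is an illustrative name.

```javascript
// Hypothetical stand-in: reports the WHERE clause of the query it receives.
const execute = (query) => `rows where ${query.where}`;

// Stores a reference: later mutations of the caller's object are visible.
const executeLazy = (query) => () => execute(query);

// Stores a shallow copy taken at call time.
const executeLazySnapshot = (query) => {
  const snapshot = { ...query };
  return () => execute(snapshot);
};

const queryA = { select: "*", from: "table", where: "field > 0 && field < 100" };
const byReference = executeLazy(queryA);
const bySnapshot = executeLazySnapshot(queryA);
queryA.where = "field > 100"; // mutate after both closures were created

console.log(byReference()); // rows where field > 100
console.log(bySnapshot());  // rows where field > 0 && field < 100
```

The copy is shallow, so nested objects inside the query would still be shared; with immutable data no copy is needed at all.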

emccue15:04:03

and in node you still have concurrent processes, so they can share data

emccue15:04:59

so say you have some piece of mutable data you put in a middleware shared between route handlers

emccue15:04:28

state updates to that can cross the boundary into other "processes" when you await some request or whatever
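
A minimal sketch of that scenario, with no real framework involved: shared stands in for mutable data attached to a middleware, handle for a route handler, and tick for any awaited I/O. All three names are hypothetical.

```javascript
// Mutable data shared by every handler (hypothetical).
const shared = { currentUser: null };

// Stand-in for awaited I/O such as a DB call or an HTTP request.
const tick = () => new Promise((resolve) => setTimeout(resolve, 0));

async function handle(user) {
  shared.currentUser = user; // write before yielding
  await tick();              // other handlers run while this one waits
  return `${user} sees ${shared.currentUser}`;
}

// Two "requests" start before either has finished.
const results = Promise.all([handle("alice"), handle("bob")]);
results.then((lines) => console.log(lines));
// [ 'alice sees bob', 'bob sees bob' ]
```

There is only one thread, yet the first handler observes a value written by the second, because the await gave the second handler a chance to run in between.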

Yehonathan Sharvit15:04:20

Sounds very interesting @emccue. Unfortunately, I gotta run 😞. Keep writing and I’ll read and respond later

andy.fingerhut16:04:40

"Could you get into more details about why reasoning about a local function correctness is non-local when data is mutable?" Imagine you have some graph data structures with nodes and edges in memory, mutable, and a single-threaded program handling requests and updating that graph data structure. It has a particular schema, and it is big. The code for modifying that graph in memory is not in a single function. You have a call tree of C functions with a single top level entry point, but the full call tree is a decent size tree with up to 10 levels of calls deep. Some of those functions only read things in the graph, but a large fraction of those functions can insert nodes, add edges, or mutate existing nodes or edges. If you have a picture on the board or in your head of exactly which of those hundred or so functions modify exactly what, and under what conditions, you can reason about how a certain change to the code will behave. If you do not have that knowledge in your head, then you are not sure whether a change to one of those functions will violate assumptions in 1, 2, or 7 other functions in those hundred.

andy.fingerhut16:04:38

I mean, with a large enough code base and immutable data, you could potentially also create something where local reasoning breaks down, but it breaks down in different ways. Mutation increases the number of ways you can be wrong.

andy.fingerhut16:04:05

Immutability at the very least lets you answer this question very quickly and easily: "If I call function foo and pass it these parameters, will it mutate those parameters, or anything they reference?" because the answer is always "no".

andy.fingerhut16:04:30

In a program where mutation is common and expected, that question can be extremely difficult to answer correctly.

Stuart17:04:33

Even just a simple example like this breaks my brain

x == a; // true
// a changes here in some multi-thread environment
y == a; // true
x == y; // false...
err...

Stuart17:04:52

There's a nice section in, I think, Joy of Clojure, where the author talks about equality and how you can't really have equality in an environment where you have concurrency and mutation.

Stuart17:04:30

At best all your equality statements need qualifiers, i.e. x and y were equal, where equal means they both have the same value within a specific period of time, but how do you define that period of time? What if your values have some sort of STM, do you have to qualify equality with something like x and y were equal within a certain time period, and we don't care if x or y were in the process of a transaction that would result in a value where they weren't equal?

andy.fingerhut18:04:27

I read an article on some proposed new programming language where they discussed ideas for equality, and proposed that equals on mutable values should be explicitly called something different that could be read "equals now"

andy.fingerhut18:04:05

Yes, it was this paper: https://www.researchgate.net/publication/310823923_The_left_hand_of_equals. They didn't advocate going all immutable in the end for their programming language, but I like the idea of calling something "equals now"

andy.fingerhut18:04:44

Baker's EGAL operation they call, to contrast it, "equals always", which is what equals on immutable values is.

Stuart18:04:40

Yes. I think that makes sense: if any two things are equal at any point in time then they are equal at all points in time.

raspasov23:04:39

Mutability: everybody has a plan until it punches them in the face.

raspasov23:04:12

;Start node CLJS REPL
;clj -Sdeps '{:deps {org.clojure/clojurescript {:mvn/version "RELEASE"}}}' -M -m cljs.repl.node

(defn mutable-danger-101 []
 (let [obj #js{:x 42}]


  (js/setTimeout
   (fn []
    (set! (.-x obj) :boom))
   (rand 1000))

  (js/setTimeout
   (fn []
    (println "What am I?" (.-x obj)))
   (rand 1000))))

(dotimes [i 100]
 (mutable-danger-101))

raspasov23:04:32

This will randomly print either What am I? :boom … or … What am I? 42

raspasov23:04:27

Sorry to come off the high ropes like that, but to me this is the truth: If a person doesn’t understand the problem of the example above, they haven’t tried doing quality UI or backend development. I can only point them to the number of Rich Hickey talks out there; He explains the problems of mutability very well. I think that in order to really see the problem, you must have experienced the pain, and messed up a codebase 1+ time (while you really cared, and wanted to do good work).