jumar03:04:46

> tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27), external code quality (p-value = .82), and developers' productivity (p-value = .83).


And that's not a rare result:
> This is our sixth post about TDD, and the sixth time the conclusion has been that it doesn't make a difference.


> We replicated the baseline experiment, using a crossover design, with 21 graduate students.
While I've never been known to promote TDD, I probably wouldn't use this study to try and convince anyone otherwise.

☝️ 1
😆 1
Noah Bogart03:04:00

Hah i was about to post that quote. It stood out to me as well


from the > The baseline experiment utilized a single experimental object: the Bowling Scorekeeper (BSK) kata. The task required to participants to implement an API for calculating the score of a player in a bowling game. The development of a GUI was not required. The task was divided into 13 user stories of incremental difficulty, each building on the results of the previous one. An example, in terms of input and expected output, accompanied the description of each user story. An Eclipse project, containing a stub of the expected API signature (51 Java SLOC); also an example JUnit test (9 Java SLOC) was provided together with the task description. The task was administered electronically at the beginning of the experiment sessions.

Cora (she/her)03:04:20

what would be a well-designed experiment for this?

Cora (she/her)03:04:40

do you need to already be an advanced programmer for TDD to benefit you? or are you referring to the small sample size?


I think this is an interesting study, but I wouldn't claim that it shows that TDD doesn't make a difference.

Cora (she/her)04:04:28

it's definitely provocative, though!

👍 1

The main issues I have with the summary given are:
• measuring testing effort, code quality, and productivity with a single p-value
• small sample size
• I don't think there's anything wrong with the task or experience level, but they do significantly narrow the scope of applicability
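To make the small-sample concern concrete, here is a rough sketch of statistical power for a paired comparison with 21 participants. It uses a normal approximation to a two-sided test. The effect sizes (standardized mean differences of 0.2 and 0.5) and the 0.05 significance level are illustrative assumptions, not figures from the study, and `approx_power` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided paired test (normal approximation).

    effect_size is an assumed standardized mean difference, not a measured one.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Chance of landing beyond either critical value when the effect is real.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for d in (0.2, 0.5):
        print(f"assumed effect size {d}: power ~ {approx_power(d, 21):.2f}")
```

Under these assumptions a small true effect would usually go undetected with n = 21, so a non-significant p-value on its own is weak evidence that there is no difference.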

Cora (she/her)04:04:41

I think TDD should prove it's effective, too

Cora (she/her)04:04:04

I mean, it's what makes the claim after all


Yea, I'm actually kind of surprised that TDD didn't have a positive effect.


Note there are many other studies, as mentioned in the second link:

Cora (she/her)04:04:57

I'm not particularly surprised

Cora (she/her)04:04:40

it's probably like a lot of things in programming where different kinds of problems are suited to tdd depending on the developer

☝️ 3

@U06BE1L6T are there links to the other studies? I'm having trouble finding them

Cora (she/her)04:04:19

people moralize tdd a lot, which always gets my spidey-sense going


@U7RJTCH6J I'm not sure, you probably need to go through

Cora (she/her)04:04:26

if it's so effective it shouldn't really need zealotry for it to survive, right?

Cora (she/her)04:04:43

there are a lot of things in programming like that, I suppose


I have a strong distrust of meta-analyses. In my experience, they're often a way for someone to support whatever conclusion they want. It's not that meta-analyses can't be useful, but they don't have a good track record.

Cora (she/her)04:04:08

the important part is that they prove what I already believe troll

😎 1

There is a Test Driven Development category


> This is our sixth post about TDD, and the sixth time the conclusion has been that it doesn't make a difference.
This seems not true. It does look like it's the sixth post, but results are mixed. I'm not sure any of the studies make a case that TDD doesn't make a difference, but there's at least one study that claimed positive results for TDD:
> Case studies were conducted with three development teams at Microsoft and one at IBM that have adopted TDD. The results of the case studies indicate that the pre-release defect density of the four products decreased between 40% and 90% relative to similar projects that did not use the TDD practice. Subjectively, the teams experienced a 15–35% increase in initial development time after adopting TDD.
I think the studies are interesting, but the summaries from this site do seem to have strong anti-TDD bias. I think the methodology for is interesting.

👍 1

Now try the same with Event Modeling 👀


It's important to note that "But studies like Nagappan et al's show that TDD is likely to be beneficial." -- despite all the other negative pull-quotes.


There is a lot of material out there that is very anti-TDD but those making that argument are very selective about the material they choose and the material they ignore.


I might be more sympathetic if nearly all the reviews weren't by Greg Wilson:

👍 1

Doesn't TDD (or tests in general) pay back in increasing amounts as the codebase gets larger?


> tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27)
> p-value = .27
Eh? Haven't read the rest, and probably won't given that usually anything that high is called insignificant difference that could've been caused by random chance.
I just realized that I'm still waking up and read that completely wrong, sorry for the noise.
And on the surface it seems to be very similar to static vs dynamic typing studies where the proponents of each find studies supporting their preference, lots of studies make lots of mistakes and have poor design, and in the end the conclusion is still "who knows".


This may be one of those cases where having large volumes of grad students is actually a plus, because TDD is supposed to help more the less experience you have (or so I believe) :)


I think Test-as-you-go is great for theory-building about programs by forcing us to slow down & pay attention to detail... I think this is the key value of a testing practice. Now, I'm terribly interested in testing, and test my stuff as much as humanly possible. However I'm not very convinced about the long-term value of "unit" test suites that TDD (or other developer-driven test writing practices) tend to output.
• Partly because the definition is wooly and nobody reeeealy agrees on what a "unit" is (much less
• Partly because unit tests are prone to decaying fast in information theoretic value. The 1,000th consecutively successful test run conveys zero useful information from an info theory point of view. (Likewise the 1,000th consecutively failing test run). Note that the "ah, yes" moment of a TDD practice is when the test passes. That is the point of 100% useful information, and then it's downhill from there.
• Partly because unit test suites do not predict arguably vital metrics like MTTR or CFR. So it's useful to wonder where to allocate finite programming brain cycles.
• Partly because when problems happen, good debuggers > unit tests for post-hoc analysis. Because now you have to ignore the whole test suite to find the problem.
• And partly because a category error: Tests != quality, but best practices exhort us to think otherwise.
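The information-theory point can be put in numbers with a small sketch. It assumes a deliberately simple model that is not from the discussion: the chance of the next pass is estimated with Laplace's rule of succession, and the information in an outcome is its surprisal, -log2(p). Both function names are hypothetical.

```python
import math


def pass_probability(consecutive_passes: int) -> float:
    """Estimated chance that the next run passes (Laplace's rule of succession).

    This estimator is an assumption chosen for illustration only.
    """
    return (consecutive_passes + 1) / (consecutive_passes + 2)


def surprisal_bits(probability: float) -> float:
    """Self-information, in bits, of an outcome that has the given probability."""
    return -math.log2(probability)


if __name__ == "__main__":
    for earlier_passes in (0, 9, 999):
        p = pass_probability(earlier_passes)
        print(f"after {earlier_passes} passes, another pass carries "
              f"{surprisal_bits(p):.4f} bits")
```

Under this model the first pass carries one full bit, while the 1,000th consecutive pass carries a small fraction of a bit: close to zero, though not exactly zero.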


There is strong correlation between introducing a TDD practice and reduction in defect rates. But it's unclear to me how much of the benefit is due to the tests produced, and how much it is due to increase in communication, attention to detail, and general culture improvement (deliberately fail more to find flaws in thinking v/s pass the buck to the other department).


(And I'm not sure how any of these account for observer/anchoring bias, because as a culture programmers "know" that TDD is supposed to be "better". So if you suddenly get management carte blanche to do what you forever wanted to do, and you know your "performance" may be indirectly rewarded by better optics/promotion etc... How does that factor into the code you write, because now you're actually interested in your stuff, and if not that, in making a great impression / "winning"?)


All of this is also subject to the definition of "productivity". What is it even? My old man who types with his index fingers filed five patents in the last two years, for some pretty advanced process engineering and material science work. That sounds like productivity to me.


And lastly, the devil really is in the details, right? It matters how the test code is written. Left to our own devices we will make cathedrals of test utilities and mocks and who-knows-what, and then who's going to test the test suite?


None of this is to say "TDD bad". I'd prefer a "yes, and" version of the world, viz. "Yes, TDD, and ... X Y Z" (which it so happens, forces us to make better tradeoffs about where to allocate our severely limited time and attention).

Cora (she/her)12:04:28

wow, adi! that summed up so many of my thoughts and feelings on TDD that I've struggled to put into words!!

🥲 1
👏 1
Noah Bogart12:04:56

Earlier this year I read the original 2004 Kent Beck book “Test Driven Development”, and I was quite surprised by how different his depiction of TDD was from the modern conception: he focuses solely on observable behavior (and makes comments about not testing internal implementation), he's not super strict about “write only the minimal code to make the test pass”, his refactor step touches all sorts of code, and he is deeply playful with the whole thing. I highly recommend the book, it was a treat to read.

🤘 1

> he is deeply playful with the whole thing
Hitting like on this so hard. This is precisely what rote TDD demolishes.

Cora (she/her)12:04:54

a question to think about: does the repl replace a lot of what TDD is good for? as in, was TDD developed to get immediate feedback on code you're writing because most languages lacked any other mechanism to get fast feedback? I know plenty of people that write unit tests to build code and then throw away most of the low-level unit tests once they're done. isn't that a bit like how ephemeral repls are?

😁 1
🍿 1

regarding clojure, I've suspected that either:
1. clojure's repl is "good enough" so tests have less immediate value so lazy devs skip them or
2. TDD is overrated and a repl is all you need


the problem being, the next person to touch the code doesn't have your repl history


also there are a lot of extremely common clojure bugs that work in the repl but not outside it (laziness and clogging core.async threads being the main culprits)

Cora (she/her)14:04:20

there's no such thing as lazy, imo, but I agree on lack of immediate value! I wouldn't suggest not having higher level tests that test functionality (even using generative testing if you're feeling frisky)

Cora (she/her)14:04:41

a huge test suite is its own liability, too. it's trade-offs all the way down 🐢


instead of lazy let's say "having priorities elsewhere", and lacking the intermediate incentives that make writing a test the low effort path

Cora (she/her)14:04:24

I'm 100% on board with that

Noah Bogart14:04:22

this is one of the reasons I'm hopeful about #hyperfiddle’s library rcf: writing repl-friendly code that will run like tests but is elided when compiling for production lines up really nicely with my own variation on repl-driven development

Noah Bogart14:04:04

i already write "tests" in my comment forms that i end up using as the basis for actual deftests, so this will help skip the copy-pasting step i have to do
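RCF is a Clojure library, so the following is only a rough analogy in Python and not RCF itself: the standard-library `doctest` module lets REPL-style examples live next to the code and run as tests. The function `open_frame_score` and the helper `failed_examples` are hypothetical names made up for this sketch.

```python
import doctest


def open_frame_score(first_roll: int, second_roll: int) -> int:
    """Score one open bowling frame (hypothetical helper for illustration).

    The lines below look like a pasted interactive session and double as tests:

    >>> open_frame_score(3, 4)
    7
    >>> open_frame_score(0, 9)
    9
    """
    return first_roll + second_roll


def failed_examples(func) -> int:
    """Run the REPL examples in func's docstring and return how many failed."""
    runner = doctest.DocTestRunner(verbose=False)
    finder = doctest.DocTestFinder()
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False).failed


if __name__ == "__main__":
    print("failed examples:", failed_examples(open_frame_score))
```

Running Python with `-OO` discards docstrings, which is loosely similar to the examples being elided from a production build.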

Cora (she/her)14:04:32

I mostly enjoy breaking orthodoxy via flexible, adaptive thinking, and for a not-insignificant segment of programmers TDD is as inflexible and orthodox as it comes

😂 1

Wow, I didn't expect such a good discussion, thanks! :)

💜 4
respatialized15:04:22

I think the lessons Fred Hebert is drawing from patient safety and anaesthesiology are quite relevant to discussion of studies like this. I have my doubts about how much you can conclude about programming "in the large" by reducing programming to super discrete tasks that can be randomly assigned to a control or treatment group.


> The earliest work conducted in the 1950s (e.g., Beecher) used a traditional epidemiological approach, and got nowhere. (Other early efforts outside of anesthesia similarly foundered.) Progress came only after a fundamental and unremarked shift in the investigative approach, one focusing on the specific circumstances surrounding an accident—the “messy details” that the heavy siege guns of the epidemiological approach averaged out or bounded out. These “messy details,” rather than being treated as an irrelevant nuisance, became instead the focus of investigation for Cooper and colleagues and led to progress on safety.

Drew Verlee01:04:12

Having a different perspective can help because oftentimes, when we first attempt to understand something, we're overwhelmed by details. TDD can make that first pass happen in code external to the system. It means a developer will have to make at least two passes at the idea. And the first will be more about imagining its edges: more straightforward examples, easier to grasp than the design of the system. This is why TDD is useful, but that journey doesn't need to be reified in code that lasts. It's likely that the journey I take to reach an understanding isn't how I should explain it to others. I should hope to learn to create a better path than the one I took. And while there are times when, in the construction of something, I prop up tests, and yea, sometimes they stick around, more often the end product ends up being so simple that it doesn't need any tests. It's usually just transferring data between two well-tested abstractions/libs. Libraries which I'm having a hard time imagining how you would have designed through tests. It's like saying: we want to get to Mars, so first we need to have a button on Mars to press when you get there.


We have about 99k lines of production Clojure and about 26k lines of tests. Not all of our code is test-driven but pretty much anything around our REST API is, to help validate the design of the API, the arguments, and the responses. Those tests always stay around. Now, where we're exploring a design (or contemplating possible implementations), we may well do that in RCFs via the REPL and that code may or may not turn into long-lived tests -- so I'll say we'll rely on TDD in some places and RDD in others and all of the former stays around but only some of the latter. And the resulting automated test suite helps us catch regressions when we make changes -- because not all changes are as localized as we think they are when we TDD (or RDD) them in isolation (at "unit" level for whatever that unit is).

👍 1

+1 Well-aimed tests are worth their weight in gold. Ones that target contract boundaries and interfaces between different things tend to be surprisingly useful.


I have an extensive test suite on Chlorine, Clover, and most of my plug-ins. What I see is that multiple people try to test at the wrong "level" (like, pure unit tests where the whole world is mocked) and then complain that it doesn't work. In my projects, all the complex logic is in REPL interactions, so I have a macro with-repl-connection that just connects a socket REPL, and then my tests can assume that the REPL is there. No mocking, no stubbing, just pure "eval this and I want this result". Now, I have changed the way I interact with REPLs multiple times, and I was able to change the code with 95% confidence that things work (the last 5% was me being afraid that I didn't write enough tests 😄).


I know that there are people that don't like to "test first", but honestly, I've seen way too many tests that do not test what they're supposed to for me to back this claim. In one job, I deleted almost a whole test file because the person wrote the code and then a test to validate it. Turns out, the test was wrong: it was validating, for example, that "unauthenticated users can't access the page", not that "if the user has this specific characteristic, they can't access it". The person forgot to comment out his implementation and check that the test failed when his code was not there.
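The failure mode described here can be sketched in a few lines. Everything in this example is hypothetical: the `suspended` flag stands in for "this specific characteristic", and the `enforce_rule` switch simulates commenting out the newly written code.

```python
def can_access(user, enforce_rule=True):
    """Hypothetical access check for a page.

    enforce_rule=False simulates commenting out the newly added rule.
    """
    if user is None:  # pre-existing behaviour: anonymous users are rejected
        return False
    if enforce_rule and user.get("suspended"):  # the newly added rule
        return False
    return True


def weak_test(check) -> bool:
    """Only asserts the old behaviour, so it cannot detect a missing rule."""
    return check(None) is False


def targeted_test(check) -> bool:
    """Asserts the behaviour that the new rule is supposed to add."""
    return check({"name": "ann", "suspended": True}) is False


def without_rule(user):
    """The same check with the new rule disabled."""
    return can_access(user, enforce_rule=False)


if __name__ == "__main__":
    print("weak test, rule removed:", weak_test(without_rule))
    print("targeted test, rule removed:", targeted_test(without_rule))
```

The weak test keeps passing after the rule is removed, which shows it never exercised the rule. The targeted test fails in that situation, and that failure is what writing the test first lets you observe.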


A throwback to the Mongo discussion - at least they have understood the limitations of using JSON as a storage mechanism and have created an alternative: - that's not bad actually, you can work with this data everywhere, and you don't have to reinvent the wheel.


Has anyone here ever made a dynamic module for emacs?


I looked into it a while ago. Wanted to add fennel scripting


bb emacs integration when? 🤪


Does anyone know why in the "double submit cookie" pattern against CSRF, the double submitted cookie needs to be separate from the session id? Does it work if I just double submit the session id? (Assume the session id is cryptographically strong)


The question is answered here: In short, it allows the session id cookie to be set as HttpOnly.
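A minimal sketch of that split, using only the Python standard library: the session cookie is marked HttpOnly so page scripts cannot read it, while the separate anti-CSRF cookie stays readable so the page's own script can copy its value into each request. The cookie names `session_id` and `csrf_token` and the function name are assumptions made for illustration.

```python
import secrets
from http.cookies import SimpleCookie


def build_login_cookies(session_id: str) -> SimpleCookie:
    """Build both cookies for a hypothetical login response."""
    jar = SimpleCookie()

    # Session cookie: hidden from JavaScript via HttpOnly.
    jar["session_id"] = session_id
    jar["session_id"]["httponly"] = True
    jar["session_id"]["secure"] = True

    # Anti-CSRF cookie: deliberately NOT HttpOnly, because the page's own
    # script has to read it and echo it back with each request.
    jar["csrf_token"] = secrets.token_urlsafe(32)
    jar["csrf_token"]["secure"] = True
    return jar


if __name__ == "__main__":
    print(build_login_cookies(secrets.token_urlsafe(32)).output())
```

If the session id itself were the double-submitted value, it would have to be readable by scripts and so could not carry the HttpOnly flag.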


Just in case - note that double cookies are recommended only for stateless apps.


@U2FRKM4TW Actually, I never understood why people say that. All the online resources seem to be hand-wavy about why Custom Header or Double submit cookies are "secondary" to CSRF tokens, citing that it's about "defense-in-depth", whatever that means.


Here are some facts I can prove: 1) All anti-CSRF measures are susceptible to XSS. 2) Without XSS, there's no way CSRF would work even with just Custom Header or Double Submit Cookies.


I'm not a security expert, but consider this scenario:
• You log into your bank account via the bank's website
• You navigate to a completely different website
• You press a button on that website that maliciously redirects you to
There's no XSS there. The redirect request will have the right cookies. But a redirect can't have the right CSRF token. I'm not sure how this part works with double cookie though - maybe there are specific options that prevent some cookies from being sent with redirected requests.


@U2FRKM4TW But with Double Submit Cookies implemented, this link would not work. my-bank.example will see that the request lacks the matching double-submit token.
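The server-side check being described can be sketched as follows. The cookie name `csrf_token`, the header name `X-CSRF-Token`, and the function name are assumptions for illustration; real frameworks use their own names and plumbing.

```python
import hmac


def is_request_allowed(cookies: dict, headers: dict) -> bool:
    """Double-submit check: the token must arrive in the cookie AND the header."""
    cookie_token = cookies.get("csrf_token", "")
    header_token = headers.get("X-CSRF-Token", "")
    if not cookie_token or not header_token:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(cookie_token.encode("utf-8"),
                               header_token.encode("utf-8"))


if __name__ == "__main__":
    token = "example-token-value"

    # Same-site request: the page's own script read the cookie and echoed it.
    print(is_request_allowed({"csrf_token": token}, {"X-CSRF-Token": token}))

    # Cross-site request: the browser attaches the cookie automatically, but
    # the attacking page cannot read it, so it cannot supply the header.
    print(is_request_allowed({"csrf_token": token}, {}))
```

In the redirect scenario the forged request falls into the second case, because the cookies travel with it but the matching header does not.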


What you are describing is the basic scenario of CSRF, or am I missing something?


Exactly - my point is that that scenario does not require XSS to work, and is prevented by a CSRF token.


Ah, and my understanding of "double submit cookies" was wrong. But that doesn't affect my example.


@U2FRKM4TW But that does not counter my 1) and 2). 2) says that with just one of double-submit cookies or custom header implemented, CSRF requires XSS.


This means that there's no basis for the claim: "CSRF token is better than Double Submit Cookies or Custom Headers"


*In any website, stateless or not.*


I see what you mean now. I also see that at this point I'm probably out of my depth and am confusing myself more than elaborating anything here. :) If you don't get an answer here, perhaps a question on HN or a relevant StackExchange website would get some answers.

❤️ 1

And just to clarify my initial response - I was simply relaying what OWASP says. I don't know myself why that recommendation is there.


No hard feelings. I'm just frustrated precisely because this recommendation is repeated by everyone without any convincing explanation.


I don't think "security-by-depth" is a good mentality in IT security to begin with. I believe security is a yes or no question, not a hard/easy question.


The recommendation in the OWASP cheatsheet has been introduced by, so I guess it would make sense to ask there as well - at least you'd get your answer directly from the source.

❤️ 1

@U2FRKM4TW Jim confirmed that Double Submit Cookies can indeed be used on stateful apps. I guess it's their wording that gave the impression that they recommend one for stateful apps and the other for stateless apps.


Yeah, I've seen the comment. Weird...