
@noisesmith @theeternalpulse @didibus Perhaps this would be a better place to discuss philosophy of testing and test coverage tools than #beginners ?


hehe, didn't want to go that far into the weeds but this should help others


I'll be honest, I didn't realize this channel existed -- and I'm very interested in testing (I maintain Expectations and I'm a big TDD fan), I just don't think it should happen in #beginners 🙂


I like Expectations. From my experience in this latest thing I did: I write the example usage first, then as I break it up into functions I keep a `#_` (the discard reader macro) example underneath the skeleton of the individual bits with my expectation. Then I paste them into a test buffer afterwards and implement them.
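A minimal sketch of that workflow (the `slugify` fn and its expected value are hypothetical): the discard reader macro `#_` lets an example usage sit right next to the skeleton without ever being evaluated.

```clojure
(ns scratch.slug
  (:require [clojure.string :as str]))

(defn slugify
  "Turn a title into a URL slug."
  [title]
  (-> title
      str/lower-case
      (str/replace #"[^a-z0-9]+" "-")
      (str/replace #"(^-|-$)" "")))

;; #_ discards the next form entirely, so this expected usage can live
;; under the skeleton while you iterate, then move to a test buffer:
#_(= "hello-world" (slugify "Hello, World!"))
```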


I use Atom/ProtoREPL so I can evaluate any form my cursor is in (closest enclosing or entire top-level form). I can also run any named test individually with a hot key (or run all tests in the current ns, or even all tests, again with hot keys).


So I write tests directly in files and eval them into the REPL without switching to another panel. Same with source code I write.


right, same with cider/emacs on my end.


I'm still figuring out my flow. Trying to ramp up on my clojure experience, javascript is demoralizing


Haven't tried Expectations yet


Any good study on the benefits of TDD out there?


All I heard of was an internal Microsoft multi-year report that concluded TDD was a waste of time: projects that used it turned out to have just as many defects, and just cost more to produce.


But I heard this word of mouth from an ex-Microsoft employee, so I can't corroborate it.


It was pitched against other projects, so it's also unclear what it was compared to directly. I'd assume more moderate usage of tests.


Granted, I've also never really come across any analyses of the benefits of tests. Same for types, but for types I found 4 studies which indicate they're mostly useless.


To put some context and not be unjust to types (as I still love them): 2 studies that measured the impact of one programming language over another, with JavaScript on one end and Haskell on the other, found that language choice had less than a 5% impact on the defect rate.


And it's unclear whether that can be attributed to types or to the quality of the programmers, since the scale went somewhat like: JavaScript -> Ruby -> Python -> C++ -> Java -> C# -> F# -> Scala -> OCaml -> Clojure -> Haskell


Which you can classify as fewer types to more types (with Clojure as an outlier).


Or you can classify as more beginners to more experts.


Or as more imperative to more functional (and within imperative, as fewer types to more types).


And another 2 studies focused on productivity


Where I think the dynamic programming languages showed an average 40% increase in productivity while having an equal defect rate. Yet the people who used the typed languages all said they felt the types helped them be more productive. So the interesting thing was the psychological effect of the guard rails the types were giving people: it was reassuring, yet in the numbers it was slower and did not reduce defects.


Would love to see similar things about tests, TDD, etc.


There also was a study by IBM, can't find it.


Don't have time atm


"Through the introduction of [TDD] the relatively inexperienced team realized about 50% reduction in FVT defect density when compared with an experienced team who used an ad-hoc testing approach for a similar product." /cc @didibus


@didibus I haven’t read through it, but it just popped up on a mailing list I’m part of. From the abstract:
> Case studies were conducted with three development teams at Microsoft and one at IBM that have adopted TDD. The results of the case studies indicate that the pre-release defect density of the four products decreased between 40% and 90% relative to similar projects that did not use the TDD practice. Subjectively, the teams experienced a 15–35% increase in initial development time after adopting TDD.


Yeah, I'm pretty suspicious of a supposed MS internal study that spent three years to conclude TDD was a waste of time, based on everything I've read to the contrary 🙂


Well, it was a meta study, the "waste of time" was in revenue


Not in actual defect rate


The person I was talking to said it's because of the type of defects associated with TDD: are they the expensive ones, or a bunch of cheap ones that get quickly fixed a few weeks after release?


What neither the static-vs-dynamic-typing studies nor the TDD one I heard of looked at was the long-term maintenance cost. Personally, I've rarely had unit tests or types catch bugs that my end-to-end, integ, QA, or just REPL testing wouldn't also catch before going to Prod. Sometimes they do, but it's very rare. Where I feel there is value added, though, is in long-term maintenance. Adding a feature to a code base that doesn't have a lot of unit tests, or is missing static types, does feel (no real data, just my feeling) like a much more challenging undertaking, and something where data-corruption bugs and feature regressions can easily start to sneak in.


Thanks for all the studies, I'll give them a look.


Interesting how the IBM study defines TDD:
> With TDD, all major public classes of the system have a corresponding unit test class to test the public interface, that is, the contract of that class [8] with other classes (e.g. parameters to method, semantics of method, pre- and post-conditions to method)


I think that IBM study is pretty great to at least demonstrate the value of having: Agile integration, Automated test on builds, and a reasonable unit test coverage, especially around features and bug regression.


But it defines TDD as what I just consider standard test practices. That is: integrate early, run automated tests continuously, and unit/integration-test most APIs.


Am I the only one who defines TDD as the practice of writing failing tests first, and the function afterwards?


All of the time


Ok, the other study is better:
> With this practice, a software engineer cycles minute-by-minute between writing failing unit tests and writing implementation code to pass those tests.


Just so I don't look like too much of a devil's advocate with everyone else: this I agree with 100%, and everyone should work to build such test assets:
> Additionally, since an important aspect of TDD is the creation of test assets—unit, functional, and integration tests.


I'm thinking at the micro-level here mostly. Like what level of testing is the perfect amount to meet the ideal balance of productivity/defect.


My feeling is that with Clojure, unit tests can be reduced, because REPL-driven development appears to me to give similar benefits to unit tests, but faster. Having one trivial happy case and a few happy/not-happy corner cases on public fns, to help document them and prevent future feature regressions, is probably still important. The time saved from writing fewer unit tests could go to writing more functional and integration tests instead. Granted, the REPL often covers part of those too, so they might also not need to be as complete as in non-REPL-driven languages. I'm not sure of this, I have very little data, but I'm curious about it.


My second feeling is that test-first isn't useful. It doesn't hurt, but I don't think that part of TDD is actually what drives most of its benefits. Those come more from the overall emphasis on testing your code, creating test assets, and automating the development pipeline.


My biggest gripe with test-first is that I very rarely know what needs to be asserted when I start coding. I'm often exploring a problem space, playing with different code organisations and levels of granularity, and then I toy with the function and discover its proper behavior. So when I do test-first, I lose a lot of time rewriting my tests over and over to adapt to my new learnings and discoveries. So I prefer to add tests only once I'm done with that process.


so is this a top-down vs. bottom up thing, where you prefer to work bottom-up?


"My biggest gripe with test first, is that very rarely do I know what needs to be asserted when I start coding" -- see, I find that very strange. After all, you normally start with a problem and so that is what should be asserted (or, in BDD style, what should be expected).


For a certain number of data points in your problem space, you should always know what the solution must produce -- the coding work is figuring out how to produce that -- so you can certainly (expect solution-1 (solver input-1)) in some vague form.


Or (expect predicate-2 (solver input-2)) ... so you can write several of those, representing known aspects of the problem space (and expected outcomes). And then start to sketch out solver itself. And you can decompose solver into subproblems, again with certain known expectations of behavior.
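A concrete sketch of that shape in clojure.test (Expectations offers the analogous `expect`); the problem, `solver`, and the inputs here are all hypothetical:

```clojure
(ns scratch.solver-test
  (:require [clojure.test :refer [deftest is]]))

;; Hypothetical problem: sum only the even numbers in a collection.
;; Write the known data points first, then sketch `solver` to pass them.
(defn solver [xs]
  (transduce (filter even?) + 0 xs))

(deftest known-points
  ;; concrete expected value, like (expect solution-1 (solver input-1))
  (is (= 6 (solver [1 2 3 4])))
  ;; predicate expectation, like (expect predicate-2 (solver input-2))
  (is (even? (solver (range 10)))))
```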


@noisesmith Ya, you can put it that way I guess. For big projects, I'll have a high-level design first, but in the low and medium levels, I prefer bottom up.


@seancorfield
> After all, you normally start with a problem

In the real world, for business problems, this has never been true for me. Rarely is there a clearly formalized problem to solve. Most customers don't know what they want, or how they want it. Also, at the BDD level I can see that sometimes being more true, but at the unit level?


@didibus I've been writing software for about 35 years. There has always been a problem statement that I'm trying to write a solution to. Therefore there is always expected behavior for the software I write.


I've worked across a broad range of industries, both in Europe and America. I don't know how you can even operate as a software developer if you don't start with a problem statement 😜


Interesting, what fields do you work in?


Well, where do you get your problem statement from?


I've worked in insurance, telecoms, software tooling (QA tools, compilers), e-commerce, data organization, online dating...


Normally, I define it myself, from my learning of the domain space, and my playing around with possible improvements


I've never managed to get the business to offer a defined problem


When they do, it's so vague I can't call it a spec in any way.


You have way more experience than me, though; I've only worked 2 jobs.


As an example, say I'm told we have to provide an export and import feature. It rarely gets any more specific than that. So the edge cases are left up to my team to find, and we choose which ones are worth dealing with and how.


(sorry, got distracted by a production release at work)


@didibus Regardless of who writes the spec, you start with the specification of a problem, even if you write it yourself -- and that specification can always be expressed as a series of tests at various levels. That's pretty much "by definition".


Some specifications can be both "nearly English" and also "executable" if you're a fan of Cucumber, for example


(I personally don't like Gherkin / Cucumber but the underlying Given/When/Then approach is a good starting point for figuring out what your tests should cover at a high level)


Given you have an empty file, When you import it, Then the system should be unchanged ... or ... it should be rejected (with ... error message)


"import" will naturally lead to the specification of "what is a valid import (file) format" so you can break that down into a number of levels of specification of the format and the fields and that lets you write a number of tests that expect valid and invalid formats for fields and for the file as a whole.
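That empty-file Given/When/Then could be written down as a test like this; `import-file` and its return shape are hypothetical illustrations, not a real API:

```clojure
(ns scratch.import-test
  (:require [clojure.string :as str]
            [clojure.test :refer [deftest is]]))

;; Hypothetical import fn: rejects an empty payload with an error,
;; otherwise reports how many rows it accepted.
(defn import-file [contents]
  (if (str/blank? contents)
    {:status :rejected :error "empty file"}
    {:status :ok :rows (count (str/split-lines contents))}))

(deftest empty-file-is-rejected
  ;; Given an empty file, When we import it, Then it is rejected
  (let [result (import-file "")]
    (is (= :rejected (:status result)))
    (is (= "empty file" (:error result)))))
```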


Ya, I see what you mean now. It's just really not my style. I think this does relate to the top-down vs. bottom-up approach. Also, most specs of complete, useful systems are massive; the code in the end is the true spec. Spelling it all out before the fact I don't think is really possible, or it would just take a lot of initial effort.


For example, how do I know I have a file? What kind? What format? What does it mean when import? What's the spec for import?


Like answering all that seems like such a slow process, and then, what if you made the wrong choice, and you realize this later? Change your spec, refactor your tests?


I'm not suggesting specifying everything up front -- we all know that doesn't work. You can write tests for each "question" as it comes up and decide what the output should be at each stage. TDD (and BDD) says you start with a failing test and make it pass -- it doesn't say you start with all your tests... 🙂


The point is you must answer those questions and you should write down your decisions -- somewhere other than just encoding them in your source file! -- and tests (or specs) are a great way to record those decisions and make sure future changes don't break things.


Very true. I like BDD, the idea; frameworks like Cucumber I'm not a fan of, but the loose English is great. Understand the use case from the interaction points. It's precise about what people will care about, but loose enough in between that I can quickly iterate many designs/implementations for it until the best one emerges, at which point I can put some automated unit tests on it to prevent future regressions. That's normally how I operate.


Now, of course it takes practice not to over-specify systems and produce fragile tests, but if you're just changing code without documenting your changes (esp. of acceptable input formats in the case of "import") then you're a poor excuse for a software developer since no one will be able to figure out what your code does without reading the source code (and folks who asked for "import" don't want to do that, right? 🙂 )


Ya, I guess, but in my experience you always need to read the source code anyway, because the spec and tests are always slightly off. It's like an uncanny valley: reading them might be quicker than reading the source, but nothing beats the pure source.


(all this said, there are definitely pieces of code I write without creating tests first -- for example where the "given" is too painful/complex to duplicate in code but the "when"/"then" is straightforward -- but I try hard to keep my REPL experiments at least in comment forms these days for ease of evaluation (into the running REPL) and those often become additional tests)
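A minimal sketch of that comment-form habit, with a hypothetical `parse-age` fn:

```clojure
(ns scratch.core)

(defn parse-age
  "Parse a string into a long, or nil if it isn't a number."
  [s]
  (try (Long/parseLong s)
       (catch NumberFormatException _ nil)))

;; Rich comment block: these experiments live in the source file and are
;; never run as part of the program, but each form can be evaluated
;; individually in the running REPL. The ones worth keeping can graduate
;; into tests later.
(comment
  (parse-age "42")
  (parse-age "forty-two")
  (parse-age ""))
```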


Right, maybe I should give it a better chance. Do you do it more so for pure code or for integration code?


I think it's really good discipline to force yourself to do strict TDD/BDD for a while on each project -- it often highlights all sorts of edge cases you might not have otherwise considered -- and figuring out invariant properties for generative testing is a particularly good mental exercise.


If you do TDD, you'll find yourself wanting to separate side-effects from pure code more often -- leading to more reusable code that is easier to reason about (since testing stuff with side-effects can be painful).


I love generative testing. Properties I find more useful than tests: they hold for all inputs, they tell you a lot in a small number of words, and they find a ton of bugs.
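A minimal example of such a property with test.check (assuming test.check is on the classpath); the reverse-twice-is-identity property is just an illustration:

```clojure
(ns scratch.props
  (:require [clojure.test.check :as tc]
            [clojure.test.check.generators :as gen]
            [clojure.test.check.properties :as prop]))

;; Property: reversing a vector twice yields the original vector.
;; It holds for all generated inputs, and test.check shrinks any
;; counterexample it finds to a minimal failing case.
(def reverse-twice-is-identity
  (prop/for-all [v (gen/vector gen/small-integer)]
    (= v (vec (reverse (reverse v))))))

(tc/quick-check 100 reverse-twice-is-identity)
;; => {:result true, :num-tests 100, ...}
```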


But, yeah, there are going to be sometimes where you expect the database to have specific content in it after certain operations, "given" a particular database setup. It's ideal to separate out the actual database but it's not always entirely practical. Or whatever side-effecty thing you need to do.


But you can certainly argue that tests with side-effects aren't "unit" tests -- although that has nothing to do with TDD/BDD in my mind.


And this quote from Kent Beck:
> So there’s a variable that I didn’t know existed at that time, which is really important for the trade-off about when automated testing is valuable. It is the half-life of the line of code. If you’re in exploration mode and you’re just trying to figure out what a program might do, and most of your experiments are going to be failures and be deleted in a matter of hours or perhaps days, then most of the benefits of TDD don’t kick in, and it slows down the experimentation—a latency between “I wonder” and “I see.” You want that time to be as short as possible. If tests help you make that time shorter, fine, but often, they make the latency longer. And if the latency matters and the half-life of the line of code is short, then you shouldn’t write tests.


In my case, I have experiments with shelf lives of minutes, or sometimes seconds, too. My test writing always starts after that "experimentation" phase.


maybe the problem with my code base was that nobody ever left the experimentation phase, and we shipped an experiment to production


I'd be curious to know how much "test code" vs "production code" various Clojure shops have... here's our code "lines of code" totals:

Clojure source 211 files 48731 total loc,
Clojure tests 133 files 15092 total loc


(that's just raw lines of code -- and the "tests" incorporate "unit"-level all the way up to automated browser-based UAT stuff)


I don't think I can easily get those metrics, also, we're mixed Java, so it wouldn't show the full picture.


BTW, the PDF linked in David's post is quite an interesting read too:


I like this especially:
> Turn unit tests into assertions. Use them to feed your fault-tolerance architecture on high-availability systems. This solves the problem of maintaining a lot of extra software modules that assess execution and check for correct behavior; that’s one half of a unit test. The other half is the driver that executes the code: count on your stress tests, integration tests, and system tests to do that.


Clojure specs fall in that mindset I think
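A sketch of that mindset with clojure.spec: the spec acts as a live assertion on a hypothetical `discount` fn, and the integration/stress tests become the "driver" that exercises it:

```clojure
(ns scratch.specs
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.test.alpha :as stest]))

;; Hypothetical fn: apply a percentage discount to a price.
(defn discount [price pct]
  (* price (- 1 (/ pct 100.0))))

;; The spec encodes the contract (non-negative price, 0-100 percentage)
;; once, instead of in a separate unit-test module.
(s/fdef discount
  :args (s/cat :price (s/and number? (complement neg?))
               :pct   (s/and number? #(<= 0 % 100)))
  :ret number?)

;; During tests or in a stress environment, turn the spec into a live
;; argument check:
(stest/instrument `discount)
;; (discount 100 150) now throws a spec error instead of silently
;; returning a negative price.
```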


I remember reading that piece by DHH when it appeared -- and several times since -- and there's a lot of "warning bells" in there about how he was doing both "unit testing" and TDD in my mind, so "of course" he found it problematic. I seem to recall several people in the Agile community responded to his post somewhat disparagingly (and it's not like DHH hasn't posted all sorts of against-the-flow pieces...).


This is very telling: > It just hasn't been a useful way of dealing with the testing of Rails applications. It speaks far more of the problems with Rails than with TDD...