@logosity's Notebook

Wherein I record thoughts that aren't (yet) fully formed ideas, but are also too big for my Twitter account.

The goal of "unit testing" in TDD

Recently Kent Beck wrote a note attempting to clarify his thoughts on the distinction (if any) between Unit Tests and Integration Tests. I decided to pull my comment out to a note of my own as it was getting a bit too long for a comment on Facebook - which of course has allowed it to grow further into the following...

My conclusion on this never-ending debate is that "unit testing" (or more precisely TDD) should be viewed as "incremental specification," and "integration testing" (if needed) as a form of "executable documentation."

This is not merely wordplay on my part, however. Instead, it represents a deep shift in my thinking about the goal of the practice: namely, that it is not about correctness at all, despite its apparent similarity in mechanism, scope (and naming) to other testing practices that are. Here's how I came to that conclusion:

First, I narrowed my concept of "testing" to mean running the suite, not writing the artifacts themselves. Doing so made it clear that I wasn't terribly concerned with traditional notions of correctness. More surprising was that I wasn't terribly concerned with design (whatever that is) either. What has motivated me to keep practicing TDD for nearly twenty years is attaining the peace of mind that what I'm creating solves the problem I set out to solve - and nothing else. In other words, I'm not after correctness so much as specification. Of course, I've long been familiar with the BDD practice of calling these artifacts 'specs.' But the 'aha' for me was that I wasn't concerned with specifying behavior (or correctness) so much as creating a specification for the theory the underlying code is supposed to model.

This, in turn, provided an answer to something that has long bugged me in descriptions of traditional software process. While there is often mention of a spec, not only are real specs rare, but little to no attention is paid to how one can determine whether the specification itself is correct. However, this seems to be precisely the problem that TDD (along with frequent releases, close communication, iteration and the other practices of Beck's Extreme Programming) is meant to address: to write not only the code but also the specification, together, one step at a time, with rapid feedback between the people who hold that theory (i.e. the customers) and those tasked with reifying it (the programmers).

This leads to seeing the artifacts themselves more as formal explanations (i.e. laws) defining the theory modelled by the code than as simply tests: more akin to the formal specification of a scientific experiment1 than to testing hypotheses. For example, note the similarity between what we in the TDD community commonly call "fixtures" and "tests" and this description of empirical explanation within the Newtonian Paradigm (i.e. the predominant model for how science is conducted today):

Under the Newtonian paradigm, we construct a configuration space within which the movements and changes of a certain range of phenomena can be explained by unchanging laws. The range of experience defined by the configuration space and explained by the laws can in principle be reproduced, either by being found in another part of the universe or by being deliberately copied by the scientist. The recurrence of the same movements and the same changes under the same conditions, or the same provocations, confirms the validity of the laws.

The configuration space within which changeless laws apply to changing phenomena is marked out by initial conditions. These conditions are the factual stipulations defining the background to the phenomena explained by the laws. The stipulations mark out the configuration space: the space within which laws apply to the explained phenomena. By definition, they are not themselves explained by the laws that explain movements and changes within the configuration space. They are assumed rather than explained.

However, that they perform in a particular part of science the role of unexplained stipulations rather than of explained phenomena does not mean that they cannot reappear in another chapter of scientific inquiry as subjects for explanation. In the practice of the Newtonian paradigm what is stipulation for some purpose becomes the subject matter to be explained for another. That the roles of what is to be explained and what does the explaining can in this way be reversed ensures that we can hope to explain all of the universe, part by part.

In other words, the artifacts called "tests" (in TDD) define the configuration space and initial conditions to predict the behavior of observed phenomena and record the results.

And so, when a test fails, I've falsified as much of the theory of the code as is expressed in that test. When it passes, I have provisionally2 confirmed some aspect of that theory. As the suite grows, it thus becomes a (partial) formal specification of that underlying theory. It defines the conditions under which we can be sure the code does not meet our expectations (i.e. failing tests tell us that it definitely doesn't work under the conditions specified therein).
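To make the mapping concrete, here is a minimal Python `unittest` sketch. The `Account` class is hypothetical, invented purely for illustration. `setUp` plays the role of the fixture: it stipulates the initial conditions that mark out the configuration space without explaining them. Each test method states a falsifiable prediction about behavior within that space; a failing run falsifies that much of the theory, while a passing run only provisionally confirms it.

```python
import unittest


# Hypothetical code under specification, invented for illustration.
class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class OverdraftSpec(unittest.TestCase):
    def setUp(self):
        # The fixture: initial conditions marking out the configuration space.
        # They are stipulated here, not explained.
        self.account = Account(balance=100)

    def test_withdrawal_beyond_balance_is_refused(self):
        # A "law": a falsifiable prediction about behavior within that space.
        with self.assertRaises(ValueError):
            self.account.withdraw(150)
        self.assertEqual(self.account.balance, 100)

    def test_withdrawal_within_balance_reduces_it(self):
        self.account.withdraw(30)
        self.assertEqual(self.account.balance, 70)
```

Running it (for example with `python -m unittest`) is the "experiment"; the same `Account` stipulated here could itself become the subject of another spec elsewhere.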

This is why I now think of TDD as "incremental specification" rather than another word for "unit testing": it has nothing at all to do with the relationship between the parts of the code per se; it's about making falsifiable (and so empirical) claims about what my code does. The scope of that claim is a function of method, not of the artifact. That is, TDD, being a method for producing an incremental specification, makes claims about how to write and how to scope such formal experiments3. However, when it comes to these naming debates, what is more important than the validity and scope of one's chosen method is the realization that while we think we're debating method and terminology, we are actually ALSO (and primarily) debating whether the methods in question are for determining that the code is correct relative to a presumed specification (traditional testing) or for creating one while also writing the code (TDD).

This is why the naming debates don't matter. This is why combinatorial testing is not a substitute for TDD. This is why coverage is rarely a concern for TDD practitioners. This is why the endless debates on mocking occur4. This is why arguments that attempt to invalidate TDD by pointing out how easy it is to write BS tests miss the point. And most important: THIS IS WHY WE WRITE THE TESTS FIRST!

From here it's pretty clear why "integration tests" - as TDD practitioners typically write them - are a different beast entirely and much less important methodologically. When I do write them (which is not always), I view them as executable external documentation of how the (sub)system works5 and refer to them directly in the README. The same goes for every other testing strategy out there: I do them when they make sense situationally, but methodologically, I write code using TDD.

In conclusion: Of the many gifts that Kent Beck has given to the world of programming, none is in my opinion more theoretically important - or misunderstood - than TDD. As I've written elsewhere, my growing belief is that this is due to a theoretical weakness in Software Engineering: in particular, SE's partial application of an outdated scientific theory (i.e. positivism) to the creation of software. TDD represents a deeper - and more modern - understanding of what can be considered empirical knowledge. TDD's deemphasis of the value of correctness in favor of rigorous - and iterative - falsification is precisely what makes it so valuable to delivery success.

  1. Hence the execution of these artifacts is analogous to "running the experiment," i.e. confirming or falsifying its results; i.e. testing. 

  2. Provisional because we are only sure to the extent that the "fixture" (i.e. the configuration space and initial conditions) represents other situations in "the rest of the universe" which is something we cannot know for sure - only the falsification is certain empirical knowledge.  

  3. I recently took a stab at describing my TDD-based method in an earlier note  

  4. i.e. because both goal and method are being conflated in those discussions. However, the role of mocks becomes clear if you reread the final paragraph of the Unger quote. What is being stipulated in one configuration space, can itself become the subject under consideration in another. That's what mocks allow us to do, but again if our method sucks we won't do that well. 

  5. A system brought into being using some method of incremental specification such as TDD. 

The (not so) Hidden Cost of Coordination

I really enjoy periodically re-reading Mel Conway's paper "How Do Committees Invent?" (the paper behind what Fred Brooks famously dubbed "Conway's Law"); I always find new insights. This time it was insight into the cost of coordination. As Conway points out in the paper:

Once scopes of activity are defined, a coordination problem is created. Coordination among task groups, although it appears to lower the productivity of the individual in the small group, provides the only possibility that the separate task groups will be able to consolidate their efforts into a unified system design.
"How Do Committees Invent?" - Mel Conway

Got that? The more groups there are, the more the overall organization must trade productivity for coordination in order to deliver successfully. How much more? It's not in Conway's paper, but it occurred to me today that a reasonable model is the Handshake Problem.

Briefly, the handshake problem answers the question: "How many interactions are necessary for everyone at a gathering to shake hands with everyone else?" The answer is n(n - 1) / 2. That is: each of the n people shakes hands with the n - 1 others, giving n(n - 1), which we then divide by two so as not to double-count handshakes (A shaking hands with B is the same as B shaking hands with A).

Based on this model, the following table suggests a reasonable growth of the coordination costs of a delivery organization based on the number of coordinators required to deliver a feature:

Coordinators  Interactions
2             1
3             3
4             6
5             10
6             15
7             21
8             28
9             36
10            45
11            55
12            66
13            78
14            91
15            105
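The table can be reproduced with a few lines of Python. This is only a sketch of the handshake arithmetic; the function name `handshakes` is mine, and the model assumes the worst case in which every coordinator must interact with every other one, not a measured cost.

```python
def handshakes(n: int) -> int:
    """Distinct pairs among n coordinators: n(n - 1) / 2.

    n(n - 1) is always even, so integer division is exact.
    """
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(f"{'Coordinators':<14}Interactions")
    for n in range(2, 16):
        print(f"{n:<14}{handshakes(n)}")
    # Five fully connected groups versus three: 10 / 3 interactions.
    print(f"5 vs 3 groups: {handshakes(5) / handshakes(3):.2f}x")
```

The last line prints `3.33x`, which is where the "over three times higher" comparison comes from.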

Typically, every group in a delivery organization will have (at least) one coordinator role associated with it. The names of these roles change with the fashions of the day - popular ones today include "Project Manager", "Product Manager", "Tech Lead" and "Product Owner" - but their function doesn't change: They represent the communication edges within the larger organization graph.

So, based on the table above, if a delivery organization contains five groups, in the degenerate case where every group needs to coordinate with every other group to deliver a feature, the coordination costs will be over three times higher than if there were only three groups (ten interactions versus three). And that might not be as rare as it seems, since once a full-time role is established, work tends to expand to fill the role - meaning that a five-group organization will get 400 hours of coordination a week, whether it needs it or not.

All of this coordination effort raises the cost of decision-making and thus negatively affects the value of delivery for the organization as a whole.

Conway's Law runs deep.

I really hate manufacturing metaphors

Bad metaphors encourage bad inferences. The manufacturing metaphor for software is a glaring and pervasive example. Because of it, you can't read any article on software process without hearing how writing software is like building a house, or a car, or a bridge, or perhaps a skateboard. With wings. While wearing a parachute. It's kinda silly really. And if you stop to think about it, it's not really much like what we do at all.

Perhaps you disagree. Surely we really do build software, right? We also design it and architect it. We deploy it. We even assemble it. Manufacturing pervades our thinking - especially at the edges of process and low-level compilation - to such an extent that we are the proverbial fish that can't see the water. But this metaphor is arbitrary1, shallow2 and harmful. And we need to let it go.

The problem is that the inferences it suggests have led us astray - particularly when it comes to process (also a manufacturing term). We pipeline our development, we QA our builds and we scale our architectures. The metaphor suggests that the things that make manufacturing work better are the things that will make software writing better. We even use the same words for it - productivity and quality being two of the more prominent.

But is that the case? For example, is it actually better to break up development into pieces and then fit them back together? Do we actually know that if we incur the cost of coordination we will get the benefits of specialization? Or do we just ensure that work expands to fill the roles? Do we know that spending a lot of time on standards and architecture before we deliver results will save us money and/or time? Or is that just what the big companies do, so it must be a best practice? Does Just-In-Time really provide a good way to write software, or is it just a better way than traditional manufacturing for consultants to teach?

One way to find out is to try changing the metaphor and see if any of these things start to sound absurd. For example, if we consider technology delivery to be more like developing a scientific theory, does it still make sense to have architects (or product owners or any other oracles?) breaking the work up by speciality and then trying to fit it together again later? Or does our inference start to run the other way: Does it instead suggest fostering simplicity, focus and communication?

Or do things that seemed perplexing start to make more sense, such as why estimation is so hard, or why a few people can get so much more done than a lot of people?

The point is: Metaphors matter. They make a big difference in how we decide what to do in the face of uncertainty. The manufacturing metaphor has dominated software since at least the advent of software engineering in the late 50's. It's time to look for alternatives.

The problem with doing so is that good metaphors also require relevant experience. If you haven't done theory building (or even simple software delivery, for that matter), then you might think that manufacturing is a great metaphor for software delivery - especially if you've been cracking away at trying to improve it and followed those who've used it before. I suspect this, rather than any merit of its own, is why this problematic metaphor persists. It also suggests that it will continue to dominate our thinking until we find a deep, cohesive metaphor that can actually help us make good inferences; one that we can also actually relate to.

So, until we find one, I recommend we use food. For example, when explaining why Vertical Slices is a good delivery strategy, don't trot out the tired old house metaphor. Use fruit. It arguably works better than manufacturing for this, and it has the benefit of being simple and widely accessible. It also doesn't involve building a damn house. For example:

Why do vertical slices? Because writing software is like eating a banana. You can jam the whole thing into your mouth all at once, but a) you will probably get sick, b) you won't get done faster and c) you can't change your mind when you realize the banana is rotten halfway through and eat an orange instead.

It also helps if you peel it first.3

  1. I suspect it has its origins in the fact that commercial software arose in the military and manufacturing sectors, and early decision-makers just borrowed the management processes of the parent domains, prompting the technologists to map those metaphors onto their (umm) tools. 

  2. Shallow in the sense that the source domain (manufacturing) and the target domain (writing software) have no inherent relationship to one another, so the metaphor offers little in the way of new insights. For more on this role of metaphor, see this paper  

  3. Kudos to John Flatley for adding the peel 

Monitoring & Continuous Delivery: Fix It Now

In the Eternal Now There is No Last Thursday
Me (apocryphal)

Continuous delivery can be understood as short, nested feedback loops. Continuous Testing and automated deployment are obviously key parts of the equation, but to really take advantage of the practice also requires real-time monitoring data that is easy to add, change, remove, act on and then... be rid of.

This is at odds with more traditional monitoring approaches that take a "grab everything, and keep it because we might need it someday," strategy, but in my experience, people underestimate the power of continually focusing on what's happening now. I'm not sure why that is, but I suspect it is because they haven't experienced the benefits of a truly incremental approach to delivery. For example, if you start with deploying an empty application[^1], then work and deploy in very small increments, new issues are likely to also be small - and related to your last few deployments (at most).

Regardless, once you take on a "Fix it Now" mindset, keeping monitoring data around starts to feel less valuable. If you're in transition, you can push the data (or a superset, such as logs) to a different system, and keep your operational monitoring data timely and meant to be used when the issue is occurring.
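One way to keep operational data timely is to give every operational metric a short memory. Below is a minimal Python sketch of that idea, assuming an in-process metric; the `RecentWindow` class and its optional `archive` callback are hypothetical, with the callback standing in for whatever separate system (a log shipper, an analytics store) receives data you no longer want operationally. It is not any particular tool's API. The clock is injectable so the expiry behavior can be checked without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple


class RecentWindow:
    """An operational metric that only remembers the last `window_s` seconds.

    Expired samples are dropped, or handed to `archive` if one is supplied.
    """

    def __init__(
        self,
        window_s: float,
        archive: Optional[Callable[[float, float], None]] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_s = window_s
        self.archive = archive
        self.clock = clock
        self.samples: Deque[Tuple[float, float]] = deque()

    def record(self, value: float) -> None:
        self.samples.append((self.clock(), value))
        self._expire()

    def values(self) -> List[float]:
        # Only "what's happening now" is visible to operational consumers.
        self._expire()
        return [v for _, v in self.samples]

    def _expire(self) -> None:
        cutoff = self.clock() - self.window_s
        while self.samples and self.samples[0][0] < cutoff:
            ts, v = self.samples.popleft()
            if self.archive is not None:
                self.archive(ts, v)
```

The design choice is that forgetting is the default and keeping is opt-in, which inverts the "grab everything" strategy.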

There are several benefits to doing this. For example:

  1. Your monitoring becomes much more actionable. This is because you can afford to focus your monitoring on just those things that will clue you in to a problem or opportunity - analytical data has been separated from this information, remember?

  2. Your monitoring systems no longer need to be transactional. This is not to say that transactional is bad or should be avoided, but it becomes less important: the chance that the one moment your monitoring infrastructure misses a message is also the moment your system was alerting you is vanishingly small (more serious interruptions will, of course, be noticeable). The advantage of non-transactional approaches is that they are typically easier and less invasive to add to systems.

  3. The number of issues that stick around starts to dwindle. Once you commit to eliminating (and not just monitoring) issues as they arise, you start to see your monitoring metrics less as permanent fixtures and more as temporary guideposts. Once things are behaving predictably, new issues are much easier to see. This makes alerting easier to set up (fewer false positives) and more actionable once it occurs.
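As an illustration of point 2, here is a sketch of non-transactional instrumentation in Python. I'm using the StatsD plain-text counter format (`name:value|c`) sent over UDP purely as a familiar example of a fire-and-forget protocol; the note doesn't name a tool, and the host and port (8125 is the conventional StatsD default) are assumptions. If nothing is listening, the datagram is simply lost, which is exactly the trade-off described above.

```python
import socket


def format_counter(name: str, value: int = 1) -> str:
    # StatsD plain-text counter line: "<name>:<value>|c"
    return f"{name}:{value}|c"


def emit(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget: one UDP datagram, no acknowledgement, no retry."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.sendto(line.encode("utf-8"), (host, port))
        except OSError:
            # A lost measurement is acceptable; the instrumented code is
            # never blocked or failed by its monitoring.
            pass
```

Usage is a single call at the point of interest, for example `emit(format_counter("deploys"))`, which is what makes this style cheap to add and equally cheap to remove later.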

The Next One will Be: "Monitoring is a Smell"

Monitoring & Continuous Delivery: There is No One-True-Way

...one tool to bring them all and in the darkness, bind them.
Anonymous Mordor OpenView Admin

It might seem appealing to have one approach for all your monitoring goals, but the amount of coordination this requires results in tools, policies and practices that are bloated, difficult to use and often abandoned. This is only exacerbated in a continuous delivery environment where information that is critical to your operation one day might be obsolete the next. Instead, consider adopting monitoring solutions on a per-context basis - both in terms of purpose and by system.

By purpose, I mean the overarching goal of a particular monitoring effort. There are many reasons for collecting run-time data and only some of these are valuable for ensuring that systems are performing as intended. Rather than trying to find tools, practices & policies that can handle such diverse purposes as historical analysis, real-time alerting and security (or regulatory) auditing, favor approaches and tools that allow you to achieve these goals independently. If the same measurement is needed for multiple purposes, consider sending that measurement to multiple destinations from the source - or fanning out a single message via queuing or multicast strategies. Whatever reuse you do adopt, try to ensure that your simplest use-cases don't have to carry the baggage of your most complex ones. Such things as message schemas, signal frequency or instrumentation tools should be determined in-context.
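A minimal in-process sketch of the fan-out idea in Python. It stands in for the queuing or multicast strategies mentioned above rather than implementing either; the `FanOut` class, the `Measurement` type and the three sinks are hypothetical names chosen for illustration. Each sink applies its own purpose-specific logic, and a failure in one (here, a simulated unavailable audit store) does not stop delivery to the others.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Measurement:
    name: str
    value: float
    timestamp: float = field(default_factory=time.time)


Sink = Callable[[Measurement], None]


class FanOut:
    """Deliver each measurement, unchanged, to independent purpose-specific sinks."""

    def __init__(self, sinks: List[Sink]) -> None:
        self.sinks = list(sinks)
        self.failures = 0  # counted, not raised: one sink's trouble stays its own

    def publish(self, m: Measurement) -> None:
        for sink in self.sinks:
            try:
                sink(m)
            except Exception:
                self.failures += 1


alerts: List[str] = []
history: List[Measurement] = []


def alerting_sink(m: Measurement) -> None:
    # Real-time purpose: only cares whether a threshold is crossed right now.
    if m.name == "queue.depth" and m.value > 100:
        alerts.append(f"{m.name} high: {m.value:g}")


def history_sink(m: Measurement) -> None:
    # Analytical purpose: keeps every sample in its own store.
    history.append(m)


def audit_sink(m: Measurement) -> None:
    # Simulates an unavailable audit store.
    raise ConnectionError("audit store unavailable")


bus = FanOut([audit_sink, alerting_sink, history_sink])
bus.publish(Measurement("queue.depth", 250))
bus.publish(Measurement("queue.depth", 12))
```

Note that the alerting sink never sees the audit sink's schema or failure modes, which is the "simplest use-case carries no baggage" property in miniature.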

In fact, as much as possible should be context-specific. What might be a simple and effective approach for monitoring one system or application is often difficult or inappropriate for another. Worse, the inevitable changes that arise introduce subtle dependencies between contexts that can slow - or even eliminate - otherwise successful continuous delivery efforts. Instead, ensure common approaches are widely available & supported (but not required!) while context-specific alternatives are not just tolerated, but expected. While it is fine to encourage individual delivery efforts to consider common approaches, the emphasis should be on how they make it easier for those efforts to get the information they need to take action, rather than on abstract concepts like reducing duplication of effort.

Finally, if you do have a central monitoring organizational presence, this group should take on a consultative role that is focused on sharing common cases and serving as a marketplace for ideas and patterns discovered in the organization rather than as enforcers of the "one true way."

The Next One is: "Fix it Now"

Monitoring & Continuous Delivery: Intro

I spent a lot of time over the last decade writing monitoring tools and thinking about ways to use them effectively. Primarily, this was to support what I like to call Operational Monitoring: monitoring that provides actionable information about the current operational state of computer systems and devices.

In doing so, I've gotten some idea of what sorts of things are effective, less effective, and confusing about this topic. I think these ideas may be useful to anyone trying to establish or improve their understanding of "what's happening now" in the systems they are concerned with. One area of particular interest to me, however, is how such monitoring assists teams practicing continuous delivery, because, to my knowledge, not many people have explored it. I call that out where appropriate, but I find these ideas useful for systems delivered in more traditional ways as well.

I hope to eventually gather these notes up into a single article (or series of articles), but for now, I'm going to post them as I can. I plan to prioritize those conclusions I find counter-intuitive or particularly effective - especially those that run somewhat counter to the prevailing wisdom on the subject.

Monitoring is an overloaded term of course, so I will concentrate on the kind of monitoring that I find most useful; that is: Identifying, sourcing and delivering actionable metrics that give me confidence that I have an accurate understanding of the current state of things, so that when they are amiss, I am quickly made aware of that and can take effective action to correct them.

In other words, the central goal of such monitoring is to improve the likelihood that I will become aware of issues before my user-base is aware of them, rather than passively responding to their bug reports or other distress calls. An effective solution is thus one that allows me to communicate an issue - or even correct it - before users are aware it is happening.

Achieving this in a way that is easy to incrementally change, minimally noisy and still effective can be very tricky. These principles helped guide us toward effective solutions. My hope is that you will find them useful too.

The first is: "There is no One-True-Way"

Toxic Environments for Teams

I credit Ben Rady with the following: "We are not on the same team if some can succeed while the others fail." Really, is that so hard?

It seems that it is. "Team" has to be the most overused and abused term in the corporate world. Used to describe everything from a workshift of fast-food employees to the senior management staff of multi-national organizations, it seems every manager on earth wants everyone else to believe that they aren't just some boss, but an intrepid leader of some "merry band" that's out to conquer the world.

The thing is, you don't get that deeply coordinated sense of rhythm and urgency that the word team evokes just by calling a group of employees a team. This is especially important for technology delivery.

Sure, many of those managers calling their organizational units "teams" understand this. That's why they write mission statements and exhort their employees to "think of the big picture" and "work together."

And that's where the problem lies: approach. As Ben's definition makes clear, groups don't become teams just because one gives them encouragement. It happens (or not) based on how work is partitioned and how success is rewarded within the wider organization. And while some of this is on the immediate members of the team and its leadership, the brutal fact is that true teams - like the kind that Ben and I were on together for over five years - cannot thrive if the wider environment is toxic to their formation and persistence.

What does toxic look like, you ask? Here are some common factors:

That's a few of the more common things I've seen that hinder team formation. But in the end, teams form naturally where people want or need to work together. That's partly based on their own experiences and expectations, but a lot of that will be determined by environmental considerations.

So, take a look around. Are the groups in your tech organization really teams? Do you want them to be? While Ben's definition might not tell you how, it sure does make it easy to figure out when you are (or are not) there.

  1. Thankfully, Jack's advice is now out of favor. Having read the book, I think the biggest problem is inappropriate context. Welch was describing how GE - one of the largest companies in the world - solved the problem of having too many senior managers (think Alec Baldwin's Jack Donaghy from 30 Rock) vying for executive roles. They told them they were in competition, and that the organization needed to keep growing new people below them, so their fate was to be "force-ranked." The top would move up to the executive team, the middle would get another chance next year, and the bottom would be cut loose. Now, what (might have) made sense in this context is a HORRIBLY bad way to deliver software. Please don't do this to your technology organizations. It really sucks for everyone involved, and results in lousy performance to boot. 

  2. Ben was on the team with me for all but the first and last years of its existence. 

  3. I've had the good fortune to be part of such teams in technology, sports & the military. 

  4. Better for team formation. In my experience, people stuck in such situations wish they had cubicles. 

  5. If not, you better solve that problem first. 

TDD: "Failing to Falsify"

I've been writing code "test-first" in one way or another for around 18 years. I still find most descriptions of the practice - both from those who advocate it and those who oppose it - woefully misleading. I will do no better, but I have to try. Again.

TDD is classically described as a mechanism where one repeats the following four-step process:

  1. Write a Failing Test
  2. Write code that makes the test pass
  3. Refactor
  4. Argue endlessly about:
    • What is a test.
    • How much code to write.
    • What refactoring means.
    • The role of mocks.
    • What is a mock.
    • What is a Unit Test.
    • What code can't be written this way.
    • Whether TDD is required for professionalism (NO).
    • Whether one thinks about design or not (and when) (YES, ALWAYS).
    • Anything else that will help (or prevent) sales.

Speaking as someone who:

I am going to try something different. I'm going to try to describe not a mechanism but a method: one that I have followed (and evolved) over almost two decades while working with others who also find it effective. This is hard, because reflecting on something is not the same as doing it, and intersubjectivity is difficult even when we agree on descriptive terms, let alone in our pedantic quagmire of an industry. So, I'm going to avoid all the Software Engineering terms I can. No mention of "tests" or "design" or "quality." I will talk of assertions and refactoring because these have precise definitions (even if some choose to ignore them). I'm also not going to tell you why you should or should not adopt this method, nor that if you don't, you will be less good at technology delivery than the teams I've worked with for the past 13 years.

I'm just tired of seeing what I do misrepresented by others and being tarred with the same brush. So, I'm going to try to describe it. That way when someone wants to Argue Endlessly with me about the mechanical strawmen in step #4, I can just point them here and get back to work.

So, here is a reflective description of the method I like to follow while writing code. It is numbered for clarity, but the only really temporal qualities are: it is (non-sequentially) iterative, and I typically create assertions before the code that makes them pass:

  1. I think - sometimes for quite a while - about the problem I'm trying to solve and how to solve it.
  2. If I'm pairing with someone, I spend quite a bit of time discussing the problem and understanding what they think. If I'm not pairing, I might bounce my ideas off someone or otherwise seek the knowledge of others about the problem in question.
  3. I often try things out (e.g. using a repl, console, command line, web page, whatever) especially when starting a new task.
  4. I ensure all existing assertions 'fail to falsify' the system I'm changing before changing anything.
  5. I write some code that asserts something not currently true about the system (i.e. it falsifies the assertion)
    • I think a lot about how I wish things would look once I'm done.
    • I try to be as specific as I can.
    • I reference - as yet unwritten - code that would confirm the assertion if it existed.
    • I flesh out the context using fakes, spies, mocks, setting up state, etc until the assertion is in what I consider a valid environment.
  6. Run this code and confirm my new assertion - and only my new assertion - successfully fails.
    • I often iterate this and the previous step until I am confident the environment is valid.
  7. I add just enough code to confirm the assertion fails to falsify (i.e. passes).
  8. I confirm that all my other assertions 'fail to falsify.'
  9. I change names, extract functions and do other refactorings until I'm comfortable that I can add another failing assertion.
  10. I keep doing this until I'm satisfied that I have solved the problem.
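Here is a small Python `unittest` sketch of steps 5 through 7. The domain (`Invoice`, `notify_overdue`, a 30-day grace period) is hypothetical, invented only to show the shape of the method. Imagine the assertion was written first, referencing `notify_overdue` before it existed, and run to confirm it failed; the spy (`SpySender`) controls the context so no real message is sent; and the implementation is just enough to make the assertion fail to falsify.

```python
import unittest
from dataclasses import dataclass


# Hypothetical domain type, invented for illustration.
@dataclass
class Invoice:
    customer: str
    days_late: int


class SpySender:
    """A spy: records what it was asked to send instead of sending anything."""

    def __init__(self):
        self.sent = []

    def send(self, customer, message):
        self.sent.append((customer, message))


# Step 7: just enough code for the assertion below to fail to falsify.
def notify_overdue(invoices, sender, grace_days=30):
    for invoice in invoices:
        if invoice.days_late > grace_days:
            sender.send(invoice.customer, "Your invoice is overdue")


class NotifyOverdueSpec(unittest.TestCase):
    # Step 5: the assertion, written before notify_overdue existed.
    def test_only_customers_past_grace_period_are_notified(self):
        sender = SpySender()  # controlled context: no real message goes out
        invoices = [Invoice("acme", 45), Invoice("globex", 10)]
        notify_overdue(invoices, sender)
        self.assertEqual(sender.sent, [("acme", "Your invoice is overdue")])
```

Run it with `python -m unittest`. Step 9 would then be renaming or extracting until the next falsifying assertion (say, what happens exactly at the grace boundary) is easy to add.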

Step #5 is easily the hardest part. It's where imagination starts getting real; where my ideas on how I might solve this problem become a reified expression of what could not be false if it existed. Writing the actual code that makes the assertion pass is, by comparison, reasonably straightforward because I've already imagined what it will be.

I also learn as much as I can about computer science, the tools, libraries and languages for writing code and how others have solved similar problems. All of it informs my decision-making. I often write code using other methods. Often (but not always) I regret it.

Some things to note about my description:

So have I proven anything? Certainly not. Have I made things clearer? Probably not. But this note is getting long so I will summarize:

There are many ways for me to fail to deliver using this method - I can be overconfident, I can misunderstand the problem, I can create an assertion context that does not match the environment where the solution will be used, and many more.

But, what this method does offer me is an empirical and iterative approach for modeling the resolution of the demands being placed on the system. The resulting computational artifact can thus be viewed as a model of such a theory. And like theories, that artifact can never be correct, only provisionally true until falsified.

So, the skill[1] in employing this method involves writing code that has the highest empirical content I can, and subjecting that code to the harshest falsification I can. What I've learned is that it is much easier to do so by starting with a single falsifying statement and proceeding in very small increments, keeping a tight control over the execution context (i.e. the experimental conditions under which I apply my assertions) and ensuring that my feedback loop is measured in milliseconds. It also requires constantly thinking about ways I could be wrong.

In sum: TDD is not a process where one "writes a test" and "evolves a design" - I don't think "test" and "design" mean much in practice (or they mean everything). Regardless, TDD, as I practice it, is part of my delivery method - not some algorithm that powers my success. I find this method useful to formulate hypotheses about my delivery environment and incrementally test those hypotheses against reality. If the resulting computational artifacts satisfy the demands placed on the system, they become my best model of the theory embodied in those demands. A model forever subject to falsification, but forever capable of being changed so that it is not falsified - so long as my creativity fails to fail me.

  1. and a lot of skill and practice are necessary to use this method to write code effectively and efficiently. Anyone who tells you different is selling something. 

The Barnum Effect in Software Engineering

I recently read this (free-trial, paywalled) sample chapter from the book "Managing Humans" by Michael Lopp. It is a glaring example of what's wrong with what passes for knowledge-sharing in our industry. I'm going to explain why.

The book itself was published in 2012 (I had never heard of it before seeing the link shared on a message board), but a quick search netted this blog post from the author in 2003 that is clearly an early draft of the book material.

In it (and the sample chapter), the author relates a disagreement he once had with another engineer that the author attributes, not to their understanding of the problem, but to their respective approaches to problem solving that he terms "Incrementalist" and "Completionist." Incrementalists (according to the model put forth) are politically savvy opportunists; realists who have a good idea of what is achievable in the given context, but who don't have a good sense of the long-term big picture. They are addicted to being busy.[1] Completionists, on the other hand, are perfectionists; dreamers who believe that effort not put toward correct solutions is wasted effort. They spend most of their time muttering to themselves about how incompetent everyone else is, while envisioning elegant long-term solutions to every problem.

The author positions himself as the incrementalist in the original anecdote, but speaks from the perspective of a manager over both "types" of engineers. Having defined his model, he then proceeds to coach the reader on how to manage such a conflict. His advice basically rounds down to: "Let them fight it out, without killing each other because they are both caffeine-addicted children." The solution from his perspective is really simple: They are complementary personality types so, "your job as manager is to find and marry these personality types in your organization." The incrementalists supply the approach and the completionists supply the vision. Apparently, this marriage will be a happy one and you'll all go on to resounding success once you, the manager, make it happen.

There are many things wrong with this "wisdom." Some of them are relatively minor. It paternalizes the management role for example, implying that these two types are both wrong and a Hegelian Synthesis orchestrated by the manager is the best way to address their shortcomings. It also has a polarizing effect: When confronted with a disagreement, one need only identify with one of the roles to assume that the problem is due to the other person being from the opposite type.

The much larger problem with this analysis is the weakness of both the author's model and the method he chooses to draw conclusions from it. The author posits an axis ("There are two types...") to describe himself and this unnamed other and then generalizes from it to all engineers. He then builds a normative management theory from it that describes not only features and flaws of engineers but what managers should do when trying to cope with them.

The insidious and pervasive problem with such analysis is that it invites the reader to accept (or reject) the purported axis while leaving unexamined the larger question of whether such axes - or such general management wisdom - are even a useful model of the problem.

In other words, if one accepts his definition of the axis, one will begin to see incrementalists and completionists everywhere. Even if one rejects the definition, however, one may still accept the idea that such axis-based models are useful (which seems reasonable given the author's presentation and their proliferation throughout popular culture), and so be tempted to rebut the article by taking issue with the definition while leaving the author's methodology unchallenged.

I initially fell for this, and quickly identified a half-dozen other axes[2] that I felt were also relevant, turning the author's neat one-dimensional phase space into a 6-dimensional hypercube!

Now we're getting somewhere right? Clearly, this model will be even better at describing engineers than a single axis based on a single anecdote! Depending on how we define each wing of each axis, we could even construct an entire zodiac of personality types.

And it would be just as useful. See, even if the author hadn't put forth an over-simplified model based on the anecdotes and experiences of one person (he did), by taking as a premise that "engineering personality types" are even a thing, the author lures his readers into the pernicious clutches of The Barnum Effect.

The Barnum (or "Forer") effect is the bread-and-butter (literally) of fortune-tellers, palm readers and other purveyors of pseudo-science. The audience is presented with definitions that seem very discriminating but are actually very general and can apply to anyone. Once the audience chooses (or creates) a definition that suits them, their own propensity toward feeling good about themselves ensures that they will see in every conflict a person representing the other side of their chosen position.

In short, Lopp's argument is bullshit. But it is a very special and pervasive kind of bullshit that far too often colors the interactions and decisions made in technology organizations: Take an anecdote, extract a descriptive model, sprinkle on some catchy metaphors and then start handing out normative prescriptions (aka "best practices").

In other words, I'm not claiming that Lopp made the whole thing up, or that he wishes to deceive his reader; I am saying he commits a fatal error in knowledge-sharing by attempting to extract a general model of technology management from a single anecdote about his own subjective experience.

It is one thing to do this sort of model building for oneself - there's evidence we all do it all the time. It's quite another however, to shop that model in a book and claim it represents delivery knowledge.

As I've mentioned elsewhere, I have been exploring what it means to effectively share knowledge. Spoilers: It is really difficult. It has entire branches of philosophy devoted to it, and most of what passes for knowledge-sharing is bunk.

But clearly, there must be some value in Lopp's experience, and we need something - don't we?

Well, in this case I suppose I will have to side with Lopp's Completionists: Until we have a solid model for describing technology delivery, one that recognizes the shortcomings that we inherited with the Software Engineering paradigm[3], we will continue to make the same mistakes that we have for fifty years. Such knowledge isn't even wrong - but it can be harmful. So no, we can't substitute pseudo-science models like Lopp's for delivery knowledge, because they just reinforce what we already believe. They teach us, literally, less than nothing. The only remaining value is in his experience - and he can't tell us much about it, because all available models (not just his) suck. A vicious circle indeed.

So what then?

My working hypothesis is that until we have a general, empirical model that effectively describes delivery, we will remain mired in Hume's "intangling brambles of metaphysics." We will endlessly repeat the same patterns of anecdote, hype and disillusionment. In its absence, the most effective path available to us is to lever our experiences to craft - from scratch - models and approaches that work for the people and contexts that we find ourselves in. Every. Single. Time.

No wonder technology delivery remains so difficult.

  1. This concession is in the sample chapter. The 2003 article attributes many more heroic characteristics to the Incrementalist - the author being one (of course) - than the chapter does, which is perhaps more defensible, but still no more knowledge-laden than the article. 

  2. e.g. incremental <-> completionist; strategic <-> tactical; intuitive <-> rules-oriented; think-then-talk <-> think-while-talking; falsificationist <-> verificationist; empirical <-> conjectural and so on... 

  3. This paradigm includes the increasingly wrong-headed Agile reformation. 

Systems Analysis of Kent Beck's Mission Gambit

In order to evaluate Kent Beck's 'The Mission Gambit'[1] it's important to remember that there is more than one interest represented in the discussion and thus an interest-based analysis may prove useful here. To illustrate how this might work, I've sketched out an example. Even in this rough form, it begins to clarify (or render meaningless) some of the issues raised in the comments and also what factors would have to be true for Kent's conclusion to be generally valid (spoiler: I'm skeptical that it is, but understanding the contexts where it is valid could still be useful).

To start, let's consider just two interests[2]: "Technical Employee" & "Board Member." Each of these interests may value the mission of the organization differently, thus their mutual support for the mission hinges on how this value is allocated by the decision-makers[3] of the organization and how each interest perceives the situation that will result from various outcomes. This value judgement represents part of the position that they take on matters of policy and the sorts of demands that they will place on the organization.

The following are reasonable hypothetical positions for each interest:

1) For the Employee Interest: Mission is an element of total compensation.[4] A rational actor in this position will seek to maximize total compensation while minimizing risk. Their options for doing so include leaving the organization.

2) For the Board Interest: Achieving the mission in the most efficient manner available is the raison d'etre of the organization. They will accept any solution that achieves the mission, provided it does so by supplying a positive risk-adjusted return on their investment. Solutions that increase this ROI will be considered more valuable than those that offer less.

Now both interests will be represented to some degree in the policies and culture of the organization at the time our analysis begins. It is the ongoing responsibility of decision-makers to ensure that the demands of these interests remain sufficiently addressed as the environment changes. They may do this in a variety of ways. For example, when they declare mission to be a viable substitute for monetary compensation, they seek to align the efficiency demands of the board with the total compensation demands of the employee. If they are successful, both interests will increase their support for the organization as a whole; if they are not, one or both of them will withdraw their support[5].

Decision-maker success[6] thus hinges on how both interests view the consequences of either of them withdrawing support: Will the mission fail? Are there outside actors (i.e. potential employees) that will accept a lower level of compensation to work toward the mission and yet still achieve it? Will the board accept a less-efficient approach? Etc. Also, these decisions are not single-moment outcomes: support will change over time as each decision affects the environment and the interests themselves. This process continues as long as the organization remains a viable going concern.

So, what of Kent's assertion that mission belongs with perks rather than the money? From this brief analysis, it seems clear that no such general conclusion is possible. Instead it depends greatly on which interest one is considering (or identifies with), how each views the role of the specific mission in terms of compensation (this could range from central to irrelevant) and will vary over time depending on how those interests view each other, the policies that are provided, the larger job market, how effectively the mission is being accomplished, and so forth.

Other hypothetical positions are of course possible (e.g. the employee interest deems the mission to be a central life-goal), and a true analysis would need to test the validity of whatever hypotheses are chosen (e.g. via survey), but unless it could be shown that such motivations are nearly universal, the argument's validity seems context-dependent. However, I think such analysis could still prove useful as the basis for a normative exercise: That is, to identify what interests and environmental realities would recommend placing mission in the perk column - and form a stronger basis for an argument on why that would be good for those interests (and possibly organizations in general).

  1. Kent's central claim is that organizations should view "Mission" as a perk like laptops or enjoyable fellow employees rather than assert that it is a valid substitute for monetary compensation. 

  2. Defined terms are emphasized on first use and intended to be consistent with David Easton's work on political systems theory, which is in turn an important part of my (as yet) unpublished Delivery Systems model. 

  3. Decision-makers (or Authorities) may be members of zero, one (or both!) of the example interests as well as others. For brevity, their interests are not discussed here, but are likely very relevant to whether Kent's assertion holds for a particular organization. 

  4. Including extra-corporate considerations such as commute time, proximity to favorite hobby venue, risk tolerance, etc.  

  5. For illustration, we assume only alignment achieves greater total support; however, total support may rise if one interest raises support more than the other withdraws it. 

  6. Simply stated: success is equivalent to achieving a stable level of support that achieves the mission. 

The Principle of Least Common Manager

An important problem in any organization is how to resolve disagreements between peers. Particularly difficult are those due to misaligned interests; that is, the desired outcome of one party is the undesired outcome of the other. Such disputes tend to escalate until they reach the level of the participants' first common authority.

When I first noticed this pattern many years ago in my consulting work, I dubbed it "the Principle of Least Common Manager." Understanding it should be among the first concerns for anyone tasked with structuring an organization. To understand why, we need to consider what happens when alternate mitigation strategies fail.

Simple example: A and B are peers[1] in a multilevel organization. Both use a shared resource S to do their jobs. Only one can use the resource at a time and each gains value the more they use it. Since both A & B seek to increase their value, they see benefit in increasing their use of the shared resource as much as possible. Under these conditions, A & B are very likely to increase their use of S until they are in conflict with one another (that is, negatively affecting each other's work). At this point, what are their options?

  1. Simplest is to just live with the conflict. Since each consumes S as much as possible and they are peers (cf. [1]) they will each wind up with roughly an equal amount of value, though because of the conflict this will net them less than 1/2 of what they would have otherwise.
  2. Work out a solution directly. Examples include scheduling so as not to interfere with one another, and various forms of barter. The details will be context dependent but include such elements as their informal relationship and the flexibility of their respective duties. Regardless, this approach offers an increase in value over option #1 but still splits the value equally[2].

(1) peers in that they are equidistant from the common authority, have no informal rank disparity and have the capability to consume the example's shared resource (S) equally. Any or all of these can vary, and while they may affect the outcome (i.e. who gets what percentage of S and how quickly), the principle still applies.

(2) Assuming fair value for whatever barter arrangements A & B agree to.

Reflections on CAS

Various thoughts on the topic of delivery systems sparked by reading about the evolution of biological complexity:

Organisms that reproduce more quickly and plentifully than their competitors have an evolutionary advantage. Consequently, organisms can evolve to become simpler and thus multiply faster and produce more offspring, as they require fewer resources to reproduce.

If delivery systems (the analogue to an organism here) can be effectively modeled as CAS (effective in the sense that useful insights or conclusions can be identified by and shared among practitioners), then one would expect to see system success correlated with those organizational strategies that can easily be copied and used throughout their environment or domain (e.g. wider organization, the open-source world, gov't contract space, etc.) and not with quality or happiness and some such, unless those elements are necessary to individual system survival.

Mutations causing loss of a complex trait occur more often than mutations causing gain of a complex trait.

This implies that a symbiotic relationship between a single delivery system and the wider organization should be visible (e.g. sharing of project management activities, or reliance on informal context-sharing activities like inter-group hallway discussions, rather than formal gatekeeping processes, to satisfy certain demands or elicit supports).

This trend [growth of complexity] may be reinforced by the fact that ecosystems themselves tend to become more complex over time, as species diversity increases, together with the linkages or dependencies between species.

This seems to be an argument against the possibility of effective modeling because it appears that diversity in delivery has declined steadily. One possibility to explore here is that this is done at the expense of goal-driven success - i.e. control becomes more important a driver of system success than the reification of any particular computational model and deriving its resulting value.

One of the more enticing possibilities of looking at delivery systems through the lens of social systems theory is that it could provide additional tools for explaining the loss of diversity and suggest ways to increase diversity (ones where this demand is satisfied while also accounting for other demand & support signals that are being sent to the delivery system).

How We Misuse Metaphor in Tech

Building on the last note, productivity is not, by any means, the only metaphor we abuse when discussing technology delivery.

First some clarification. Metaphors can be used both as a literary device (i.e. linguistic metaphor) and as an aid to expanding knowledge (i.e. conceptual[1] metaphor). We are talking about the latter here.

We can use metaphor to aid the understanding of new ideas by establishing a correlation from elements in a source (familiar) domain to a target (new) domain. Our success at this is thus a function of both our knowledge of the source domain and the set of mappings we establish. If the mapping is extensive enough and the source domain familiar enough we may even be able to infer potential insights in the target domain[2]. The popular use of metaphors to discuss technology delivery is lacking on both counts.

To illustrate the problem, let's take a look at a really popular but deeply flawed metaphor: Technical Debt. Here, the presumably familiar source domain is finance and the target domain is software process.

First, if you haven't done so already, take a few minutes to watch Ward Cunningham's video where he describes the origins of the metaphor. To be clear: I see nothing wrong with Ward's use of this metaphor. It's how the community has subsequently changed and extended its role that is at issue. Here are a few examples:

  1. It's not the same metaphor(!) Ward describes technical debt as a willingness to ship code whose design reflects current (but likely incomplete) understanding in exchange for feedback (debt leverage) then revisiting the design to incorporate information as it becomes available (repaying the debt). Popular use of the metaphor describes a willingness to leave out known information in order to ship faster (incurring debt), then having to work around these issues until such time as the delivery team is able to correct them (paying down the debt). That is, different elements in the target domain are mapped to similar elements in the source, creating an incoherent - even dysfunctional - model (i.e. one item in the source maps to two in the target).
  2. The goals are different. Ward formulated the metaphor to assist non-technical interests in his project who had a deep knowledge of the source domain (i.e. finance) to understand why it was OK - even necessary - for his team to refactor (i.e. to keep their design aligned with what they now understood to be 'best'). Popular use of the metaphor is mostly among technical folk, shows little to no understanding of the source domain and is chiefly used to rationalize, pardon or condemn the practice of deliberately introducing inferior designs for expediency or political reasons.[3]
  3. Ward's focus is on the set of mappings, not the name. The situation is reversed in popular usage where delivery pundits throw the term around with little or no explanation of which mappings are being considered and why.
  4. The mappings in both cases are not deep enough to support analogical reasoning. However, Ward and his financial stakeholders were not trying to discover new insights about technology delivery. He was using metaphor to explain something well-understood by the technical stakeholders (that the design must be kept up to date) using concepts familiar to his audience. This is a valid use of metaphor. Popular use of the metaphor however is to make - and argue about - knowledge claims among members of the technical community. This is not valid.[4]

If we are to make any progress in our shared understanding of technology delivery, we will need to stop extending metaphors as the community has done with 'Technical Debt' and replace their use with shared models that are grounded in the techniques of the wider scientific community, such as that of systems theory, or the uses described in Gentner/Jeziorski[2]. In the meantime, new shallow metaphors and new names for old metaphors will only serve to increase the wobble of the already tottering Tower of Babel that is our current foundation for discussing technology delivery.

(1) For more, see George Lakoff & Mark Johnson's book Metaphors We Live By - this is also the same book that Ward refers to in the technical debt video.

(2) For an excellent introduction to how analogy is used in modern science see: Gentner, D & Jeziorski, M: "The Shift from Metaphor to Analogy in Western Science"

(3) As an old-school Extreme Programmer, the difference reminds me of the battles over the two uses of the term refactor (to improve design without changing behavior, vs a politically expedient synonym for 'rewrite').

(4) Nor would this behavior be valid if the metaphor were deeper. A valid use of such a metaphor would be to formulate an hypothesis based on previously established facts in the source domain - which are then explored and tested in the target domain. Trying to imagine Technical Debt as this sort of research tool is almost laughably absurd yet that hasn't stopped people from using it as the basis for all sorts of knowledge claims about what is necessary for successful technology delivery. No wonder SW methodology claims are accused of being little more than religious snake oil.

We Need to Retire Productivity

Tim Ottinger recently proposed that (software) productivity is the ratio of "What you did" over "What I wanted." In the context of Tim's post it's a reasonable formulation and underscores the (IMO) valid point he is trying to make that it's really just expectation management.

However, even this loose ratio is too rigorous for my taste. IMO, productivity in all of technology delivery is nothing more than a bad[1] metaphor for expectation management and it should be retired.

And while we're at it, we should kick "expectation management" to the curb too - the literature is full of hand-wavy advice that rounds down to: "You'll know it when you see it."

Instead, I prefer an analytical tool that's up to the job of modeling complex interactions, namely: systems theory. Instead of torturing the manufacturing metaphor or applying yet another "management technique", we can explicitly state our assumption that the problem we are trying to model is (say) one of input stabilization via feedback. Continuing with this example, "High productivity" becomes: "The number of (feedback) iterations needed to stabilize demand[2] signals was lower than expected" and "Low productivity" becomes: "The number of iterations was greater than expected." And so forth.
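A toy sketch of that restatement in Python. The update rule (each feedback iteration closes a fixed fraction `gain` of the gap between demand and what has been delivered), the tolerance and the "expected" iteration count are all assumptions made for illustration; they are not Easton's model, nor a claim about how any real delivery system behaves.

```python
def iterations_to_stabilize(demand, delivered, gain, tolerance, max_iterations=1000):
    """Count feedback iterations until the demand/delivery gap is within tolerance.

    Assumed toy dynamics: each iteration closes a fraction `gain` of the remaining gap.
    """
    if not 0 < gain <= 1:
        raise ValueError("gain must be in (0, 1] for this toy model to converge")
    for iteration in range(max_iterations):
        gap = demand - delivered
        if abs(gap) <= tolerance:
            return iteration
        delivered += gain * gap  # feedback: adjust delivery toward the demand signal
    raise RuntimeError("demand did not stabilize within max_iterations")


def productivity_label(actual_iterations, expected_iterations):
    """Restate 'productivity' as a comparison of actual vs expected iterations."""
    if actual_iterations < expected_iterations:
        return "high"  # stabilized in fewer iterations than expected
    if actual_iterations > expected_iterations:
        return "low"   # needed more iterations than expected
    return "as expected"


n = iterations_to_stabilize(demand=100.0, delivered=0.0, gain=0.5, tolerance=1.0)
print(n, productivity_label(n, expected_iterations=10))  # 7 high
```

Note that neither function measures any unit of output; the labels only say whether the feedback loop settled faster or slower than someone expected.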

The advantages over the broken productivity metaphor for those interested in understanding such interactions are many:

In short: Productivity is just one of many bad metaphors in the tech biz that needs to go. We have better tools at our disposal and it's time those of us who wish to share knowledge about technology delivery start using them both to communicate our ideas and to evaluate the claims of others.

(1) Bad because there are no meaningful units, as Tim (indirectly) points out. For a more direct take on this see Ben Rady's summary of my thinking on this.

(2) i.e. a particular type of input signal in certain political systems models; for more see: David Easton's model

Being Scientific

Someone is wrong on the internet. I recently stumbled on the following comment from a scientific authority named 'chill_factor': "atomic theory is extremely well accepted and not up for debate in any way". In itself, the comment is not worth quoting, but given the nature of the question (what is the relationship between Chemistry and Physics) on which Chill is acting the authority, it serves as a jarring reminder of how even those who profess to accept science seem to fundamentally misunderstand what it is.

Chill goes on to defend vehemently - even arrogantly - the idea that well-tested theories are facts though you can "debate them if you want."

This statement shows that Chill is but a devotee of the religion of science. Being scientific is not one's skill in debate or knowledge of "atomic theory" or "the theory of gravity". This is not what defines the scientists who established these theories. These are skills of the student of Science. The follower of Science. The acolyte of Science. The believer of Science. Chill might do well on Jeopardy, but he'd make a lousy scientist.

We hear a lot about scientific method, but many who profess to know it don't seem to recognize the fundamental premise of its modern formulation: Claims to truth are never verifiable, but they are refutable. In logic terms, they confuse Modus Ponens with Modus Tollens, but worse they miss that logic's role in modern science is to destroy theories (by means of empirical tests that falsify them) and even this approach to knowledge is only defensible if we adopt certain methodological constraints (like disallowing ad-hoc hypotheses that retrofit theories to reality).

It wouldn't be so bad if Chill were the exception - but this seems to be the common position of those who are "pro science." It's as if these folks skipped all of 20th century science except for the facts. As Karl Popper noted:
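The logical asymmetry can be checked mechanically. The following sketch uses a brute-force truth table in Python; the labels `t` ("the theory holds") and `o` ("the predicted observation occurs") are my own, chosen for illustration. It shows that modus tollens (refuting a theory from a failed prediction) is a valid form, while inferring a theory's truth from a successful prediction (affirming the consequent) is not.

```python
from itertools import product


def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b


def is_valid(premises, conclusion):
    """A form is valid iff no assignment makes every premise true and the conclusion false."""
    for t, o in product([True, False], repeat=2):
        if all(premise(t, o) for premise in premises) and not conclusion(t, o):
            return False  # this row is a counterexample to the argument form
    return True


def prediction(t, o):
    """The shared premise: if the theory holds, the predicted observation occurs."""
    return implies(t, o)


modus_ponens = is_valid([prediction, lambda t, o: t], lambda t, o: o)
modus_tollens = is_valid([prediction, lambda t, o: not o], lambda t, o: not t)
affirming_consequent = is_valid([prediction, lambda t, o: o], lambda t, o: t)

print(modus_ponens, modus_tollens, affirming_consequent)  # True True False
```

The failing row for affirming the consequent is the one where the observation occurs but the theory is false, which is exactly why a passing observation cannot verify a theory.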

"If you insist on strict proof (or strict disproof) in the empirical sciences, you will never benefit from experience, and never learn from it how wrong you are. If therefore we characterize empirical science merely by the formal or logical structure of its statements, we shall not be able to exclude from it that prevalent form of metaphysics which results from elevating an obsolete scientific theory into an incontrovertible truth." - Popper, Karl The Logic of Scientific Discovery. (Kindle Locations 695-698)

We retain the quantum/cloud theory of atomic structure, not because it's "true beyond all doubt" but because we haven't falsified it yet. The possibility that we may never do so is not the same as removing doubt about its veracity. That doubt is built in to the very methods actual scientists use to uphold theory. Ironically, it is failure to understand this in the opposite direction (an overwillingness to reject scientific theory without following method) that characterizes those who are anti-science. The Chills and Anti-Chills of the world are both practicing the same religion - they stare at each other in the mirror and argue which of them is waving their left hand, while never questioning why we call it the left one in the first place.

In short: certainty in science is about saying what isn't true. It should never be used to denote what is true.

We succeed by failing

The essence of software (or any) testing is that while we cannot prove our code correct, we can disprove that it works. Our aim is thus to falsify our code; to try every means at our disposal to get our code to behave in a way other than we specify that it should.

Successful coding is thus a failure to find any further way to falsify the theory our code models. And thus programmers succeed by failing.
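One mechanical way to "try every means at our disposal" is to throw generated inputs at a claim and look for a counterexample, which is the idea behind property-based testing tools. A minimal Python sketch, assuming a hypothetical `max_of_ends` function that claims to return the largest element of a list but only inspects its ends, alongside a hypothetical `max_by_scan` that checks every element:

```python
import random


def max_of_ends(xs):
    """Hypothetical buggy code: claims to return the largest element, but only looks at the ends."""
    return max(xs[0], xs[-1])


def max_by_scan(xs):
    """Hypothetical code that scans every element."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best


def is_largest(impl, xs):
    """The claim under attack: impl(xs) is an element of xs and no element exceeds it."""
    result = impl(xs)
    return result in xs and all(result >= x for x in xs)


def random_list(rng):
    """Generate a non-empty list of small integers."""
    return [rng.randint(0, 9) for _ in range(rng.randint(1, 6))]


def find_counterexample(impl, trials=1000, seed=0):
    """Return the first generated input that falsifies the claim, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = random_list(rng)
        if not is_largest(impl, xs):
            return xs
    return None


print(find_counterexample(max_of_ends))  # a list whose largest element sits strictly inside it
print(find_counterexample(max_by_scan))  # None: not proven correct, merely not yet falsified
```

The `None` result is the "success by failure" described above: the search failed to falsify the claim, which says nothing about inputs it never tried.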

Requiem for a Team

Change the members of a team, you change its context. It's an act of violence and (possibly) of creation. This has become a core philosophy of my approach to SW team building over the last few years. This was put very nicely by one Richard Dalton in a recent tweet: "Teams are immutable. Every time someone leaves, or joins, you have a new team, not a changed team."

Now, I'm not a big fan of ceremonies (e.g: we currently have no recurring meetings; our last one made it a year,but had to be held in a bar to get anyone to go), but Richard's tweet got me thinking it could be interesting to add one: When a new person joins or leaves, the first/last act of the group should not be a welcome/send-off for the person, but an explicit recognition that team(s) are ending or beginning. Get everyone together and take a few minutes to reflect on the accomplishments of the outgoing team and talk about the elements of the old context that the new team must take up. Give the person leaving a chance to see that team will forever be identified with them. Allow the new person a chance to learn about the team being disbanded because a new team is forming; and that they are a founding member.

A lot of pomp and circumstance? Mebbe, but in a context that has very little, it would be all the more powerful at making it clear: The game resets with every player change.

Tests criticize, not verify

Back when I started hanging out on the XP mailing list (late 1999) there were regular discussions about 'adversarial testing' (the idea that testing need be done by an independent role/person). This debate seems to have eased - or at least I don't hear it much any more - but the following Popper quote reminded me of it:

"I do not believe, therefore, that the question which epistemology must ask is, '... on what does our knowledge rest?[...] In my view, what epistemology has to ask is, rather: how do we test scientific statements by their deductive consequences?" ~ Popper, Karl. The Logic of Scientific Discovery (Kindle Locations 1651-1652, 1654-1655).

In 'The Logic of Scientific Discovery' Popper is seeking a solid basis for knowledge; to establish a line of demarcation between what is (scientifically) knowable and what is beyond knowable (i.e. metaphysical). His answer is embodied in the now-mainstream doctrine of 'falsifiability'; the idea that scientific methodology cannot be based on inductive approaches. His Basic (Observational) Statements bear a strong resemblance to TDD-style tests: they place similar bounds on what we know about the correctness of our code.

And since software lies within the scope of Popper's demarcation (see previous note and the link above), it is subject to falsification via deductive manipulation. Thus, the aim of our testing should be to criticize our code; to attempt every conceivable means of falsifying our premise that the code solves the problem it was written for. Programmers - like Popper's scientists - need to treat their code as a theory to be dismantled, not a creation to be defended. The key insight of the XP crowd back in the day was that delegating this to an outside role (tester) tends to reinforce the desire to defend one's solution, while taking responsibility for the quality of our results tends to do the opposite. This was (and is) a key part of the larger XP theory: That the cost of change curve can be flattened. One does so, not by building grand edifices, but by bashing on one's assumptions and being ever ready to change one's mind (and code) as our knowledge grows...

But only if a critical attitude is adopted. No magic bullets. TDD doesn't 'make' the programmer responsible. Or open to change. As a structured approach to applying criticism to our code, its intended role in the "Theory of XP" is to encourage change. As with Falsificationism itself, the onus is on the researcher/programmer to adopt this mindset of creating "code/theories that suck less."

Fundamental premise of Software Engineering

I have moments of doubt - like when bugs are particularly intractable, or libraries particularly opaque - but ultimately my faith is always restored: Everything in computing has an explanation. That's not necessarily true of the real world.

Not only do we not know if the real world is internally consistent, the skeptical tradition makes a strong case that we can't know such things. But in the world of computing, we can make this assumption. That's not to say that we can solve all our problems, or reproduce every actual bug. What this means is that in principle - given enough time, money and information - we could do so. That may very well not be true of the world beyond computing. In epistemological terms: There is no problem of induction - nothing metaphysical - in computing.

I suspect this is why we still talk of classical scientific claims to 'truth' when it comes to computing knowledge. It's the positive(?) side of Turing's famous paper: an extant piece of software contains no metaphysical knowledge. It's why there's no magic (or it's all magic). It's why testing works. It's why bugs can be fixed. It's why we sometimes scratch our head when "technical" colleagues hand-wave answers: Our biggest advantage over every other technical discipline (even the hardware folks) is that our world is knowable, finite & truthy. This premise underlies every other thing I've learned about computing.

It's also, I think, what holds us back from gaining more knowledge about computing practice - our worldview is dominated by positivist approaches to knowledge, and this mode of thinking fails miserably when attempting to cross the line of demarcation between the computable and the real worlds.

Writing tests is like weaving a net...

"Theories are nets cast to catch what we call 'the world': to rationalize, to explain, and to master it. We endeavour to make the mesh ever finer and finer." ~ Karl Popper: The Logic of Scientific Discovery (Kindle Locations 835-836)

This metaphor has long been a favorite of mine for describing the goal of TDD. So, connecting the triangle: "Tests are theories about the behavior of our code?" That's nice and pithy (it would make great sales fodder, especially for those aiming to persuade customers they are "scientific"), but I don't think that's quite right. Better is: "Tests embody our theories." But there is no need to torture the metaphor - it's already aligned...

In a footnote to the above, Popper adds:

"the theorist is interested in explanation as such, that is to say, in testable explanatory theories: applications and predictions interest him only for theoretical reasons- because they may be used as tests of theories." ~ Ibid. (Kindle Location 1122)

...and now we can complete the triangle: software "tests" are not analogous to "theories" - they are our means of eliciting predictions about our theories. In this, we have an enormous advantage, in that we can automate the application of our theories via automated testing tools (hence their value and the power of practices such as TDD), while prediction is the goal of the test code itself. Our theories are often not expressed at all; they simply live in our context. Or, put another way: The theory is implicit in the software we write to solve the problem before us; our tests express our predictions of that theory and our testing tools apply it.
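A small Python sketch of that division of labor, assuming a toy leap-year rule stands in for "the theory": the rule lives only implicitly in the implementation, each test method states one prediction derived from it, and the standard-library `unittest` runner is the tool that applies them. The function and test names are hypothetical illustrations, not the author's code.

```python
import unittest

def leap_year(year: int) -> bool:
    # The "theory" (here, the Gregorian leap-year rule) is never stated on its own;
    # it is implicit in this implementation.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

class LeapYearPredictions(unittest.TestCase):
    # Each test expresses one prediction the theory must survive.
    def test_ordinary_year_is_not_leap(self):
        self.assertFalse(leap_year(2023))

    def test_fourth_year_is_leap(self):
        self.assertTrue(leap_year(2024))

    def test_century_not_divisible_by_400_is_not_leap(self):
        self.assertFalse(leap_year(1900))

    def test_century_divisible_by_400_is_leap(self):
        self.assertTrue(leap_year(2000))

if __name__ == "__main__":
    # The testing tool applies the predictions and reports any that fail.
    unittest.main(exit=False)
```

Note that none of the four predictions "proves" the rule; each one is simply a place where the theory could have been caught out and wasn't.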

Thus, our ability to form and falsify our theories is rendered explicit in our testing.