Technical Debt vs Technical Waste

Wednesday, November 13, 2019

Executive Summary

When trying to improve productivity and reduce waste, three concepts matter. They often get confused, which makes it hard to align on the problem and fix it. The key concepts are:

Technical Debt — a way your system has not yet incorporated new information.
Technical Waste — technical friction that does one of 3 things:
- puts existing business at risk, or
- causes delays and increases the cost of responding to new information when it appears (that is, of managing technical debt), or
- removes business options.
Hazard — a source of technical waste, often a risk of creating waste rather than a certainty.

This article will explore why each matters, how they interact and reinforce each other, and how to implement a clear strategy to address them in your organization.

Technical Debt

Rather than write this myself, I’ll quote the people who coined the term. This is Ron Jeffries, who was on the team where the term “technical debt” originated:

I believe that the term “technical debt” originated with Ward Cunningham, who described how a system grows in complexity as it grows in capability.
Even if we keep the design as clean as we can manage, our understanding of what the design “should be” grows and deviates from what the design is. Ward described how, sometimes, when we put that improved understanding into the actual design, the system improves.
There should be no surprise here: we’ve mostly all seen systems with designs that don’t support their requirements well, and I hope most of us have seen systems whose design does a great job of supporting the system capabilities.
The difference between what the design actually is, and what it could be now that we know more, is what Ward called technical debt. (Assuming, of course, that I understood him.)
So let’s work with that notion. Consider a design that is “just right” for the system’s current capabilities, versus one that isn’t. Let me give an example. Suppose we created a budgeting or similar financial system, and suppose we represented dollars and cents as floats.
This will work pretty well for a while. Once in a while we have to do some special rounding or bend over backward to get something to work. As the system gets more and more capable, those adjustments become more and more troublesome.
Sooner or later, we realize that it would have been better to have used an exact monetary type that could represent dollars and cents. We might decide that basing that type on a long integer would have been better than using float. (There would still be interesting adjustments needed with an integer-based type, but there’d be fewer. Just go with me here.)
The difference between our newly-imagined design and our existing one is technical debt.
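The float problem in Ron’s example is easy to reproduce. Here is a minimal Python sketch (the deposit amounts are made up for illustration): adding ten cents ten times as binary floats does not give exactly one dollar, while the same sum kept as integer cents does.

```python
# Ten deposits of ten cents each, tracked two ways.
float_total = sum([0.10] * 10)  # dollars as binary floating point
cents_total = sum([10] * 10)    # cents as integers

print(float_total)          # 0.9999999999999999
print(float_total == 1.00)  # False: 0.10 has no exact binary representation
print(cents_total == 100)   # True: integer arithmetic on cents is exact
```

This is the kind of small drift that forces the “special rounding” Ron mentions.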

This is the critical element: Technical Debt arises because of new information. As such, the only way to prevent it would be to prevent any learning from happening during the project.

The key, then, is to be able to incorporate new insights and information into the design as quickly as possible. That allows us to continually manage our technical debt and keep the design as up-to-date with new information as possible.

Like financial debt, technical debt that has not been addressed has a carrying cost (interest payments). We didn’t know we were paying these costs before we gained the new information. We can afford to continue paying them; after all, we have been paying them all along. However, we now have the opportunity to make an investment that reduces our carrying costs.

Technical Debt represents an opportunity to improve productivity.

Of course, the competition is improving its productivity at an average rate of 30% per year, which means a doubling time of roughly 2.6 years. So opportunities we choose to pass up will place us at a growing competitive disadvantage over time. The key is to decide which opportunities have sufficient ROI to justify investment in a competitive landscape.
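To check the doubling-time arithmetic, here is a short sketch that takes the 30% annual figure as given and assumes the gains compound once per year. The doubling time T satisfies (1 + r)^T = 2, so T = ln 2 / ln(1 + r).

```python
import math


def doubling_time(annual_rate: float) -> float:
    """Years for a quantity growing at `annual_rate` per year (compounded yearly) to double."""
    return math.log(2) / math.log(1 + annual_rate)


print(round(doubling_time(0.30), 2))  # 2.64
```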

When Should We Pay Technical Debt?

Ron continues:

Every feature we build takes somewhat longer to do, because of the care we have to take with the pennies. Worse yet, often a new feature will require us to make adjustments elsewhere, perhaps adding in a bit of lost round off or something.
If we had our integer type, all our new features would go faster. We want to reduce that “technical debt”. We argue for switching to a new integer-based data type. Would it be worth it? It’s very hard to say.
There are at least two unanswerable questions here: 1. How long will it take to convert to integers? (And how many bugs will it create?) 2. How much faster will we go after this is done?
Now, experience tells me a little something about this: If we have at least encapsulated our floats into a class or type, “Money”, it’ll be easier than if we’ve just been passing pure floats around and using them as money or percentages or other things.
In the latter case we have a huge detection task ahead of us. In the former, we can do most of the work inside the Money class. The conversion is more easily justified by encapsulation. The “technical debt” is less, because we have a generally better design, even flawed.
This example teaches two lessons, in my view. First, we are very likely, as time goes on, to see better ways to have done our work. Second, if we keep the design we have as clean as possible, adjusting to those better ways will be easier.
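A minimal sketch of Ron’s encapsulation point, assuming Python. The `Money` class and its methods are hypothetical, not taken from any real system. Because the representation lives only inside the class, moving from floats to integer cents changes one place, and callers that only add and compare `Money` values are unaffected.

```python
class Money:
    """Hypothetical money type; callers never see how amounts are stored."""

    def __init__(self, dollars: int = 0, cents: int = 0) -> None:
        # The representation lives only here: one integer count of cents.
        # Switching from floats to integers would change this class, not its callers.
        self._cents = dollars * 100 + cents

    def __add__(self, other: "Money") -> "Money":
        return Money(cents=self._cents + other._cents)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Money):
            return NotImplemented
        return self._cents == other._cents

    def __hash__(self) -> int:
        return hash(self._cents)

    def __repr__(self) -> str:
        sign = "-" if self._cents < 0 else ""
        dollars, cents = divmod(abs(self._cents), 100)
        return f"Money({sign}{dollars}.{cents:02d})"


# Usage: ten ten-cent amounts add up to exactly one dollar.
total = Money()
for _ in range(10):
    total = total + Money(cents=10)
print(total)              # Money(1.00)
print(total == Money(1))  # True
```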

The second lesson points to what we call a Hazard. Using bare primitives (like floats and integers) to represent a real-world concept (like money) is an example of Primitive Obsession, one of the most common Hazards.

This Hazard generates Technical Waste: friction that makes it harder to incorporate our new information. The Primitive Obsession causes the float type to be duplicated all over our system. There is duplicate code to handle the adjustments. Even worse, the type itself is duplicated everywhere, and not all of those copies are about money. Floats are also used for other things, so changing just the right ones is far more expensive than it would be if the float type lived in just one place: inside the Money class.
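A hypothetical Python illustration of the difference. In the first function, price and tax rate are both bare floats, so a search for “float” turns up both and nothing marks which one is money. In the second, a small `Money` type (an assumed name, storing integer cents) makes the money values explicit, while the rate, which is a percentage rather than money, stays a plain float.

```python
from dataclasses import dataclass


# Primitive Obsession: price and rate are both bare floats. Searching the code
# for "float" finds both, and nothing says which one is money.
def total_with_tax_primitive(price: float, tax_rate: float) -> float:
    return price * (1 + tax_rate)


@dataclass(frozen=True)
class Money:
    """Hypothetical money type; the representation (integer cents) is in one place."""

    cents: int


# With a Money type, only the money changes type; the rate stays a plain float.
def total_with_tax(price: Money, tax_rate: float) -> Money:
    # Rounding to the nearest cent is an assumed policy for this sketch
    # (Python's round() sends exact halves to the even neighbor).
    return Money(round(price.cents * (1 + tax_rate)))


print(total_with_tax(Money(1000), 0.08))  # Money(cents=1080)
```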

In this case,

The Technical Debt is the realization that using an integer instead of a float would improve future productivity.
The Technical Waste is the high cost of applying our realization, due to rampant duplication of the float type, only some of which represents money.
One of the Hazards is Primitive Obsession, which drives the duplication.

Technical Debt is a cheap opportunity to improve productivity, unless there is a lot of friction — Technical Waste. And Technical Waste is caused by Hazards, which we can fix.

Hazards

Hazards, such as Primitive Obsession, can be spotted in advance and rectified without needing new information. All code smells are Hazards, but they are not the only ones. For example, not knowing about code smells, or not being familiar with them, is also a Hazard, as is not knowing how to safely refactor to fix them. This lack of knowledge and skill increases both the probability of having code smells and the cost of fixing the ones you have.

Hazards often mask other Hazards. For example, a team that doesn’t know about the Primitive Obsession code smell can’t see that their code suffers from it. In this case we can find the Hazards by noticing the Technical Waste and tracking it back to its many sources. By making partial improvements on the Hazards we can see, we will notice whether there are Hazards we cannot see. This practice, called Safeguarding, leads us to incrementally discover and address the Hazards that cause us the most Technical Waste.

We will get into this more in the section about how Hazards interact. The main point for now is that using floats instead of integers is a Technical Debt: we got new information and want to incorporate it. Any increase to the cost of making that change is Technical Waste, and at the root of any Technical Waste is a set of Hazards.

Technical Waste

Ron then turns to what we call Technical Waste:

Now you know, and I know, never to use a float for money. But there is something we don’t know now about the next system we build, and someday that thing will be as obvious to us as “NEVER USE FLOATS FOR MONEY” is today.
So the second lesson — keep the design as clean as possible — is critical. With a clean design, design improvements are generally easy. Modularity works. The more messy our existing design, the less likely we are to be able to improve it even after we learn how.
Unfortunately, today, “technical debt” has often come to mean something like “judiciously skimp on design today, so that we can go faster; we’ll clean it up later.”
In my view, there is no useful meaning for “judiciously skimp on design today”. Every design flaw we leave in the system today will slow us down tomorrow. I don’t mean “in the future.” I mean, literally, tomorrow.
Our design will inevitably deviate from the best design we can imagine, because we’re learning as we go. Our chances of moving into that better design are much greater if the design we have is as clean as we can make it.
On the other hand, a “quick and dirty” design is mostly dirty and rarely quick. Mind you, that doesn’t mean we can’t just use a simple solution now, and put in a more robust one later. We can do that: but we need to put the simple one in cleanly.
Arguably, our simple solution should be put in more cleanly, because it is more likely to need to be taken out.
To go fast, we have to go clean. Dirty code isn’t technical debt. The surgeon can’t usefully save time by only sterilizing some instruments. We can’t usefully save time by writing sloppy code. That’s not technical debt. That’s sabotage.
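One way to read “put the simple one in cleanly,” sketched in Python 3.9+ under assumed names: the simple solution is a hard-coded exchange-rate table, but callers depend only on a small interface, so a more robust source of rates can replace it later without touching them. `ExchangeRates`, `FixedRates`, `convert_cents`, and the 0.5 rate are all hypothetical.

```python
from typing import Protocol


class ExchangeRates(Protocol):
    """Hypothetical seam: callers depend on this, not on where rates come from."""

    def rate(self, source: str, target: str) -> float: ...


class FixedRates:
    """The simple solution for now: a hard-coded table, put in cleanly."""

    def __init__(self, table: dict[tuple[str, str], float]) -> None:
        self._table = dict(table)

    def rate(self, source: str, target: str) -> float:
        if source == target:
            return 1.0
        return self._table[(source, target)]


def convert_cents(amount_cents: int, source: str, target: str, rates: ExchangeRates) -> int:
    # Only the ExchangeRates interface is used here, so a more robust
    # implementation can replace FixedRates later without touching this code.
    return round(amount_cents * rates.rate(source, target))


rates = FixedRates({("USD", "EUR"): 0.5})  # made-up rate for illustration
print(convert_cents(1000, "USD", "EUR", rates))  # 500
```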

What Ron calls sabotage is when people intentionally create Hazards and thus incur ongoing Technical Waste for the organization. This does happen, but almost always because another set of Hazards is in play. So the first question I ask when I see someone create a Hazard is:

What Technical Waste caused a well-meaning, intelligent, careful individual to realize that introducing this new Hazard was the best available option?

To answer this question, we will need to explore Technical Waste and how it interacts with Hazards.

How Technical Waste Grows

The biggest problem with Technical Waste is that it changes the economics of future actions. In fact, that is basically its definition. Generally, Technical Waste increases the cost and/or risk of “doing the right thing.” There are three kinds of Technical Waste:

Things that put existing business at risk, such as bugs that prevent users from using existing features.
Things that cause delays and increase the cost of responding to new information when it appears (that is, of managing technical debt), such as duplication.
Things that remove business options, such as manual deployment preventing auto-scaling cloud deployments.

The first and third of these create problems for the business as a whole. They are externalized costs and risks which the business has to mitigate. They often cause downstream costs in other departments, and can even block the business from critical strategic shifts.

The second kind is arguably worse, however. That’s because these Wastes compound themselves by creating new Hazards.

Hazards Beget Hazards

Wastes that increase the cost of changing code in response to new information increase the cost of code change in general. A new feature request is new information, and will run into the same friction. When change is needed, there are always several options. The minimal change is to try to bend the existing code to do something new, which generally creates a Hazard.

So wastes that increase the cost of change also increase the cost of the options that don’t create a Hazard, and developers choose to create the Hazard more often. The new Hazard then creates more Technical Waste (of all three kinds), and the cycle continues.

Each Hazard supports other Hazards, creating a self-adaptive system that maintains them all. One example is code that is hard to test. When tests are the only way developers know to establish safety while refactoring, this forms a cycle. They can’t get the code under test without changing the design. They can’t change the design safely without refactoring. They can’t refactor safely until the code is under test. At this point, there is a stable system that reinforces the Hazards and allows them to spread.
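A small, hypothetical Python illustration of the first link in that cycle. A function that reads the system clock directly cannot be pinned down by a test; making it testable requires a design change, here passing the current time in. The function names are invented for the example, and injecting the time is just one common way to create such a seam. Getting safely from the first version to the second is exactly the step the cycle makes risky.

```python
from datetime import datetime


# Hard to test: the answer depends on the real clock, which a test cannot control.
def is_payment_late_hard_to_test(due: datetime) -> bool:
    return datetime.now() > due


# After one small design change, the current time is a parameter,
# so a test can supply any moment it likes.
def is_payment_late(due: datetime, now: datetime) -> bool:
    return now > due


print(is_payment_late(datetime(2019, 11, 1), now=datetime(2019, 11, 13)))  # True
```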

Once this has gotten past a critical point — which happens more quickly than most people realize — the developers no longer have the ability to address the Hazards in their system.

Not all is lost, however. Many of these code-based Hazard cycles are supported by non-code Hazards, and those can be fixed.

Non-Code Hazards

In the example above, one key link in the chain was that tests are the way developers know to establish safety while refactoring. However, tests are not the only way to establish safety while refactoring. Disciplined Refactoring is another option, and Code by Refactoring shows how to integrate it into the team’s daily work.

This is commonly the case. Code Hazards reinforce themselves within a given context. But other significant behavioral motivators act on developers; change them and you can alter the decisions developers make. Suddenly it becomes possible to resolve some Hazard cycles, which then makes it possible to resolve others. Over time, the same developers who stumbled into the Technical Waste are able to work their way out of it.

These key behavioral motivators are:

Social norms
Tools and processes (especially mental tools)
Some key skills (especially disciplined refactoring)

One example is three motivators that often come together:

Working individually on tasks (a process)
Implementing features as a team / we succeed or fail together (a process & a norm)
“I will not let my team down” (a norm)

The result of individual work is that on every feature, someone will always finish last. That person will feel intense, internally generated pressure to cut corners and create code Hazards, because they don’t want to let the team down. This pressure is generally even stronger than an explicit management or team mandate to take time and do the right thing. It is a non-code Hazard. Every feature this team ships will cause someone to write new code Hazards, and those Hazards will be blamed on the individual.

Hazards Form a Complex Adaptive System

The above is a system result. Replace the individual who created the most code Hazards, and whoever takes their place will make exactly the same choice. To get a different outcome, you need to change one of the three motivators.

This, by the way, is one of the many reasons that teams who mob or pair get such great results. They change the first process, which breaks the reinforcing set of motivators. That eliminates the non-code Hazard, so those teams generate fewer code Hazards. They therefore have less Technical Waste, which lowers the cost of responding to new information, so they carry less Technical Debt.

The skill lies in identifying the keystone Hazards, whether they are code or non-code. We need a new strategy for identifying and fixing them.

A Strategy to Reduce Hazards and Technical Waste

The most important part of your strategy is the realization that your set of reinforcing Hazards forms a complex adaptive system. Any change will have non-local, non-obvious effects that tend to minimize its result. Any particular strategy will thus tend to produce a short-term gain and a longer-term effect that reverses it.

To succeed, you need to continually listen to the system and adapt your strategy quickly.

Many leaders will attempt to create a plan and align around it. Unfortunately, while plans and good decisions work for complicated problems, they fail in complex systems. The only way to operate successfully in the complex domain is by rapidly probing and responding and then probing again, expecting to execute a different response.

Additionally, the same action will have opposite effects in different parts of the system. So you will need different strategies for each team and each part of the code, and those strategies will sometimes conflict.

Improve Hazards that Block Adapting

Look first at the Hazards that make it expensive to listen and to adjust your strategy, or to pursue conflicting strategies in parallel. Most companies are rife with these, and addressing them requires significant changes to leadership style and organizational structure.

There are well-established practices here, such as cross-functional teams focused on market segments, intentional local decision-making, decentralized strategy, local operational control, data-driven decision-making, and intent-based leadership. All of these are difficult changes to make, but they are effective.

Also Improve Hazards that Block Team Options

In parallel you can look at Hazards that block options for teams. Each team will experience different Hazards, but again there are some keystone Hazards with well-established solutions. Here the key practices are non-individual work (pair or mob), safeguarding, disciplined refactoring, naming as a process, and micro-testing.

With each improvement in those practices and the Hazards they address, new Hazards will become visible and addressable. The organization has become a complex self-adapting system whose purpose is to continually address Hazards.

Now Each Team Addresses Keystone Code Hazards

At this point each team will be able to address some of the Hazards that cause Technical Waste for them or adjacent teams. Some of these will be creating large amounts of Waste for the rest of the organization, but others will generate very little external Waste. Either way, the Technical Waste generated by these first Hazards is supporting other Hazards. The first addressable Hazards are the keystone Hazards for the code.

As the teams address the Hazards they can, new Hazards will become solvable. An incremental strategy to reduce those Hazards will give ongoing returns in reduced Technical Waste of all 3 kinds. Using Safeguarding plus Read by Refactoring provides an optimal investment strategy to guide those reductions.

Which Will Naturally Address Technical Debt

As long as the organization is able to stay the course and continue supporting its own efforts, Hazards will go down over time, which will naturally reduce Technical Waste. Reducing Waste changes the economics, and then teams will start to incorporate new information more frequently. That reduces the amount of Technical Debt held at any one time.

But the first key step is to realize these are different concepts, and make sure you apply your efforts in the right place. Find and fix your keystone Hazards.