Sadly this Legacy Code Works. How Do I Get Permission to Fix It?

Wednesday, August 26, 2020

11 minute read

I got a good legacy code question today in the Code by Refactoring Slack channel.

We have a code module called the state machine which is poorly written and has no tests. We all (we the engineers) agree that we should rewrite the state machine from scratch. However, the current implementation ‘works’, and we are not adding states or other stuff to that module.
Here’s the questions:
a) Do we need to add tests (unit & integration tests) at all? If yes, how can I sell it, such that we get time for that task?
b) Once we have tests, do we need to rewrite the state machine to make it less messy and the code better readable - or leave it like it is until we need to actually add a new state?

Short Answer

I think of everything in terms of risk. Thus I believe the right way to mend this code is:

No, you don’t need to add tests.
Yes, you should leave this ugly code as-is for now.
1. Do establish (now) that you are delaying the cost until the first impacted story.
2. You probably should refactor something now, just not this. Start feature work so that you know what to refactor.
Also, don’t rewrite ever - refactor.

Here’s how risk analysis gets me to those answers.

Quantifying Risk

We have to quantify risk first if we want to assess it. For something like this I see two relevant risks:

Bug Risk: the probability of accidentally adding or fixing a bug, or
Delay Risk: the expected value of the cost of schedule delays (chance of delay * size of delay * cost of delay at the time it happens).

In general, the first risk causes me to delay work while the second causes me to pull work earlier.

The goal is to prevent future bugs

The main advantage of clean code is to prevent future bugs. Secondarily it reduces cost for future stories and reduces frustration when implementing those stories. We attain those benefits any time we change code that requires us to understand this code. That means we gain benefit when we read this code in order to change it or to change code that we think might integrate tightly with it.

We can optimize our ROI by increasing the probability that the investment we make will give us value and by delaying our investment to the moment before we need it. In other words, we want to invest in each code chunk just before we will need to understand it.

When will we need to understand it?

In general, teams tend to work on similar stories at any given time. That causes code changes to cluster. A given chunk of code will see short bursts of intense change followed by long periods of stasis. In fact, code that doesn’t follow this pattern shows a design flaw! If every story requires changing some particular piece of code, then that code needs to be separated (and is likely a God Class).

Since the state machine code just received heavy change and no new states or changes are happening right now, the odds are that this code will not change again for years. Therefore, now is the worst possible time to invest in it - this is the moment with the longest duration until your investment will see a return.

But it’s cheaper to change now!

There are two countering arguments. First, we know this code now. We can replace it more cheaply now than later. Additionally, now there is a low cost of delay - feature progress matters, but not as much as it will later. Thus, this is the best time to make investments.

True, but both of those arguments mean you should go do something else.

Refactoring is best when you are unfamiliar with the code

The reason that clean code reduces cost and reduces bugs is because it makes code easier to read. All code is read far more often than it is changed, because you have to read it any time you might change it or any code that touches it.

But people read code differently than they write it. The biggest difference is that they remember the context and their intention while writing it, but have lost this information when it comes time to read. In fact, that’s the information they are usually trying to find!

Refactoring code is all about making it easy to read, so you want to do the refactoring at a time when you have similar context to the future reader. Thus, refactoring while reading and building up context tends to give a better result than refactoring right after writing. If you want readable code, refactor it the first time you try to read it.

Refactoring is best when it is part of feature development

The one useful piece of context a future reader will have is the story they are working on. You can make code easier to read if you can tie it across that context. This context is not available until that story is defined. That’s why refactoring gives a better result when you delay it until you are working on a story.

Additionally, each story guides you to some part of the code. You don’t need to understand the rest. You get maximum ROI for your refactoring efforts if they pay off value immediately and frequently.

Most code that is written will be touched again rarely or never. Therefore, if you refactor all code as part of writing it, most of your refactorings will pay off rarely or never.

However, stories cluster. So if one story requires you to change or understand some piece of code, then usually several stories around that time will do the same. Even if they don’t, any refactoring you do is guaranteed to pay off at least once - and that far exceeds the average.

Thus, you always get a better return if you refactor as part of story development. Refactor the code you are trying to understand, and then the code you are changing. Don’t bother refactoring the new code you just wrote. And certainly don’t refactor without a story, because now is when you have time.

Rewrite with tests vs refactor

My last recommendation is to rewrite rarely, and never using tests of an existing system.

Basically, from a business perspective, there are 2 cases:

My product generally does what my customers want. I want to extend it but keep all the current behavior.
My product missed the mark. I want to try a different direction, though likely still for the same customers and problem.

And technically, there are two strategies you can apply:

Refactor. Guaranteed (if you follow the process) to not change any behavior - even a behavior you don’t know about. Cannot alter a customer experience. Consistently results in a system that is better-designed than the original.
Rewrite. Can only re-create the behaviors you know about, and it is no more difficult to make a new behavior than to maintain an existing one. May result in a better or worse design than the original, depending on a number of factors. Usually you end up with about the same design quality.

Each costs about the same amount of time.

Refactoring (especially Disciplined Refactoring, which we teach via our Code by Refactoring Foundations Workshop) is the best solution for products that do the right thing. Rewriting is the best solution for products that do the wrong thing.

Great…how do I sell it?

Selling to business people

Business people care about product risk. If your code does the right thing, then any change (including adding a new feature) has some chance of breaking that. Therefore, the expected value of the work is the value of the new functionality * the probability it will deliver that value, minus the probability of breaking something * the sum of the value for all previous features, minus the opportunity cost spent doing the work.

A pure refactoring effort attempts to deliver no new value. Therefore, its expected value to the business is negative.

However, a refactoring when working on a story does deliver value: it improves the probability that the new feature will deliver the target value, it decreases the probability that the feature work will break existing functionality, and it reduces the opportunity cost of the feature work.

Thus, refactoring when doing a feature is always a positive value, as compared to simply doing that feature alone.

Business people like numbers. Pull out this equation, get agreement that it models their world, then show both pure refactoring is negative value, and refactoring during a feature for that feature is positive value.

Selling to other devs

There are two challenges here:

Developer professionalism / pride.
Optimism.

First, developers don’t want to leave a mess for other developers. They take pride in creating clean code. They also want to hit their schedule commitments. The result is a desire to first rush to ensure they will hit schedule, then clean up any messes to leave good results. This, by itself, virtually guarantees they will refactor the wrong things at the wrong time.

Pride says to clean up any messy code you wrote while there is no schedule pressure. ROI says that refactoring the code you just wrote is the least valuable refactoring you could do.
Pride says that when we have feature work to do, we need to meet our responsibilities to others first, then make things right for ourselves. ROI says that you generate the most value if you refactor first and then implement using the refactored code.

Helping devs see this contradiction is very hard. In fact, nearly every team that I’ve seen with technical debt has created that debt as a result of professionalism. There is usually a lot of finger pointing at the business people, saying that they don’t schedule time for code cleanup. That’s true. However, code cleanup projects don’t deliver much value. The thing that does deliver value is refactoring as part of each story - and the devs themselves are consistently making the decision to not refactor as part of each story.

Optimism tends to exacerbate this problem. We devs generally believe the following:

I can create a better design the second time that I create a system.
I didn’t understand the needs & constraints before, but now I do.
I’ll use X now, which will make it better (TDD, pairing, mobbing, new language, new framework, improved skills).

All three are false in general. Design quality doesn’t tend to improve with a second system. In fact, history shows that second systems tend to be over-engineered, even harder to change, and fail to ship entirely more than half the time.

Second, my new understanding of the needs and constraints is based on what I learned the first time. But I’m intending to build the system in a different way, and that will change the constraints. And the fact that customers are now using the first system will change the needs. Thus, my understanding of needs and constraints includes wrong information, and I won’t know what is wrong until I get into it.

Finally, new practices and technologies can improve my system. However, they only tend to give that improvement after the team has spent a while learning when and how to use the new practice or tool. This is rarely the case right after you’ve finished building a thing, so an immediate re-write will generally use the new practice or technology in a similar way as the old and deliver a similar result.

Selling to devs is hard, because you have 5 different myths to bust, and they are all tied into the developer value system. This takes time and other options. The best approach is to do some mobbing. Show the effectiveness of Disciplined Refactoring, then the value of doing that while reading. Help the team develop proficiency. And then raise the question about when to refactor.

Summary

This state machine re-write is a very common case. The team has just finished creating a new module, the module does the right thing, the code sucks to work with, and professional pride is drawing them to make it right - they want to be kind to their future selves.

However, an economic analysis of refactoring states that refactoring something else will deliver more value. The best return will come from starting the next feature and refactoring that code as the first development step. True professionalism is to create a system that maximizes total value of the changes that actually happen. Maximizing ROI requires changing team culture.

Changing the culture requires selling everyone on the value. We can sell the business people on the basis of the economics, as long as we provide them the right option. However, selling the developers on the change is hard, because it requires helping them re-evaluate their assumptions and value system.