How many times do you think a developer or development team has looked at some legacy or technical debt-laden software and exclaimed "this bollocks needs a complete rewrite!"? How many times do you think that same "complete rewrite" has led to disaster, taking three, four, or more times longer than originally anticipated? I'd bet money on most. I've seen it happen. I've been in and led teams where it's happened. It sucks for everyone involved.

Choosing to rewrite an application will inevitably feel like good progress to start with (at least after you've justified rewriting "perfectly fine software" to those you need approval from). The original pain points that caused your egregious software development rage will be the first to go and everyone will rejoice. "Ding, dong, the witch is dead" they'll sing. "The bloody spaghetti monster is gone" the team will cheer.

As time goes on, unless you're lucky enough to work for yourself (in which case, do what you want) or have an unlimited budget with no commitments, commercial leaders will grow tired of waiting. Pressure will build, corners will be cut, and mistakes similar to those plaguing the original codebase you fought so gallantly to rewrite will work their ugly facades into your source control repository once more.

When you finally deliver many moons later (in a big bang release, because you decided to just "do it all at once" as "incremental releases are just too hard/risky"), your expectations of jovial, singing customers and development colleagues are instead met with an inevitable slew of technical debt that needs resolving, bugs you've introduced (by "fixing" previous issues), and the "oh shit" feeling that technical debt is an inevitability of all software worked on by multiple people, one that rewrites only temporarily hide.

It doesn't have to be this way.

What if I told you that you could reap all of the supposed benefits a rewrite gives you just by refactoring an existing codebase? What if I told you that by planning your refactor carefully you'd inevitably end up with a better understanding of why the original choices were made, how the original software is composed and how best to move forward with it?

I imagine you'd think I'm mad, a servant to the bourgeois to protect their money or someone who hasn't seen proper spaghetti code before. I might be a bit of the first, you're right, but I've rewritten spaghetti monster codebases from scratch, suffered the consequences, and found alternatives I'd like to share with you.

It all takes the form of seven relatively easy (in theory) steps.

A fair warning though: this takes discipline across your entire software team. Each team member is responsible for ensuring they're always moving the code base forward, never back, and communicating the problems they face and solve with the wider team.

It also requires a significant amount of stakeholder management (if you have stakeholders) in order to get and maintain buy-in. This is likely the most difficult part for someone who prefers writing code to playing corporate politics but it's also something that is different in every organization and not something that's overly enjoyable to write about.

My only tip for this article (and corporate life in general) is to remember stakeholders (generally) come from a position of ignorance and a lack of understanding rather than maliciousness. Take the time to explain to them in business terms WHY a refactor needs to happen, HOW it's better than not doing anything, and HOW you're taking the economically savvy approach and mitigating risk and you'll find you get a much better result than immediately going on the defensive and chanting "rewrite, rewrite, rewrite" over and over like some sort of possessed, speech-enabled crab 🦀.

Step 1: Identify how you ended up shit creek

For some reason(s), your software has ended up shit creek without a paddle. You need to take a look back over the history of your codebase and identify how and why you're in the situation you're in.

It could be a lack of leadership; it could be a lack of architectural standards or best practices; it could even be external pressures. Regardless of what it is, it's important to figure out why the codebase has ended up the way it has done before attempting to change it.

Fail to do this and your history is likely doomed to repeat itself.

Step 2: Tests, tests, glorious tests

Before you can refactor a single, tiny, minuscule smidgen of code you need absolute confidence that the changes you make are not going to have any adverse effects on your product and customers.

To accomplish this, you need an extensive and comprehensive test suite covering every use case and corner of the software you're planning on refactoring.

These will preferably be automated tests, but repeatable manual tests work if they're all you can muster up (though they require significantly more effort than their robot-run counterparts).

The tests you have should cover as much of the software as they can, in as implementation-detail-agnostic a way as possible (so end-to-end or integration tests).

I've previously written an article on why tests that rely on implementation details aren't great: https://liamsymonds.com/your-unit-tests-arent-as-great-as-you-think/
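To make "implementation-detail-agnostic" a bit more concrete, here's a minimal sketch of a characterization test: record what the code does today, then check the refactored version still does the same. Everything named here (`legacy_quote`, `CASES`, the pricing rules) is made up for illustration; your real entry points and inputs will differ.

```python
def legacy_quote(quantity, unit_price, customer_type):
    """Hypothetical stand-in for a tangled legacy routine."""
    total = quantity * unit_price
    if customer_type == "trade":
        total = total * 0.9
    if quantity >= 100:
        total = total - 5
    return round(total, 2)


# Hypothetical inputs chosen to exercise behaviour customers rely on.
CASES = [
    (1, 9.99, "retail"),
    (100, 2.50, "retail"),
    (100, 2.50, "trade"),
]


def record_baseline(func, cases):
    """Capture the current outputs so they can be stored and replayed later."""
    return [func(*case) for case in cases]


def check_against_baseline(func, cases, baseline):
    """Return the cases whose output no longer matches the recorded baseline."""
    return [
        case
        for case, expected in zip(cases, baseline)
        if func(*case) != expected
    ]
```

In practice you'd probably save the baseline to a file before touching anything, then run `check_against_baseline` against the refactored function and expect an empty list. The test only knows about inputs and outputs, so it doesn't care how much you move around inside.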

Fail to do this and you're bound to break an important part of the software you're refactoring. This'll cause customers to complain, money to be lost, C-levels to kick a puppy in the face for every hour the problem exists in the wild, and will cause the budget allocated to your refactor to miraculously drop to £0 overnight.

Step 3: Architectural decisions

Before refactoring anything you, as a software team, need to make your initial architectural decisions for how you want to produce software (not just in terms of raw code) moving forward.

It could be that a domain-driven approach would work for you, it could be that you need microservices (doubt.jpeg.exe), it could even be that the framework you're using isn't the right one (mega_doubt.jpeg.exe).

Ensure that every option is evaluated against the impact it'll have on your team and your software product rather than how religiously it's adhered to or spouted as the second coming of Christ by dogmatic software zealots.

Fail to do this and you're just going to implement processes and procedures that do not suit your business, your software, or your software team at all, and you'll end up back in spaghetti land before you know it.

Step 4: Plan a spaghetti-monster destroying roadmap

You then need to plan how you're going to gradually convert your old, legacy code into something you don't cringe at. This is the point you're going to need complete commercial buy-in from your business, so make sure you consider contractual obligations and business expectations whilst formulating your roadmap.

You have many options at this point:

  • You can tackle your biggest pain points first.
  • You can tie in your refactoring with the features you're currently working on.
  • You can pick areas out of a hat.
  • You can look to the stars to see if they have any guidance (I've tried - they don't).

Fail to do this and you'll struggle to understand where you've been and where you're going and will have a very difficult time answering questions that are bound to come from above regarding time frames and progress.

Step 5: Make clear barriers

The easiest way to see the progress of your refactor is to make physical barriers between old, unrefactored code and new, refactored code. Whether that be a separate folder or package, having the clear separation will help enforce your new architectural standards and determine how much code is still left to refactor.
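Once old and new live in separate folders, a rough progress figure is cheap to produce. The sketch below assumes a hypothetical layout with a `legacy` folder and a `core` folder under one root, and uses line counts as a crude proxy for "how much code"; both the folder names and the metric are my assumptions, so swap in whatever suits your codebase.

```python
from pathlib import Path


def count_lines(directory: Path, pattern: str = "*.py") -> int:
    """Count lines across all files matching pattern under directory."""
    total = 0
    for path in directory.rglob(pattern):
        if path.is_file():
            with path.open(encoding="utf-8", errors="ignore") as handle:
                total += sum(1 for _ in handle)
    return total


def refactor_progress(
    root: Path, legacy_dir: str = "legacy", new_dir: str = "core"
) -> float:
    """Return the share of lines (0.0 to 1.0) living in the refactored folder."""
    legacy = count_lines(root / legacy_dir)
    new = count_lines(root / new_dir)
    combined = legacy + new
    return new / combined if combined else 0.0
```

Calling `refactor_progress(Path("."))` from the repository root gives a single number you can track over time. It won't tell you anything about quality, but it does answer "how much is left?" when someone asks.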

If your language/framework allows you to create compile-time separations of code (e.g. assemblies in .NET), use them. It'll help enforce the "old can reference new but new can't reference old" standard defined in the next step.
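Where compile-time separation isn't on offer, a small check run as part of your build can approximate it. Here's a rough sketch for Python using the standard `ast` module; the package name `legacy` is an assumption, and the check only looks at absolute imports, so treat it as a starting point rather than a watertight guard.

```python
import ast


def forbidden_imports(source: str, forbidden_package: str = "legacy") -> list[str]:
    """Return the imported module names in source that reach into forbidden_package."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) are deliberately ignored in this sketch.
            names = [node.module]
        else:
            continue
        for name in names:
            if name == forbidden_package or name.startswith(forbidden_package + "."):
                found.append(name)
    return found
```

Run it over the source of every file in your new package and fail the build if any call returns a non-empty list. For example, `forbidden_imports("from legacy.orders import Order")` flags `legacy.orders`, while a file that only imports from the new package passes cleanly.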

Step 6: Slow and steady wins the race

Gradually, according to your roadmap, move your old, legacy code over to your new architectural standards and processes. Refactor dependencies as you go.

Your old code can reference your new code in order to provide backward compatibility in areas that would otherwise remain untouched, but under no circumstances should new code directly reference old.
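As a small illustration of that direction of dependency, here's a sketch of an old entry point kept alive as a thin wrapper around its refactored replacement. The function names and the modules mentioned in the comments are hypothetical, and the deprecation warning is an optional extra of mine rather than a requirement.

```python
import warnings


# New, refactored code (imagine this living in core/pricing.py).
def calculate_total(quantity: int, unit_price: float) -> float:
    """Refactored implementation with a clear name and signature."""
    return round(quantity * unit_price, 2)


# Old code kept for backward compatibility (imagine legacy/billing.py).
# It references the new code; the new code knows nothing about it.
def calc_tot(qty, price):
    """Legacy entry point that delegates to the refactored implementation."""
    warnings.warn(
        "calc_tot is deprecated; use calculate_total",
        DeprecationWarning,
        stacklevel=2,
    )
    return calculate_total(qty, price)
```

Existing callers of `calc_tot` keep working untouched, and the warning gives you a list of call sites to migrate when their turn comes up on the roadmap. Once nothing calls the wrapper, it can be deleted without touching the new code at all.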

Where a dependency of refactored code is just too tightly coupled to the rest of the old codebase to refactor safely, carefully consider whether copying and pasting and then refactoring the dependency might be a simpler and safer alternative until the legacy codebase/spaghetti monster is further simplified.

If your old codebase lacks automated tests, the period during which you're refactoring is the perfect time to add them.

Step 7: Review, review, review

You should constantly review your progress as well as your thoughts and concerns on how the refactor is going. Compare where you are in terms of codebase quality with where you thought you'd be and see if there are things you could do better.

It's often said in war that no plan survives contact with the enemy (that's what Hollywood tells me anyway) and something similar can be said for software:

No architectural plan survives someone actually implementing it.

Reviewing allows you to make sensible course adjustments and decisions in good time before it's too late.

In Summary

  • Identify any problems that caused the technical debt/unimproved legacy codebase
  • Tests, and then more tests
  • Make architectural decisions
  • Plan a roadmap
  • Make clear barriers in your codebase between old and new
  • Gradually move and refactor old code to new according to your roadmap
  • Review continuously

Following these steps allows you to refactor a legacy codebase without performing a complete rewrite. It allows you to continue to deliver value to your customers and business without being out of action for months or years at a time and ending up in exactly the same situation when finished.