A Lannister Always Pays His Technical Debts

Software

By: Andrei Taranchenko (LinkedIn)

Created: 08 Feb 2024

A tale of two rewrites

Jamie Zawinski is kind of a tech legend. He came up with the name “Mozilla”, invented that whole thing where you can send HTML in emails, and more. In his harrowing work diary of how the Mosaic web browser came to be, Jamie described the burnout rodeo that was the development of what was later going to become Netscape (the top disclaimer has its own history — ignore it):

I slept at work again last night; two and a half hours curled up in a quilt underneath my desk, from 11am to 1:30pm or so. That was when I woke up with a start, realizing that I was late for a meeting we were scheduled to have to argue about colormaps and dithering, and how we should deal with all the nefarious 8-bit color management issues. But it was no big deal, we just had the meeting later. It’s hard for someone to hold it against you when you miss a meeting because you’ve been at work so long that you’ve passed out from exhaustion.

Netscape’s wild ride is well-depicted in the dramatized Discovery miniseries Valley of the Boom, and the company eventually collapsed with the death march rewrite of what seemed to be just seriously unmaintainable code. It was the subject of one of the more famous articles by ex-Microsoft engineer-turned-entrepreneur Joel Spolsky - Things You Should Never Do.

There have been big, successful rewrites. Twitter moved away from Ruby-on-Rails to JVM over a decade ago but the first, year-long full rewrite effort completely failed. Following architecture by fiat from the top, the engineering team said nothing, speaking out only days before the launch. The whole thing would crash out of the gate, they claimed, so Twitter had to go back to the drawing board and rewrite again.

The road to hell is paved with TODO comments

All of this is to say that you should probably never let your system rot so badly until a code rewrite is even in the cards. It never just happens. Your code doesn’t just become unmaintainable overnight. It gets there by the constant cutting of corners, hard-coding things, and crop-dusting your work with long-forgotten //FIXME comments. Fix who?

We used to call it technical debt - a term that is now being frowned upon. The concept of “technical debt” got popular around the time when we were getting obsessed with “proh-cess” and Agile, as we got tired of death march projects, arbitrary deadlines, and general lack of structure and visibility into our work. Every software project felt like a tour — you came up for air and then went back into the 💩 for months.

Agile meant that the stakeholders could be present in our planning meetings. We had to explain to them - somehow - that it took time to upgrade the web framework from v1 to v5 because no one has been using v1 for years, and in general, it slowed everyone down. Since we didn’t know how to explain this to a non-coder, someone came up with the condescending term “technical debt” — “those spreadsheet monkeys wouldn’t understand what we do here!”

While the term has most likely run its course as a manipulative verbal device, it is absolutely the right term to use amongst ourselves to reason about risks and to properly triage them.

The three type of technical debt

The word “debt” has negative connotations for sure, but just like with actual monetary debt, it’s never great but not always horrible. To mutilate the famous saying - you have to spend code to make code. I would categorize technical debt into three types — Aesthetic, Deferrable, and Toxic. A mark of a good engineer is knowing when to create technical debt, what kind of debt, and when to repay it.

Aesthetic debt

This is the kind of stuff that triggers your OCD but does not really affect your users or your velocity in any way. Maybe the import sort order is an eyesore, or maybe there is a naming convention that is grinding your gears. It’s something that can be addressed with relatively low effort when you are good and ready, in many cases with proper automated code analysis and tools.

Deferrable debt

Deferrable debt is what should be refactored at some point, but it’s fairly contained and will not be a problem in the immediate future. The kind of debt that you need to minimize by methodically crossing it off your list, and as long as it seeps through into your sprint work, you can probably avoid a scenario where it all gets out of control.

Sometimes this sort of thing is really contained - a lone hacky file, written in the Mesozoic Era by a sleep-deprived Jamie Zawinski because someone was breathing down his neck. No one really understands what the code does, but it’s been humming along for the last 7 years, so why take your chances by waking the sleeping dragons? Slap the Safety Pig on it, claim a victory, and go shake down a vending machine.

Toxic debt

This is the kind of debt that needs to be addressed before it’s too late. How do you identify “toxic” debt? It’s that thing that you did half-way and now it’s become a workaround magnet. “We have to do it like this now until we fix it - someday”. The workarounds then become the foundation of new features, creating new and exciting debugging side quests. The future work required grows bigger with every new feature and a line of code. This is the toxic debt.

Lack of tests is toxic debt

Not having automated tests, or insufficient testing of critical paths, is tech debt in its own right. The more untested code you are adding, the more miserable your life is going to get over time. Tests are important to fight the debt itself. It’s much easier to take a sledgehammer to your codebase when a solid integration test suite’s got your back. We don’t like it, it’s upfront work that slows us down, but at some point after your Minimal Viable Prototype starts running away from you, you need to switch into Test Mode and tie it all down — before things get really nasty.

Lack of documentation is toxic debt

I am not talking about a War & Peace sized manual or detailed and severely out of date architecture diagrams in your Google Docs. Just a set of critical READMEs and runbooks on how to start the system locally and perform basic tasks. What variables and secrets do I need? What else do I need installed? If there is a bug report, how do I configure my local environment to reproduce it, and so on.

The time taken to reverse-engineer a system every time has an actual dollar value attached to it, plus the opportunity cost of not doing useful work.

Put. It. In. A. Card.

I love creating TODOs. They are easy to add without breaking the flow, and they are configured in my IDE to be bright and loud. It’s a TODO — I will do it someday. During the Annual TODO Week, obviously. Let’s be frank — marking items as “TODO” is saying to yourself that you should really do this thing, but probably never will.

This is relevant because TODO items can represent any level of technical debt described above, and so you should really make these actual stories on your Kanban/Agile boards.

Mark technical debt as such

You should be able to easily scan your “debt stories” and figure out which ones have payment due. This can be either a tag in your issue-tracking system or a column in your Kanban-style board like Trello. An approach like this will let you gauge better the ratio of new feature stories vs the growing technical debt. Your debt column will never be empty — that goal is as futile as Zero Inbox, but it should never grow out of control either.

// TODO: conclusion