Code Reusability vs. Accidental Commonness
One of the nastiest consensus-critical bugs (CVE-2018-17144) was recently discovered in the Bitcoin Core software, which until then had an almost immaculate history. Jimmy Song has written an excellent breakdown of this bug.
The short summary of the bug is that there are four cases where the Bitcoin Core software needs to check for double-spending. All four cases initially shared the same code execution flow. After some subtle iterations of the code over several years, the check for one of the four cases (“single-tx-double-spend-in-a-block”) got skipped, which could allow a miner to trick some nodes into accepting a block that inflates the supply of Bitcoin.
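To make the failure mode concrete, here is a deliberately simplified Python sketch. It is not Bitcoin Core's code (which is written in C++), and the names `check_transaction`, `check_duplicate_inputs`, and `validate_block` are hypothetical. It only illustrates the general pattern: a shared validation routine gains a flag so that one caller can skip a check it assumes is covered elsewhere.

```python
from typing import List, Tuple

# Hypothetical, simplified model: a transaction is just the list of
# outpoints (txid, output index) that it spends.
Outpoint = Tuple[str, int]
Transaction = List[Outpoint]


def check_transaction(tx: Transaction, check_duplicate_inputs: bool = True) -> bool:
    """Shared validation routine used by every caller."""
    if not tx:
        return False
    # Reject a transaction that spends the same outpoint twice.
    if check_duplicate_inputs and len(set(tx)) != len(tx):
        return False
    return True


def validate_block(block: List[Transaction]) -> bool:
    """Block path: skips the duplicate check as an 'optimization',
    assuming (wrongly, in this sketch) that another step covers it."""
    return all(check_transaction(tx, check_duplicate_inputs=False) for tx in block)


double_spend = [("aa", 0), ("aa", 0)]   # same outpoint spent twice
print(check_transaction(double_spend))  # False: the default path catches it
print(validate_block([double_spend]))   # True: the block path lets it through
```

The shared function still looks correct in isolation. The hole only appears when you follow one specific caller through one specific flag.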
The nature of this bug reminds me of the constant conflict between:
(a) the need for code reusability & optimization
(b) the danger of falling for what I call accidental commonness: things that are similar not by design, but by accident
Accidental commonness creates a fertile ground for refactoring nightmares and potential bugs like CVE-2018-17144.
Some background, if you’re not familiar with software engineering:
In software there is this grand vision of software components being perfectly modular — similar to their physical engineering counterparts. There is a good reason you don’t have to carry a different type of charger or USB wire everywhere you go.
So there has always been a strong push for code reusability. Writing redundant code is often frowned upon. Why do the same work twice when you can do it once?
There’s also a long history of reinventing the wheel in software, which pushes code reusability even higher on the priority list. Code reusability is often considered one of the industry’s “Best Practices”. An aspiring junior software developer might be inclined to think that there is zero downside to code reusability.
But there’s a hidden danger — and I don’t believe this stuff is ever properly taught in schools — of extreme code reusability.
Extreme code reusability means collapsing any two similar-looking pieces of code into one, regardless of their use cases & original intention.
More often than not, the result is code built on accidental commonness.
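A small, made-up Python example of what this collapse looks like. The functions `shipping_fee`, `sales_tax`, and `ten_percent_of` are hypothetical, as is the assumption that both rules happen to be 10% today.

```python
# Two rules that happen to look identical today (hypothetical business rules).
def shipping_fee(order_total: float) -> float:
    # Set by the logistics contract: currently 10% of the order.
    return round(order_total * 0.10, 2)


def sales_tax(order_total: float) -> float:
    # Set by tax law: currently 10% of the order.
    return round(order_total * 0.10, 2)


# "Extreme reuse": the bodies match, so they get collapsed into one helper.
def ten_percent_of(amount: float) -> float:
    return round(amount * 0.10, 2)


print(shipping_fee(50.0), sales_tax(50.0), ten_percent_of(50.0))  # 5.0 5.0 5.0
```

The two original functions agree only because two unrelated sources, a contract and a tax code, happen to use the same number. Nothing in the design says they must stay in sync, which is what makes the commonness accidental.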
It might not be obvious why accidental commonness is bad, but one only has to maintain a sufficiently large software project for a long period to understand why.
It is bad because product requirements change and software is an ever-evolving, never-quite-finished product.
This constantly-moving-target problem is fairly unique to software. If you are a structural engineer, you are not expected to turn a house into a 20-story high-rise, or a car into a flying saucer. Yet in software we do this constantly.
When product requirements and use cases change, the underlying assumptions that the software was initially written under might no longer apply.
So that proud piece of common code that you refactored (and have by now completely forgotten) no longer works the way you think it does.
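Here is a minimal Python sketch of how that plays out. Everything in it is hypothetical: a shared helper `order_surcharge` originally returned 10% of the total for two unrelated callers, and a later requirement ("orders of 100 or more ship free") was implemented by editing the helper.

```python
def order_surcharge(order_total: float) -> float:
    """Shared helper; originally just 10% of the total for every caller."""
    # Added later for the shipping requirement: large orders ship free.
    if order_total >= 100:
        return 0.0
    return round(order_total * 0.10, 2)


def shipping_fee(order_total: float) -> float:
    return order_surcharge(order_total)


def sales_tax(order_total: float) -> float:
    # Still routed through the shared helper, so it inherits the new rule.
    return order_surcharge(order_total)


print(shipping_fee(120.0))  # 0.0, as the new requirement intended
print(sales_tax(120.0))     # 0.0, tax silently disappeared on large orders
print(sales_tax(50.0))      # 5.0, small orders still look fine
```

The change is correct for the caller it was written for and wrong for the one nobody remembered. Tests that only cover small orders would keep passing.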
I’ve lost count of how many painful refactoring projects or nasty bugs I’ve seen that are a direct result of premature optimization or accidental commonness, to the point where I now avoid things like inheritance like the plague.
Things that have accidental commonness will quickly reveal their differences when they evolve beyond their initial state. By then, any rigidity baked into the shared code is extremely painful to remove.
The more layers of accidental commonness there are in the code, the more of a minefield it is to navigate. CVE-2018-17144 is a perfect example of that.