

It’s not one thing or the other.
For example, I often end up using event loops where an event is a tagged union. Some events take up 1 byte, some 400. It's almost effortless to put the big variants on the heap and just keep a pointer in the union. So why not do it from the start?
Sure, optimizing every loop to make it vectorizable is probably not worth it, since that loop you wrote on the 10th commit might not even exist when the software is released. But there is plenty of low-hanging fruit.
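To make that concrete, here's a rough sketch in Rust. All the names (`BigPayload`, `FatEvent`, `Event`, `handle`) are made up for illustration, and the 400-byte payload just mirrors the number above. The only point is that boxing the big variant keeps the union itself small.

```rust
use std::mem::size_of;

// Hypothetical large payload: 400 bytes, matching the example above.
pub struct BigPayload {
    pub bytes: [u8; 400],
}

// Naive version: the enum is at least as large as its largest variant,
// so every event costs 400+ bytes, even a plain Tick.
pub enum FatEvent {
    Tick,
    Key(u8),
    Big(BigPayload),
}

// Boxed version: the big variant lives on the heap and the enum only
// stores a pointer to it.
pub enum Event {
    Tick,
    Key(u8),
    Big(Box<BigPayload>),
}

// Handling code barely changes: matching goes through the Box transparently.
pub fn handle(event: &Event) -> usize {
    match event {
        Event::Tick => 0,
        Event::Key(_) => 1,
        Event::Big(payload) => payload.bytes.len(),
    }
}

// Returns (size of the naive enum, size of the boxed enum) in bytes.
pub fn event_sizes() -> (usize, usize) {
    (size_of::<FatEvent>(), size_of::<Event>())
}
```

The trade-off is one allocation per big event, which is usually fine if the big events are the rare ones.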
Also, some optimizations require a very specific software architecture. Turning all your arrays of structs into structs of arrays may be a pain if you didn’t plan for making that switch.
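Here's roughly what I mean by planning for it, again as a hypothetical Rust sketch (`Particle`, `Particles` and the helper functions are invented for the example). If the rest of the code only touches the collection through a small interface like `push`/`get`/`len`, swapping the layout underneath later is much less painful.

```rust
// Array-of-structs element: all fields of one particle sit together.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Particle {
    pub x: f32,
    pub y: f32,
    pub mass: f32,
}

// AoS: summing masses strides over x and y that it never reads.
pub fn total_mass_aos(particles: &[Particle]) -> f32 {
    particles.iter().map(|p| p.mass).sum()
}

// Struct-of-arrays: one contiguous Vec per field.
#[derive(Default)]
pub struct Particles {
    pub x: Vec<f32>,
    pub y: Vec<f32>,
    pub mass: Vec<f32>,
}

impl Particles {
    // Callers still think in terms of whole particles.
    pub fn push(&mut self, p: Particle) {
        self.x.push(p.x);
        self.y.push(p.y);
        self.mass.push(p.mass);
    }

    pub fn len(&self) -> usize {
        self.mass.len()
    }

    // Reassembles one particle from the three arrays; panics if i is out of range.
    pub fn get(&self, i: usize) -> Particle {
        Particle { x: self.x[i], y: self.y[i], mass: self.mass[i] }
    }
}

// SoA: the same sum now reads a single dense array.
pub fn total_mass_soa(particles: &Particles) -> f32 {
    particles.mass.iter().sum()
}
```

If code all over the place indexes `particles[i].mass` directly, every one of those sites has to change when you make the switch, which is the pain I was talking about.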







This is the dream. Since the entire codebase is shit, you basically have to rewrite it in its entirety.
Which means you can do it with an actually good design.
And if you mess up on something, you have a working version you can consult.