Comment Re:Good grief... (Score 1) 681
The optimal code depends on the micro-architecture, and there are plenty of different x86 ones deployed everywhere, so you need to know a little bit about each.
Transferring that expertise to ARM or POWER isn't that difficult either. They differ more from x86 than the x86 micro-architectures differ from one another, but the same principles still apply.
Most of the time, though, optimizations are entirely portable. Using the cache well, for example, can in certain scenarios be done in a way that is independent of the cache size and cache line size, and in others you can use a portable function to fetch the size.
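To make that concrete, here is a rough sketch in C of both approaches. The first part is a recursive matrix transpose in the cache-oblivious style: it keeps halving the larger dimension, so at some depth the sub-blocks fit in whatever cache the machine has, without the code ever knowing its size. The second part queries the L1 data cache line size through sysconf. The names transpose, transpose_rec, cache_line_size and TRANSPOSE_BASE are mine, picked for illustration, and _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension rather than something POSIX guarantees, so the code falls back to an assumed 64 bytes when it is unavailable.

```c
#include <stddef.h>
#include <unistd.h>

/* Below this many elements per side, stop recursing and copy directly.
   Illustrative value only; it only bounds recursion overhead and does
   not need to match any particular cache. */
#define TRANSPOSE_BASE 16

/* Transpose the rows x cols block of src whose top-left corner is (r0, c0).
   src is row-major with src_stride elements per row,
   dst is row-major with dst_stride elements per row. */
static void transpose_rec(const double *src, double *dst,
                          size_t r0, size_t c0, size_t rows, size_t cols,
                          size_t src_stride, size_t dst_stride)
{
    if (rows <= TRANSPOSE_BASE && cols <= TRANSPOSE_BASE) {
        /* Small block: plain element-by-element copy. */
        for (size_t r = r0; r < r0 + rows; ++r)
            for (size_t c = c0; c < c0 + cols; ++c)
                dst[c * dst_stride + r] = src[r * src_stride + c];
    } else if (rows >= cols) {
        /* Split the taller dimension in half and recurse. */
        size_t half = rows / 2;
        transpose_rec(src, dst, r0, c0, half, cols, src_stride, dst_stride);
        transpose_rec(src, dst, r0 + half, c0, rows - half, cols,
                      src_stride, dst_stride);
    } else {
        /* Split the wider dimension in half and recurse. */
        size_t half = cols / 2;
        transpose_rec(src, dst, r0, c0, rows, half, src_stride, dst_stride);
        transpose_rec(src, dst, r0, c0 + half, rows, cols - half,
                      src_stride, dst_stride);
    }
}

/* Write the transpose of the rows x cols matrix src into dst (cols x rows).
   src and dst must not overlap. */
void transpose(const double *src, double *dst, size_t rows, size_t cols)
{
    transpose_rec(src, dst, 0, 0, rows, cols, cols, rows);
}

/* Return the L1 data cache line size in bytes, or an assumed default of 64
   when the system cannot report it. */
size_t cache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return (size_t)n;
#endif
    return 64; /* assumption: a common line size, not a guarantee */
}
```

Calling transpose(a, b, rows, cols) fills b with the transpose of a. The result of cache_line_size() could be used, for instance, to pad per-thread data so that two threads don't write to the same line.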