I reckon your hard-earned knowledge is only valid for the current breed of CPUs. I am fairly certain that with a new CPU design, even if you knew its exact ins and outs, you would not know what code is actually optimal for that machine.
The interesting part of CPU design is that it is a self-reinforcing pattern. CPU designers saw common patterns and started to optimize for them. Programmers learned that certain patterns perform better and started using them more. CPU designers then optimized the "common case" further.
Current CPUs are so complicated that you can hardly know all the ramifications of their design. Small changes can throw the entire performance profile out of whack. (Hyper-threading making certain numerical applications slower is one example...)
I am fairly certain that you cannot explain why a certain instruction stalls the pipeline. But then, you don't need to know why; the interesting information is that it does.