Go look at the assembly that some of these compilers produce. It's frightening to see how much overhead they add to even simple assignment operations.
I doubt this kind of code is being generated in *RELEASE* builds. I often check the code generated for inner loops, and most of the time it's the Right Thing (tm). I'm still amused to see that the compiler can merge calls to sin and cos with the same argument into a single fsincos instruction, or vectorize some loops over arrays. That's the best of both worlds: human-readable code that maps directly to the problem at hand AND very well optimized generated code. And given a new CPU and a compiler that understands its architecture and can take advantage of it, my higher-level code will benefit with minimal changes.

PS: by "higher-level code" I mean C++.
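For anyone who wants to check this on their own compiler, here is a minimal sketch of the two patterns I mean. The names (`Point`, `on_unit_circle`, `scale_add`) are made up for illustration, and whether you actually get a merged sin/cos or a vectorized loop depends on the compiler, the optimization flags and the target, so treat it as something to inspect rather than a guarantee.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical type, used only for this illustration.
struct Point {
    double x;
    double y;
};

// sin and cos are called with the same argument. An optimizing compiler
// may combine the two calls into one (for example fsincos on x87, or a
// single sincos library call), but that is up to the compiler and flags.
Point on_unit_circle(double angle) {
    return Point{std::cos(angle), std::sin(angle)};
}

// A plain element-wise loop with no dependencies between iterations.
// Loops of this shape are typical candidates for auto-vectorization.
// Assumption: a and b hold at least out.size() elements.
void scale_add(std::vector<float>& out,
               const std::vector<float>& a,
               const std::vector<float>& b,
               float k) {
    const std::size_t n = out.size();
    assert(a.size() >= n && b.size() >= n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * k + b[i];
    }
}
```

Compile it with optimizations on and ask for the assembly (for example `g++ -O2 -S` with GCC), then compare against an unoptimized build. The source stays the same; only the generated code changes.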