There are a lot of cases where the compiler gets it wrong, especially in the embedded world.
For example, the compiler may assume the code runs from RAM when it actually runs from flash, in which case a different instruction sequence can be a lot faster.
Then there are cases the compiler cannot handle at all, like cache maintenance and memory barriers: there is no way the compiler can know the peculiarities of your (custom) system. The same goes for task switching and atomic operations. You might be able to write those with C intrinsics, but even then you must know what code the compiler will generate (for example, making sure the optimizer does not reorder operations across a memory barrier, which is far from trivial).
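To make the memory barrier point concrete, here is a minimal sketch using the standard C11 `<stdatomic.h>` fences. The names `payload`, `ready`, `publish` and `try_consume` are made up for illustration, and the sketch assumes a toolchain with C11 atomics support. It only covers ordering as the C memory model sees it; cache maintenance, DMA visibility and other system-specific details are still up to you.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared state between a producer (e.g. an ISR or another
 * core) and a consumer. */
static uint32_t payload;     /* plain data, written before the flag */
static atomic_bool ready;    /* zero-initialized (false); publishes payload */

/* Producer: write the data first, then publish it. */
void publish(uint32_t value)
{
    payload = value;
    /* Release fence: neither the compiler nor the CPU may move the
     * payload write below the flag store that follows. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer: returns true and fills *out if data has been published. */
bool try_consume(uint32_t *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    /* Acquire fence: the payload read below may not be moved above
     * the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    atomic_store_explicit(&ready, false, memory_order_relaxed);
    return true;
}
```

Even with this, it is worth checking the generated assembly on your target: which barrier instruction the fence maps to, if any, depends on the architecture and the compiler, and without the fences the optimizer is free to reorder the two stores in `publish`.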
Compiler support for those cases hasn't really improved in the last ten years or so; if anything, the new aggressive optimizations have arguably made it worse.