Well, when targeting a machine with plenty of cores, memory, and power, sure - it's a better tradeoff to avoid over-optimising and go for maintainability over raw performance. I know that; I do it all the time. As for writing assembler, my assembler days are long gone. The last I did was a bit of 56k, maybe 10 years ago, for an audio pipeline. The closest I get these days is looking at the output of the compiler ;-)
However, when the user base is constantly worried about battery drain, how do you make a sensible tradeoff between memory use and CPU time (storing vs. re-calculating a given value), or know how to arrange data across cache lines so it moves efficiently through the memory bus, cutting runtime (and hence prolonging battery life)?
These devices are power-constrained, and will remain so no matter what anyone says. Knowing the architecture, and its future direction, would let devs produce solutions that scale and are power efficient. Maybe you'll only get a 10% power saving, but for a heavily used device that could translate into an extra hour or two of use, which would be a big selling point for expensive handheld devices.