One of the things this article ignores completely is the area of embedded programming -- and there is still a LOT of it going on. There are still a ton of NEW projects being done based on 8051 and 6800 series derivatives -- and those are just two of the major architectures.
Even if you are not specifically doing embedded programming per se, the more you know about the basic architecture of your system the more you can help the compiler take advantage of it.
For instance, on the vast majority of processors comparing a register to zero is typically a VERY low cost operation. Furthermore, many processors have hard coded instructions that both decrement a register, compare it to zero, and branch if it is not. If you enable the most aggressive optimizations on some compilers they will attempt to do loop re-ordering (often with disastrous results) and do sometimes succeed. HOWEVER, more often than not there are chunks of code inside the loops that prevent effective re-ordering from occurring. If you are aware of your processors underlying architecture and try to intentionally write most of your loops as count down to zero in the first place, you make life much easier for the compiler and allow it to make more efficient code as a result. This is just one small case.
Also, as far as hand optimizations go, I still do it quite often -- even at the raw assembly level. With visual inspection and manual adjustment I have proved time and time again that I can do a MUCH better optimizing job than the Keil compiler can. I can also typically get some gains on ColdFire/Freescale stuff as well, just not quite as drastic.
I have worked on many projects over the years and seen more bad programmers than I care to admit -- and the most recent/youngest batch has both some of the best and worst I've ever seen (with far more of the later than the former). This is not their fault, it is the fault of what the university's are teaching them. At one company I used to work for (and this was a BIG company with over a hundred thousand of employees worldwide) our local HR department had a standing policy to NOT hire Computer Science graduates for permanent programming positions unless they had interned with us first. Basically, all the CS grads had far to many theoretical and inefficient/unreliable programming ideas to unlearn to be useful.
Also, there are many cases where even hand manipulation at the raw binary code level is still needed and useful. Although most projects I deal with now, thankfully, use flash for code space, a few still do use OTP (one time programmable) parts. It has not been that many years ago that I had to spend the better part of 2 months figuring out a way to "overburn" a set of parts by finding a safe location where I could turn existing instructions into NOP's followed by a branch to a new chunk of code at the end of our programmed space (when re-programming/overburning an OTP you can still turn a 1 to a 0 but not the other way around -- thankfully, the architecture we were using considered 00 as a NOP and we had left all the unprogrammed space as FF's). And yes, this is a very extreme example, but it allowed me to find a fix that allowed us to use over 35 THOUSAND mis-programmed parts that otherwise would have had to be tossed (and these parts cost in the $12 range EACH).
Similarly, on some large volume consumer products, manual code optimizations, low level coding, and hand tweaking is still the norm -- for a very simple reason: it saves money. In almost every case, it is still almost always cheaper to use the micros with less onboard flash and hand optimizing the code often allows us to keep code just below the threshold of the next size part. Similarly, on one project I was on we had 3 engineers spend 6 months hand writing a custom DSP algorithm that allowed us to remove a filter circuit whose component cost was on the order of $0.15 USD (yes, 15 cents). The management team was utterly thrilled over this as the volume for the circuit in question was way over 1 million units (you do the math -- on projects such as this engineering cost is essentially free, you throw as many resources at it as you can get for as long as you have and just hope you can optimize another penny or two out somewhere).
The world of Windows and GUI's is far from the only game in town -- it's just the one that everyone sees and thinks about. There are FAR more devices out there with embedded code in them than there are PC's. Just sit down and look around your house... think about your TV set (although anymore these are pretty high level code due to the ATSC switchover), your DVD player, your microwave oven, your thermostat, your home security system, your garage door opener, your printer, your scanner, your fax, your LCD monitor, your cordless phone, your freezer/refridgerator, etc. Even your PC itself contains many small embedded processors with their own custom firmware (e.g. the firmware on your hard drive, the firmware in the motherboard chipset [a logic block for an 8042 keyboard controller still exists in every chipset that can support a PS2 keyboard], the keyboard itself even has firmware...)
Whether you have ever realized it or not, embedded programming is EVERYWHERE.
"Everything should be made as simple as possible, but not simpler." -- Albert Einstein