Modern gate-mapping, placement, and routing algorithms are quite sophisticated these days (and improving all the time), and computers are incredibly fast relative to a human mind. Could a really good layout engineer do a full-custom 64b carry-lookahead adder that is smaller, faster, and lower-power than one from an automated flow? Maybe. But how long is it going to take him to do the whole FPU? Or how about a complex block with upwards of 1M logic gates? Hand-layout for digital logic has rapidly diminishing returns, in my opinion. Better to have your layout guys do some awesome stdcells, and let the tools and PD wizards do the rest. In many cases, the result will be just as fast as (if not faster than) the full-custom option. Note well that automated flows don't explicitly demand random/non-structured P&R algorithms.
And for the curious... The salary ranges that I posted have basically been increased ~2% (cumulative) over the intervening 4 years (2007 range vs 2011 range), and average employee salary probably increased ~5% (also cumulative), based on data I've seen from HR. Also, no, my company's HR department isn't refreshingly transparent. They're just completely clueless...
So what can you do with WDDM 1.1? For starters, you can significantly curtail memory usage for the Desktop Window Manager when it’s enabled for Aero. With the DWM enabled, every window is stored as an uncompressed texture so that it can be processed by the video card. The problem with this is that when it comes to windows drawn with Microsoft’s older GDI/GDI+ technology, the DWM needs two copies of the data: one on the video card for rendering purposes, and another copy in main memory for the DWM to work on. Because these textures are uncompressed, the amount of memory a single window takes is determined by its dimensions, specifically: width x height x 4 bytes of color information.
Furthermore, while a single window may not be too bad, additional windows compound this problem. In this case Microsoft lists the memory consumption of 15 1600x1200 windows at 109MB. This isn’t a problem for the video card, which has plenty of memory dedicated to the task, but for system memory it’s another issue since it’s eating into memory that could be used for something else. With WDDM 1.1, Microsoft has been able to remove the copy of the texture from system memory and operate solely on the contents in video memory. As a result the memory consumption of Windows is immediately reduced, potentially by hundreds of megabytes.
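As a quick sanity check on the 109MB figure, the width x height x 4 bytes formula can be worked through in a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and it assumes the figure is quoted in binary megabytes (1MB = 1024 x 1024 bytes) and counts only one copy of each window texture.

```python
BYTES_PER_PIXEL = 4  # 4 bytes of color information per pixel, per the formula above


def window_texture_bytes(width: int, height: int) -> int:
    """Size in bytes of one uncompressed window texture (hypothetical helper)."""
    return width * height * BYTES_PER_PIXEL


def total_mib(width: int, height: int, count: int) -> float:
    """Total size of `count` identical windows, in binary megabytes (assumed unit)."""
    return window_texture_bytes(width, height) * count / (1024 * 1024)


per_window = window_texture_bytes(1600, 1200)
total = total_mib(1600, 1200, 15)

print(f"{per_window} bytes per window")    # 7680000 bytes per window
print(f"{total:.1f} MiB for 15 windows")   # 109.9 MiB for 15 windows
```

Under those assumptions each 1600x1200 window costs about 7.3MB, and fifteen of them come to just under 110MB, which lines up with the quoted 109MB if that number was truncated rather than rounded.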
And yes, I'm quite well aware that transfers from system memory to the GPU (or any other device) over PCIe are plenty fast for normal desktop operations. That's why I suspect the framebuffer size, rather than bandwidth to/from system memory, is the limiting factor. It's the only resource that would be approaching its limit with a huge number of windows open.
Another anecdotal point: WinXP doesn't show this same behavior with similar numbers of windows. I haven't had a chance to play with compiz or OSX, so I cannot comment on those. My *nix usage is generally limited to remote connections over NX, VNC, or just a remote xterm.