I completely disagree. The majority of the HPC realm still uses Nvidia only because they already know CUDA, not because of any technological advantage. AMD has held the line on not letting sloppy programming habits into their OpenCL compiler, and that has kept a lot of HPC users from jumping ship. You can even see this in the complaints from open-source projects like Blender, whose developers refuse to write properly multi-threaded code and instead lean on the CUDA toolchain to do the work for them.
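To make that concrete, here is a rough sketch (plain stock CUDA, not lifted from any particular project) of why people find CUDA so comfortable: the kernel and the host code live in one file, and the runtime hides most of the plumbing.

    #include <cstdio>
    #include <cuda_runtime.h>

    // A complete CUDA "saxpy": kernel and launch live in one source file and
    // the toolchain handles device code generation, loading and argument passing.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));   // unified memory: no explicit copies
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
        cudaDeviceSynchronize();

        printf("y[0] = %f\n", y[0]);                // expect 5.0
        cudaFree(x); cudaFree(y);
        return 0;
    }

The equivalent OpenCL program makes you write the platform and device discovery, context, command queue, program build and buffer management yourself. That explicitness is good discipline, but it is exactly the work a lot of programmers refuse to take on.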
The rest of your complaints ("shitty drivers", "piss-poor memory handling" and "worse performance per watt") are also bogus. I own or manage machines with a large number of Nvidia and AMD video cards, and I have seen enough driver issues from both vendors that neither comes out worse. This is a typical fanboy stereotype that keeps getting repeated with no real facts behind it.
Your second complaint comes up a lot in programming forums, but I have never seen anyone do a proper write-up of a memory-handling issue on any of AMD's generations. Most of those conversations read less like a technical problem and more like a programmer who simply did not want to learn a second platform with a smaller market share. Most of these issues would also be alleviated if the programmer just used a common optimized library (clBLAS/clFFT on the AMD side, cuBLAS on Nvidia's) instead of trying to redo that work themselves, something like the sketch below.
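Just as an illustration (a sketch only, written in CUDA because that is what I have in front of me; the same pattern applies with AMD's libraries): hand the matrix multiply to the vendor's tuned BLAS rather than hand-writing and hand-tuning your own GEMM kernel for every architecture and memory hierarchy.

    #include <cstdio>
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    // Sketch: let the vendor's tuned BLAS do the matrix multiply instead of
    // hand-rolling a GEMM kernel and re-tuning it for each GPU generation.
    int main() {
        const int n = 512;
        float *A, *B, *C;
        cudaMallocManaged(&A, n * n * sizeof(float));
        cudaMallocManaged(&B, n * n * sizeof(float));
        cudaMallocManaged(&C, n * n * sizeof(float));
        for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 2.0f; C[i] = 0.0f; }

        cublasHandle_t handle;
        cublasCreate(&handle);
        const float alpha = 1.0f, beta = 0.0f;
        // C = alpha * A * B + beta * C (column-major, as BLAS expects)
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, A, n, B, n, &beta, C, n);
        cudaDeviceSynchronize();

        printf("C[0] = %f\n", C[0]);   // expect 1024.0 (512 * 1.0 * 2.0)
        cublasDestroy(handle);
        cudaFree(A); cudaFree(B); cudaFree(C);
        return 0;
    }

One library call gets you the vendor's memory-hierarchy tuning for free, which is the whole point.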
Lastly, AMD's offerings have historically produced more performance per watt, and their latest offerings continue that trend. That, along with the bit-shift capability you mention, is also part of why AMD cards were the favorites for Bitcoin mining and keep showing up in supercomputers.
http://www.tomshardware.com/re...
http://www.green500.org/news/g...
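On the bit-shift point, here is roughly what it looks like in practice (again just a sketch, in CUDA for consistency): SHA-256-style hashing is full of 32-bit rotates, and as I understand it the VLIW/GCN-era AMD parts could do each rotate in a single instruction (exposed through OpenCL's rotate()/amd_bitalign), while the Fermi-era Nvidia cards miners were comparing against needed several operations for the same thing.

    #include <cstdio>
    #include <cstdint>
    #include <cuda_runtime.h>

    // A 32-bit right-rotate as it appears all over SHA-256-style hashing.
    // Written portably it is three operations (two shifts and an OR); hardware
    // with a native rotate/bit-align instruction collapses it into one.
    // Assumes 0 < n < 32.
    __device__ __forceinline__ uint32_t rotr32(uint32_t x, uint32_t n) {
        return (x >> n) | (x << (32u - n));
    }

    __global__ void rotr_demo(const uint32_t *in, uint32_t *out, int count) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < count) out[i] = rotr32(in[i], 7);   // SHA-256's sigma0 uses rotr 7
    }

    int main() {
        const int count = 256;
        uint32_t *in, *out;
        cudaMallocManaged(&in, count * sizeof(uint32_t));
        cudaMallocManaged(&out, count * sizeof(uint32_t));
        for (int i = 0; i < count; ++i) in[i] = 0x80000000u + i;

        rotr_demo<<<1, count>>>(in, out, count);
        cudaDeviceSynchronize();

        printf("rotr32(0x%08x, 7) = 0x%08x\n", in[0], out[0]);
        cudaFree(in); cudaFree(out);
        return 0;
    }

When a hash round does dozens of these rotates, saving two operations on every one adds up fast, which is why the miners cared.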
Now, my latest personal machine has an Nvidia GeForce GTX 980 in it, because I more often need to fix CUDA code and because some of the games I wanted to play run better on it (again, thanks to the game developers' preferences, not technical merit). I personally own eight other video cards, spanning AMD, Nvidia and Matrox (which uses AMD GPUs these days) and three generations, for testing.
And I am only sticking up for AMD because I admire their push to get people to write better multi-core code. Nvidia has been too accommodating over the last six years in that respect, which is fine for their revenue stream and market share but not a good thing for the broader computer industry. As Moore's law slows down, we are going to need a massive shift to multi-core-optimized applications, and we need programmers ready for that day.
AMD seems to be ready with the tough love to get everyone there while Nvidia keeps enabling bad behaviors.