A second difference favoring Bitcoin mining on AMD GPUs over Nvidia's is that the mining algorithm is based on SHA-256, which makes heavy use of the 32-bit integer right-rotate operation. This operation can be implemented as a single hardware instruction on AMD GPUs (BIT_ALIGN_INT), but must be emulated with three separate hardware instructions on Nvidia GPUs (2 shifts + 1 add). This alone gives AMD another 1.7x performance advantage (~1900 instructions instead of ~3250 to execute the SHA-256 compression function).
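To make the "2 shifts + 1 add" emulation concrete, here is a minimal Python sketch that models 32-bit words with a mask. The function names are hypothetical and chosen for illustration only. The two shifted halves occupy disjoint bit ranges, so adding them gives the same result as OR-ing them. The SHA-256 Σ0 function from FIPS 180-4 is included to show how quickly rotates add up: each call needs three of them.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit unsigned registers in Python


def rotr32_emulated(x: int, n: int) -> int:
    """Right-rotate a 32-bit word using two shifts and one add
    (hypothetical helper mirroring the emulated instruction sequence)."""
    x &= MASK32
    n %= 32
    if n == 0:
        return x  # a shift by 32 is undefined in C/CUDA, so skip it here too
    hi = (x << (32 - n)) & MASK32  # shift 1: low n bits move to the top
    lo = x >> n                    # shift 2: remaining bits move down
    return (hi + lo) & MASK32      # add: bit ranges don't overlap, so + acts like |


def rotr32_reference(x: int, n: int) -> int:
    """Reference rotate built from the 32-character binary string."""
    s = format(x & MASK32, "032b")
    n %= 32
    return int(s[-n:] + s[:-n], 2) if n else int(s, 2)


def sha256_big_sigma0(x: int) -> int:
    """SHA-256 Sigma0(x) = ROTR^2(x) ^ ROTR^13(x) ^ ROTR^22(x) (FIPS 180-4).
    Three rotates: one instruction each with a native rotate,
    three each when emulated with shifts and an add."""
    return (rotr32_emulated(x, 2)
            ^ rotr32_emulated(x, 13)
            ^ rotr32_emulated(x, 22))


print(hex(rotr32_emulated(0x12345678, 8)))
```

Running it prints `0x78123456`: the low byte wraps around to the top. With a native rotate, each `rotr32_emulated` call would collapse to one instruction instead of three, which is the source of the per-rotate saving described above.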
For GPU programming, I've greatly enjoyed Nvidia's CUDA package over wrangling the OpenCL that Radeon cards rely on.
Wasn't there something about a PASCAL programmer knowing the value of everything and the Wirth of nothing?