Comment Re:Possible use: (Score 1) 82
Nah, not necessary anymore: Bush and Cheney are gone.
Nah, not necessary anymore: Bush and Cheney are gone.
You get the big speedup only if you're doing single precision floating point computations.
On the NVIDIA GTX 280 & 260, a multiprocessor has eight single-precision floating point ALUs (one per core) but only one double-precision ALU (shared by the eight cores). Thus, for applications whose execution time is dominated by floating point computations, switching from single-precision to double-precision will increase runtime by a factor of approximately eight.
A lot of my HPC customers do CFD with (1) double precision in (2) Fortran. 1 and 2 are not easy or fast with CUDA.
I've noticed several design suggestions in your code.