High performance FFT on GPUs
A reader writes: "The UNC GAMMA group has recently released a high-performance FFT library which can handle large 1-D FFTs. According to their webpage, the FFT library is able to achieve 4x higher computational performance on a $500 NVIDIA 7900 GPU than optimized Intel Math Kernel Library FFT routines running on high-end Intel and AMD CPUs costing $1500-$2000. The library is supported on both Linux and Windows platforms and is tested to work on many programmable GPUs. There is also a link to download the library freely for non-commercial use."
Rush hour math. (Score:3, Insightful)
GPUs are nice, but there's the little matter of getting data and results on and off the chip.
Any 64 bit GPU's? (Score:3, Insightful)
You don't need a translation... (Score:3, Insightful)
It's some calculation that can be heavily optimized down to simple but fast processing. Hence a [relatively] cheap part that does a few simple tasks very fast can outperform a more expensive part when performing that optimized calculation, even though the expensive part can handle a vastly greater range of tasks efficiently across that general range.
By capitalizing on this incredibly basic rule of computer science (an optimized, simple thing running fast is faster than a more powerful, general thing that is only being used for one of its many capabilities), attention-grabbing headlines can be garnered.
Re:Math library for sale? (Score:3, Insightful)
$1500-$2000? (Score:3, Insightful)
No surprise here... (Score:2, Insightful)
Great for audio! (Score:3, Insightful)
I want to see how I can take advantage of this... I hope the license isn't too restrictive.
It might be a good example of how to use the GPU for general purpose (vector-based) computation, something I've been wanting to explore.
Just curious, how does the use of the GPU for this kind of thing affect the graphics display?
Are you unable to draw on the screen while it's running, or something?
Cryptography? (Score:3, Insightful)
No need for floating point.
Re:Cray-1 comparison (Score:2, Insightful)
You can probably make up your own flawed car analogy and compare top speed and fuel consumption of today's compact cars with the racing cars of 60 years ago.
Re:Any 64 bit GPU's? (Score:3, Insightful)
While interesting, I need IEEE 64 bit double precision for my scientific applications.
Depends on what you need 64 bit for - is it for the precision (i.e. mantissa size) or the range (i.e. exponent size)?
If you can live with a double-precision mantissa but a single-precision exponent, it's possible to get that using single-precision building blocks with less than a 2x slowdown. Sorry, don't have the references to hand right now, but a dig around on Citeseer/Google should get you there.
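The "single-precision building blocks" trick above can be sketched in a few lines. This is a minimal illustration (using NumPy `float32` to stand in for GPU single precision; the helper names `split` and `two_sum` are this sketch's own, not from any particular library): a value is kept as an unevaluated sum `hi + lo` of two singles, giving roughly a double-width mantissa but only a single-precision exponent range.

```python
import numpy as np

def split(x):
    """Split a Python float into a (hi, lo) pair of float32 values
    whose exact sum approximates x to roughly double the precision."""
    hi = np.float32(x)
    lo = np.float32(x - np.float64(hi))
    return hi, lo

def two_sum(a, b):
    """Knuth's error-free transformation: returns (s, err) with
    s + err == a + b exactly, using only float32 operations."""
    s = np.float32(a + b)
    bb = np.float32(s - a)
    err = np.float32(np.float32(a - np.float32(s - bb)) +
                     np.float32(b - bb))
    return s, err

# 1 + 2**-30 does not fit in a 24-bit float32 mantissa...
x = 1.0 + 2.0 ** -30
hi, lo = split(x)
# ...but the hi/lo pair recovers it exactly, where a lone float32 cannot.
print(np.float64(hi) + np.float64(lo) == x)  # True
print(np.float64(np.float32(x)) == x)        # False
```

A full double-single arithmetic package builds products and quotients out of the same error-free primitives, which is where the "less than 2x slowdown" figure comes from.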
Re:It's nice... (Score:4, Insightful)
Take a look at their benchmarks [unc.edu]. The chart goes up to eight million elements. The accumulated rounding error in FFT outputs may be around n * log2(n) ULP, where n is the number of elements, and ULP (units in last place) is relative to the largest input element. (Caveats: That is the maximum; the distribution of the logs of the errors resembles a normal distribution. Input was numbers selected from a uniform distribution over [0, 1). The error varies slightly depending on whether you have fused multiply-add and other factors.)
So with eight million elements, the error may be 184 million ULP, or over 27 bits. With only 24 bits in your floating-point format, that is a problem. Whether you had 24-bit or 1-bit data to start with, it is essentially gone in some output elements. Most errors are less than the maximum, but it seems there is a lot of noise and not so much signal.
It may be that the most interesting output elements are the ones with the highest magnitude. (The FFT is used to find the dominant frequencies in a signal.) If so, those output elements may be large relative to the error, so there could be useful results. However, anybody using such a large FFT with single-precision floating-point should analyze the error in their application.
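As a sanity check on the numbers above, the quoted error estimate can be evaluated directly (a quick sketch; the n * log2(n) ULP figure is the estimate used in this comment, not a tight theoretical bound):

```python
import math

def fft_error_bound_ulp(n):
    """Worst-case accumulated FFT rounding error, in ULP of the
    largest input element, per the n * log2(n) estimate above."""
    return n * math.log2(n)

n = 8_000_000                    # eight million elements
bound = fft_error_bound_ulp(n)   # ~1.8e8 ULP, i.e. ~184 million
bits = math.log2(bound)          # ~27.5 bits, more than the 24-bit
                                 # float32 mantissa can represent
print(f"{bound:.3g} ULP, {bits:.1f} bits")
```

With only 24 mantissa bits in single precision, the worst-case error exceeds the entire format, which is exactly the problem described above.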
Re:Uhh.. [little correction] (Score:2, Insightful)