High performance FFT on GPUs

Catch up on stories from the past week (and beyond) at the Slashdot story archive

High performance FFT on GPUs 274

Posted by Hemos on Monday May 29, 2006 @12:30PM from the testing-it-out dept.

A reader writes: "The UNC GAMMA group has recently released a high performance FFT library which can handle large 1-D FFTs. According to their webpage, the FFT library is able to achieve 4x higher computational performance on a $500 NVIDIA 7900 GPU than optimized Intel Math Kernel FFT routines running on high-end Intel and AMD CPUs costing $1500-$2000. The library is supported for both Linux and Windows platforms and is tested to work on many programmable GPUs. There is also a link to download the library freely for non-commerical use."

This discussion has been archived. No new comments can be posted.

High performance FFT on GPUs

Load All Comments

Search 274 Comments Log In/Create an Account

Comments Filter:

Final Fantasy Tactics? (Score:4, Funny)

by Musteval ( 817324 ) writes: on Monday May 29, 2006 @12:32PM (#15424911)

Why use a GPU for Final Fantasy Tactics? Couldn't you just use the GBA?

Share
twitter facebook
- Re:Final Fantasy Tactics? (Score:5, Funny)
  
  by numbski ( 515011 ) * writes: <numbski&hksilver,net> on Monday May 29, 2006 @01:47PM (#15425211) Homepage Journal
  
  Seriously! I mean, with only 1-D, it's really going to suck. I know people have accused Final Fantasy of being too linear in the past, but this is getting a bit silly...now, we're stuck at a single point!
  
  Parent Share
  twitter facebook
  - Re:Final Fantasy Tactics? (Score:3, Funny)
    
    by fireman sam ( 662213 ) writes:
    
    Ah, but great framerate.
It's nice... (Score:5, Informative)

by non0score ( 890022 ) writes: on Monday May 29, 2006 @12:34PM (#15424921)

if you're only considering 32-bit floating point numbers and don't need full IEEE-754 compliance.

Share
twitter facebook
- Re:It's nice... (Score:5, Interesting)
  
  by john.r.strohm ( 586791 ) writes: on Monday May 29, 2006 @12:51PM (#15425003)
  
  Depending on what you're doing, for an FFT, you probably don't need 64-bit floating point, and you DON'T need full IEEE-754 compliance.
  
  If you are taking data off of some kind of sensor, there are damned few sensors with 24 good bits of data out of the noise floor. Radars work just fine with 16-bit A/D converters.
  
  IEEE-754 compliance helps you in the ill-defined corners of the number space. FFTs inherently work in the well-behaved arena of simple trig functions and three-function (add/subtract/multiply) math.
  
  I'm currently doing FFTs with 16-bit fractional arithmetic in Blackfin DSP. For what I'm doing with the results, it is good enough.
  
  Not to mention you could use a "GPU farm" to do a fast search, and take any "interesting" data regions and feed those to a 64-bit, fully-IEEE-754 compliant, slow-as-molasses-in-January x86 FFT.
  
  Eventually, with some more articles like this one and yesterday's Cell piece, people will start to figure out that the x86 architecture is brain-dead and needs to be put out of its misery.
  
  Parent Share
  twitter facebook
  - Re:It's nice... (Score:5, Informative)
    
    by stephentyrone ( 664894 ) writes: on Monday May 29, 2006 @01:37PM (#15425172)
    
    FFTs inherently work in the well-behaved arena of simple trig functions and three-function (add/subtract/multiply) math.
    add/subtract/multiply math is the area that 754 has had the biggest effect on - in fact, the spec has very little to say about transcendental functions, but is almost entirely concerned with the basic arithmetic ops. prior to 754, floating point was, in general, not algebraicly closed under +-*/, nor were the results correctly rounded.
    most highly parallel GPU-type chips lack support for gradual underflow, for example, one of those "ill-defined corners of the number space" where 754 has been a tremendous boon. flush-to-zero is fine if you're decoding MP3s or unpacking texture maps, but it causes a lot of problems when you start trying to do more general scientific computations. sometimes those low order bits matter a whole lot; sometimes they're the difference between getting an answer accurate to 4 digits and an answer with *no* correct digits.
    "simple trig functions" have their own problems on these architectures; try writing an acceptable range-reduction algorithm for sin or cos without having correctly rounded arithmetic ops. sin and cos are, in fact, two the hardest operations in the math lib on which to get acceptable accuracy.
    admittedly, none of these objections are an issue with FFTs. but the reason that FFTs will perform acceptably on such an architecture is that the operations are (usually) constrained to the domain in which you don't encounter the problems i mention, not because the operations themselves are inherently safe. the lack of support for gradual underflow will cause you to lose many, many bits in frequency components that have nearly zero magnitude, but you usually don't care about those components when you're doing FFTs, anyway.
    
    Parent Share
    twitter facebook
    - Re:It's nice... (Score:2)
      
      by chriso11 ( 254041 ) writes:
      
      You hinted at this, but didn't explicitly state: as you increase the number of samples, you lower the SNR of the output, due to increasing numbers of rounding errors in the calculation. So, while you may be ok with 16bit fracints for a small sample set, as you increase the number of samples, you will eventually find that your noise floor obscures your results.
  - Re:It's nice... (Score:4, Insightful)
    
    by edp ( 171151 ) writes: on Monday May 29, 2006 @02:35PM (#15425376) Homepage
    
    "If you are taking data off of some kind of sensor, there are damned few sensors with 24 good bits of data out of the noise floor. Radars work just fine with 16-bit A/D converters."
    Take a look at their benchmarks [unc.edu]. The chart goes up to eight million elements. The accumulated rounding error in FFT outputs may be around n * log2(n) ULP, where n is the number of elements, and ULP (units in last place) is relative to the largest input element. (Caveats: That is the maximum; the distribution of the logs of the errors resembles a normal distribution. Input was numbers selected from a uniform distribution over [0, 1). The error varies slightly depending on whether you have fused multiply-add and other factors.)
    So with eight million elements, the error may be 184 million ULP, or over 27 bits. With only 24 bits in your floating-point format, that is a problem. Whether you had 24-bit or 1-bit data to start with, it is essentially gone in some output elements. Most errors are less than the maximum, but it seems there is a lot of noise and not so much signal.
    It may be that the most interesting output elements are the ones with the highest magnitude. (The FFT is used to find the dominant frequencies in a signal.) If so, those output elements may be large relative to the error, so there could be useful results. However, anybody using such a large FFT with single-precision floating-point should analyze the error in their application.
    
    Parent Share
    twitter facebook
    - Your error analysis is totally wrong (Score:5, Informative)
      
      by stevenj ( 9583 ) writes: <stevenj@alum.mit ... edu minus distro> on Monday May 29, 2006 @09:14PM (#15426458) Homepage
      
      The Cooley-Tukey FFT algorithm and its variants, which is what they are using, has much better error characteristics than you think.
      In floating-point arithmetic, the algorithm was proved in 1966 to have an upper bound for the error that grows only as O(log N), and the mean (rms) error grows only as O(log N). (See this page [fftw.org] for more info.) (Errors in fixed-point arithmetic are worse, going as N.)
      Even in single precision, the errors for their FFT sizes are probably quite reasonable, assuming they haven't done something silly like use an unstable trigonometric recurrence.
      
      Parent Share
      twitter facebook
      - erratum (Score:4, Informative)
        
        by stevenj ( 9583 ) writes: <stevenj@alum.mit ... edu minus distro> on Monday May 29, 2006 @09:17PM (#15426465) Homepage
        
        That should be O(sqrt(log N)) for the rms error and O(sqrt(N)) for fixed point. (My sqrt symbols didn't post properly somehow)
        
        Parent Share
        twitter facebook
  - Re:It's nice... (Score:5, Informative)
    
    by Waffle Iron ( 339739 ) writes: on Monday May 29, 2006 @03:38PM (#15425569)
    
    Eventually, with some more articles like this one and yesterday's Cell piece, people will start to figure out that the x86 architecture is brain-dead and needs to be put out of its misery.
    Why? Because the x86 isn't a DSP?
    The x86 is a general-purpose CPU. It isn't brain dead; historically it's almost always been at least half as fast as the latest expensive processor fad du jour, and sometimes it has actually been the fastest available general purpose processor. As these fads have come and gone, the x86 has quietly kept improving by incorporating many of their best ideas.
    The cell processor is basically a POWER processor core packaged with a few DSPs tacked onto the die. That sounds like a kludge to me, but if it turns out to be a success, there's nothing stopping people from tacking DSPs onto an x86 die.
    All a DSP is good at is fast number crunching. It usually has little in the way of an MMU, along with a memory architecture tuned mainly for vector-like operations, branch prediction tuned only for matrix math, etc. DSPs would make a bad choice for running general purpose programs, especially with cache and branch issues becoming the dominant performance bottleneck in recent times. DSPs would a horrible choice for running an OS with any kind of security enforcement. Using a GPU as a poor-man's DSP is interesting, but it suffers even more from these same limitations. If DSPs really offered a better solution for general-purpose problems, they would have replaced other CPU architectures decades ago.
    
    Parent Share
    twitter facebook
  - 32 bit not good enough (Score:2, Interesting)
    
    by MonaLisa ( 190059 ) writes:
    
    The sizes of transforms they are using for comparison here are of lengths of the order of 1 million points. This is huge for an FFT, and truncation error will definitely come into play here using only 32-bit precision. It all depends on what you are doing whether this will be adequate or not. Also, it's not at all clear what they did on the other platforms. There are some tricks to doing very long sequences; essentially using a 2D transform to perform a long 1D transform. It's not trivial, and requires
  - Re:It's nice... (Score:3, Interesting)
    
    by SETIGuy ( 33768 ) writes:
    
    If you are taking data off of some kind of sensor, there are damned few sensors with 24 good bits of data out of the noise floor. Radars work just fine with 16-bit A/D converters.
    I think you are confusing the precision of the input data with the precision of the power spectrum. An FFTs do a scaled add of a large number of samples, so the precision in the output is dependent on the number of input samples.
    For example SETI@home uses 1 bit complex sampling. (Yes, the SETI@home ADCs are a pair of high spe
Rush hour math. (Score:3, Insightful)

by Anonymous Coward writes: on Monday May 29, 2006 @12:35PM (#15424928)

""The UNC GAMMA group has recently released a high performance FFT library which can handle large 1-D FFTs. According to their webpage, the FFT library is able to achieve 4x higher computational performance on a $500 NVIDIA 7900 GPU than optimized Intel Math Kernel FFT routines running on high-end Intel and AMD CPUs costing $1500-$2000. "

GPUs are nice, but there's the little matter of getting data and results on and off the chip.

Share
twitter facebook
- That's where PCIe is useful (Score:4, Informative)
  
  by WoTG ( 610710 ) writes: on Monday May 29, 2006 @01:07PM (#15425045) Homepage Journal
  
  AGP was not very useful for bidirectional data flow, but PCIe is. GPU's are pretty sophisticated these days, so they've got the logic to handle moving stuff in and out of it's memory and over the bus to the CPU and the rest of the system.
  
  Parent Share
  twitter facebook
- Rush hour dollars, too (Score:2)
  
  by SuperBanana ( 662181 ) writes:
  
  GPUs are nice, but there's the little matter of getting data and results on and off the chip.
  Not to mention that their dollar figures are somewhat misleading, as they didn't include the cost of the host PC...and AMD Opeteron 280 processors don't cost ANYWHERE NEAR $2,000. They also didn't show us how cheaper processors do.
  You can buy a Tyan quad-CPU motherboard for about $1k, and the dual-core version of the 280 for $1k tops. So...that's $5-5.5K for a box with EIGHT processors that will do 6 million co
  - 8xx verses 2xx (Score:2, Informative)
    
    by Somegeek ( 624100 ) writes:
    
    Note that you can only use the 2xx Opterons in 1 or 2 CPU (2 or 4 core) motherboards. If you want to have 4 CPUs with 8 cores you need to use the 8 series Opterons. The Opteron 880 dual core currently starts at over USD 1,600.00 each, which makes your configuration start at about 7,500 just for the MB and CPUs. Then add the registered RAM, server case, big juicy power supply, drives, video, monitor, a UPS to protect all of that... It sounds sexy but realistically its going to be close to $10K if you bui
  - Re:Rush hour dollars, too (Score:2)
    
    by misleb ( 129952 ) writes:
    
    Not to mention that their dollar figures are somewhat misleading, as they didn't include the cost of the host PC...and AMD Opeteron 280 processors don't cost ANYWHERE NEAR $2,000. They also didn't show us how cheaper processors do.
    
    They also didn't say how cheaper GPU's do.
    
    You can buy a Tyan quad-CPU motherboard for about $1k, and the dual-core version of the 280 for $1k tops. So...that's $5-5.5K for a box with EIGHT processors that will do 6 million complex values in almost half the time as the $500 video c
- Re:Rush hour math. (Score:5, Informative)
  
  by corvair2k1 ( 658439 ) writes: on Monday May 29, 2006 @04:35PM (#15425743)
  
  Typically, when doing these measurements, the GAMMA group counts the upload/download time as part of the computation time. So, the 4x-5x speedup you're seeing is end to end, with results starting and ending in main memory.
  
  Parent Share
  twitter facebook
FFTs in GPUs, eh? (Score:3, Funny)

by Anonymous Coward writes: on Monday May 29, 2006 @12:35PM (#15424929)

Well, seeing as how the V.P. is such a V.I.P., shouldn't we keep the P.C. on the Q.T.? 'Cause if it leaks to the V.C. he could end up M.I.A., and then we'd all be put out in K.P.

Share
twitter facebook
Any 64 bit GPU's? (Score:3, Insightful)

by ufnoise ( 732845 ) writes: on Monday May 29, 2006 @12:36PM (#15424931)

While interesting, I need IEEE 64 bit double precision for my scientific applications. Are there any 64-bit floating point GPU's out there?

Share
twitter facebook
- Re:Any 64 bit GPU's? (Score:2)
  
  by TubeSteak ( 669689 ) writes:
  
  Well, if their library gives you a 4x increase for 32 bit stuff, would that mean a 2x increase for 64 bit math?
  /no really, would it?
  - Re:Any 64 bit GPU's? (Score:3, Informative)
    
    by Fordiman ( 689627 ) writes:
    
    No.
    
    Implementing 'big numbers', or numbers larger than the proccessor's spec, is actually quite computationally heavy when compared to the operations you're replacing. As such, a 4x increase in the speed of computation can translate to a (to pull a number from my arse) 0.25x loss of performance when dealing with larger floats.
    
    However...
    
    With CPU/GPU cooperation, the floating gap can be handled by using the CPU to generate a lookup table of high-precision trig as, say, a texture, and treating the numbers as m
    - Re:Any 64 bit GPU's? (Score:2)
      
      by stephentyrone ( 664894 ) writes:
      
      Addition is relatively light bignum math if the precision you're extending from is correctly rounded. GPU floating point usually isn't. Even if it is, you're usually talking about increasing your op count by a factor of 5 or more, which kind of blows the "4x performace speedup" out of consideration.
      - Re:Any 64 bit GPU's? (Score:2)
        
        by Fordiman ( 689627 ) writes:
        
        oops. I was thinking my times from the bigint lib.
        
        Though, the floating point trig table can be converted to fractional and - oh, wait.. fractional is pretty process heavy too.
        
        Ok, GP. The answer is quite a definitive 'No'.
- Re:Any 64 bit GPU's? (Score:4, Interesting)
  
  by Surt ( 22457 ) writes: on Monday May 29, 2006 @01:12PM (#15425059) Homepage Journal
  
  Not yet. But in the next or second generation out your wish will be fulfilled (more and more game developers are pushing for 64 bit color accuracy, which will necessitate a transition to fully 64bit GPUs in the not distant future).
  
  Parent Share
  twitter facebook
  - Re:Any 64 bit GPU's? (Score:5, Interesting)
    
    by TheRaven64 ( 641858 ) writes: on Monday May 29, 2006 @01:25PM (#15425117) Journal
    
    more and more game developers are pushing for 64 bit color accuracy, which will necessitate a transition to fully 64bit GPUs in the not distant future
    Current generation GPUs handle 64bit and 128bit colours already. A 64-bit colour value is just four channels of 16-bit floats (halfs in Cg parlance). A 128-bit colour value is a vector of four 32-bit colour values.
    If game developers wanted 256-bit colour, then GPUs would need to support 64-bit floating point arithmetic. This is unlikely to happen, however, since 64-bit colour (which is really 48-bit colour with a 16-bit alpha channel) gives more colours than the human eye can distinguish. In fact, even with 64- or 128-bit colour for the intermediate results, current cards only have a 10-bit DAC for converting the colour value to an analogue quantity that can be displayed on an analogue screen.
    It is worth noting that Pixar's RenderMan software doesn't use more than 128-bit colour, and films like Toy Story were rendered using 64-bit mode.
    
    Parent Share
    twitter facebook
    - Re:Any 64 bit GPU's? (Score:4, Informative)
      
      by Surt ( 22457 ) writes: on Monday May 29, 2006 @01:36PM (#15425165) Homepage Journal
      
      It's at the pixel shader level that you run into low color rendition on current GPUs, and also where the people doing math on GPU are doing their work. That's where the move to 64 bit will likely happen soon, and will conveniently help the math people as a side effect.
      
      Parent Share
      twitter facebook
- Re:Any 64 bit GPU's? (Score:3, Informative)
  
  by jthill ( 303417 ) writes:
  
  Yes. Some guys at LBNL took good look. /. had the story yesterday [slashdot.org]. When they were trying, they repeatedly toasted Cray's best. With a "naive" FFT implementation -- not half trying -- they got 80%.
- Re:Any 64 bit GPU's? (Score:3, Insightful)
  
  by UncleFluffy ( 164860 ) writes:
  
  While interesting, I need IEEE 64 bit double precision for my scientific applications.
  
  Depends on what you need 64 bit for - is it for the precision (i.e. mantissa size) or the range (i.e. exponent size)?
  If you can live with a double-precision mantissa but a single-precision exponent, it's possible to get that using single-precision building blocks with less than a 2x slowdown. Sorry, don't have the references to hand right now, but a dig around on Citeseer/Google should get you there.
  - Re:Any 64 bit GPU's? (Score:2)
    
    by ufnoise ( 732845 ) writes:
    
    It's the precision. I am solving a set of fully coupled partial differential equations. With only single precision, the matrices being solved may not be accurate enough and it may be impossible to get a solution.
FFT; (Score:2, Informative)

by MrShaggy ( 683273 ) writes:

FFT; Fast Fourier Transform - a specific algorithm, but used to indicate any algorithm attempting to determine the power versus frequency graph for a signal Dag-nadit
- Nope! (Score:4, Informative)
  
  by woolio ( 927141 ) writes: on Monday May 29, 2006 @07:51PM (#15426263) Journal
  
  Sorry, the FFT of a time-domain signal does **NOT** indicate how the power (or energy) of the signal is distributed.
  
  For the latter, you need a PSD (power spectral density) plot, which is obtained by finding the square of the magnitude of the freq-domain FFT (complex) outputs.
  
  And the term "FFT" usually describes a specific class of algorithms that finds a Discrete Fourier Transform of a signal in much less than O(N^2) time, where N is the number of elements/samples considered.
  
  However, the FFT is also useful to perform fast polynomial multiplication (and even fast multiplication of very very very long numbers). This application has nothing to do with power or frequencies in a signal.
  
  Parent Share
  twitter facebook
Math library for sale? (Score:2)

by decipher_saint ( 72686 ) writes:

So correct me if I'm wrong but, it's just a math library for sale, right?

Don't get me wrong, new tools are cool, but can someone explain to me why this is newsworthy?
- Re:Math library for sale? (Score:3, Insightful)
  
  by Improv ( 2467 ) writes:
  
  I don't know if it's exactly newsworthy, but it's kind of cute (and interesting) that the amount of specialisation that's going on in graphics cards leads to situations where one can persuade the graphics card to do one's (not graphics-related) work faster than one would be using the general purpose CPU for the same task. It's more amusement than anything else (although for those who want to do the computation, it's also useful).
- Re:Math library for sale? (Score:2, Informative)
  
  by Kortec ( 449574 ) writes:
  
  This is probably making a lot of developers, myself included, very very happy people. FFT's are where the proverbial magic happens for a lot of signals and systems analysis, as well as for the multiplication of very large integers. So anyone involved in gaming that includes digital signal processing (voice chat in UT, karaoke-karaoke-revolution type games, analog user input, etc.) is going to be happy, and anyone who's involved in multiplying huge integers (crypto anyone?) may very well have wet themselves.
  - Precision limit. (Score:3, Informative)
    
    by DrYak ( 748999 ) writes:
    
    The limit is the floating-point precision of the GPU.
    
    Most GPU can do max up to 32-bit floating point operations (depending on the brand and the model), where as most scientific applications use 64-bit and higher (the old FP unit could do 80bit FP math, SSE registers in recent processors can do 128-bits FP math).
    
    So some user will be happy, like for sound processing (GPU have already been used to process reverberation to create realistic sound environnement - too lazy to do the search for the slashdot referen
    - Re:Precision limit. (Score:2)
      
      by NVP_Radical_Dreamer ( 925080 ) writes:
      
      Not to mention that most scientific applications run mostly under *nix like Linux or BSD, for which GFX driver support isn't always incredible, specially for recent models,
      
      This is nothing but FUD, nVidia has always had great support for *nix OS's http://www.nvidia.com/object/unix.html [nvidia.com] even ATI has some support for *nix OS's https://support.ati.com/ics/support/default.asp?de ptID=894&task=knowledge&folderID=27 [ati.com] try using them sometime rather than listening to your tech gods at the local geek squa
      - Re:Precision limit. (Score:2)
        
        by Rakishi ( 759894 ) writes:
        
        The ATI drivers are by consensus seen as shit, and a great source of laughter whenever someone buys an ATI card and tries to get usable performance in linux.
        
        There is a difference between token support and good performance, and apparently you need to go back to school for reading comprehension as you missed the difference in the GP's post.
    - Cryptography? (Score:3, Insightful)
      
      by Nicolas MONNET ( 4727 ) writes:
      
      Crypto = Number theory = integer math.
      
      No need for floating point.
- IF (Score:3, Funny)
  
  by Mark_MF-WN ( 678030 ) writes:
  
  What is newsworthy is that this is a shameless attempt to secularize mathematics. It's right in the name -- Fast Fourier Transformation. That's idolatry. What can a man know about signals that God hasn't already made clear in the Word? Come to our website [answersingenesis.org], and you can learn all about Intelligent Factoring, which is on much sounder mathematical grounds because it develops entirely from biblical principles.
- - Re:Math library for sale? (Score:2)
    
    by dattaway ( 3088 ) writes:
    
    I'm sure hackers are already taking the Vista open source code and making competing distributions...
FFT=Fast Fourier Transforms (Score:5, Interesting)

by amliebsch ( 724858 ) writes: on Monday May 29, 2006 @12:41PM (#15424958) Journal

Isn't that what SETI@home uses for the bulk of its signal analysis? Would be kind of neat to leverage the millions of idle GPU's out there.

Share
twitter facebook
- Re:FFT=Fast Fourier Transforms (Score:3, Informative)
  
  by fredrikj ( 629833 ) writes:
  
  It's also used by distributed mathematics projects such as GIMPS to multiply large numbers. Unfortunately, if this implementation only operates in 32-bit precision, it will probably be less useful for this purpose since you'd have to do subproducts with fewer digits at a time, to avoid rounding error. I'm not familiar with the details, though.
- Interesting question... (Score:3, Interesting)
  
  by DrYak ( 748999 ) writes:
  
  The interesting question will be :
  Is the 32-bit precision enough for SETI@Home application ?
  Or does the project needs the higher precisions (64bit to 128bit) that can (for now) only be provided by the CPU ?
  
  IMHO, maybe this could be useful. They're trying to find which chunk contains candidate data. If there's some fast low-precision algorithm that can quickly mark chunks as interesting / recheck with higher precision / un-interesting, it'll be helpful to quickly tell appart interseting chunk, even if data n
  - Re:Interesting question... (Score:5, Informative)
    
    by SETIGuy ( 33768 ) writes: on Monday May 29, 2006 @09:14PM (#15426455) Homepage
    
    Yes, 32 bits is quite enough for our FFTs. Our requirements are fairly low. 16bit floats may even do the job (although I've never tried 16bit floats in SETI@home). What has concerned us in the past is that bandwidth to GPUs was fairly assymmetric (on AGP cards), the seti@home working sets (A few buffers of 1M complex samples == 16MB each) were larger than the usable memory on many video cards and the length of the maximum shader routine was fairly small. SETI@home does quite a bit more than FFTs, so moves into and out of main memory were required. At the time we couldn't put more into the shader language. That may have changed, but right now we lack anyone who both has the time to do the job and is capable of doing it.
    Our tests on nVidia 5600 series AGP cards (this was several years ago) showed that the net SETI@home throughput using the GPU was at best 1/5 of what we could obtain with the CPU. This was primarily due to transfers out of graphics memory and into main memory.
    PCI Express allows for symmetric bandwidth to graphics memory and graphics memories are now typically larger than the size of our working set. The difficulty will be in benchmarking to see which is faster for a specific GPU/CPU combination.
    At any rate it's a fairly simple job to swap FFT routines in SETI@home. The source is available [berkeley.edu]. Someone may have done it by now...
    
    Parent Share
    twitter facebook
This is news? (Score:2, Interesting)

by Anonymous Coward writes:

I have an uncle who's a professor who's been using GPUs for scientific computation for years. Apparently he has systems with four GPUs running simulations.
- Re:This is news? (Score:2)
  
  by Duncan3 ( 10537 ) writes:
  
  It's old news in academic, computational, and HPC circles, yes.
  
  New or the 2nd or 3rd dup here on /. tho
- Re:This is news? (Score:5, Funny)
  
  by Fex303 ( 557896 ) writes: on Monday May 29, 2006 @01:02PM (#15425034)
  
  I have an uncle who's a professor who's been using GPUs for scientific computation for years.
  
  I'm sorry, but playing Quake at really high framerates does not count as research. He's not fooling anyone.
  The business cards which list him as 'Profess0r of Pwnage' probably aren't helping either.
  It's also bad when he refers to the undergrads as 'n00bs' during his lectures to them.
  
  Parent Share
  twitter facebook
$1500-$2000? (Score:3, Insightful)

by Wesley Felter ( 138342 ) writes: <wesley@felter.org> on Monday May 29, 2006 @12:45PM (#15424976) Homepage

I sense a little bias here; the fastest Intel and AMD processors are actually $1,000.

Share
twitter facebook
- Re:$1500-$2000? (Score:5, Informative)
  
  by veg_all ( 22581 ) writes: on Monday May 29, 2006 @12:56PM (#15425015)
  
  RTFA? Or just look at the pretty diagrams.
  They're running against dual-processor systems (Opteron and Xeon).
  
  Parent Share
  twitter facebook
- Re:$1500-$2000? (Score:3, Informative)
  
  by MP3Chuck ( 652277 ) writes:
  
  Pricegrabber lists some up in the range of several thousand dollars ... like this Itanium [pricegrabber.com], for just over $5k and this Dual Core Xeon [pricegrabber.com] for $3,700.
- Re:$1500-$2000? (Score:2)
  
  by iamplupp ( 728943 ) writes:
  
  Actually, the top of the line Opteron costs more than 2000 according to amd.com
  
  Dual-Core Model 875 HE $2,149
  
  and i'm sure the fastest Xeon is similarily priced.
  - Re:$1500-$2000? (Score:2)
    
    by Wesley Felter ( 138342 ) writes:
    
    Of course, the latest Athlon 64 FX is both faster and cheaper than an Opteron 875. Same for Intel.
    - Re:$1500-$2000? (Score:2)
      
      by HuguesT ( 84078 ) writes:
      
      Single CPU, yes, but the latest opteron 8xx series supports at least dual-core, 8-way SMP. Not quite the same thing.
- Re:$1500-$2000? (Score:2)
  
  by Jozer99 ( 693146 ) writes:
  
  Fastest consumer processors are $1000. That doesn't count the Opterons or Xeons, which are priced much higher.
Cray-1 comparison (Score:5, Interesting)

by Mostly a lurker ( 634878 ) writes: on Monday May 29, 2006 @12:56PM (#15425016)

The Cray-1A supercomputer, weighing in at 5.5 tons, had an absolute maximum peak performance of 250 megaflops. It, of course, cost millions and the power requirements (including for cooling) were in excess of 200 kW. I remember marveling at the advanced nature of this technological achievement.
Thirty years later, a $500 GPU, weighing less than 1 pound, can produce 6 gigaflops. People complain about its power and cooling needs, but they are rather below 200 kW! We sometimes forget just how amazing the developments in computing have been over the last three decades.

Share
twitter facebook
- Re:Cray-1 comparison (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  People complain because they compare the power consumption to their old "home computer". Just look at the Apple II, the C64 or similar 8 bit computers, almost all of which had such low power demands that they could run without fans. Even the "IBM compatible" PCs up to and including 386s almost always had exacly one fan, the one in the power supply. My most recently purchase (second hand) computer has 6 fans, and draws enough power to justify most of them.
  
  You can probably make up your own flawed car analogy
  - Re:Cray-1 comparison (Score:2)
    
    by grammar fascist ( 239789 ) writes:
    
    You can probably make up your own flawed car analogy and compare top speed and fuel consumption of today's compact cars with the racing cars of 60 years ago.
    
    And then marvel at how automobile technology has advanced, which I think was the point.
- Re:Cray-1 comparison (Score:3, Funny)
  
  by suv4x4 ( 956391 ) writes:
  
  People complain about its power and cooling needs, but they are rather below 200 kW! We sometimes forget just how amazing the developments in computing have been over the last three decades.
  
  Why compare it with Cray-1, compare it with the steam-powered calculators of the past that take minutes to multiply two simple numbers and the results are sometimes kinda off.
  
  People always demand more, this is why they develop more, so to get more. If people become suddenly satisfied with whatever state they're in, they'
Good occasion (Score:2)

by 4D6963 ( 933028 ) writes:

At last a good occasion to explain what the FFT is! But huh, I'll rather link to Wikipedia, not that I couldn't explain by myself eh, pshhh!
Note how many comments are about explaining what FFT is as opposed to how many comments consist in asking what FFT means. Quite a fucked up demand/offer ratio.
Based on FFTW? (Score:2)

by 4D6963 ( 933028 ) writes:

That thing is called GPUFFTW, one could assume that it is based on the FFTW library (which is after all the best performing FFT library around) but after looking at every page on this site, I couldn't find a single credit to FFTW.
Are the two linked or is the W at the end of the name just a mere coincidence?
- - Re:Errr, I don't want to sound skeptical... (Score:3, Informative)
    
    by eh2o ( 471262 ) writes:
    
    djbfft apparently had an edge at some point, but now has not been updated for more than 5 years. meanwhile FFTW has incremented the major version number to 3, undergone a complete rewrite, added simd, multiprocessor, 64bit and a slew of other things (its obviously not a stagnant project). not to mention its the basis of the 'fft' function in matlab and thereby probably the most used fft implementation in the world. assuming their benchmarks (which now include accuracy as well as speed) are valid, Intel P
(Almost) Comparison to BSD Licensed version (Score:2, Interesting)

by achooo ( 977726 ) writes:

on this page [unc.edu] here they almost compare to a program called libgpufft (which is an open source BSD version of the same library here [sourceforge.net] ) I wonder how they do compared to the BSD licensed version---
Bytes/bits? (Score:3, Informative)

by 4D6963 ( 933028 ) writes: on Monday May 29, 2006 @01:23PM (#15425105)

From the site :
The Video RAM will determine the maximum array length that can be sorted on the GPU. A rough guideline for performing FFT on 32-bit floats is: Maximum array length in millions = Video RAM in MB / 32
Max array length equals video RAM in megabytes divided by 32... bits? Correct me if i'm dumb but shouldn't it rather be "Video RAM in MB / 4"?

Share
twitter facebook
- Re:Bytes/bits? (Score:2)
  
  by cnettel ( 836611 ) writes:
  
  You would naturally need some kind of workspace to store the results. I think that a normal shader is not supposed to write data back to from where it was read, so there can be some quite significant "read, multiply, write to new location" going on.
No surprise here... (Score:2, Insightful)

by nitrocloud ( 706140 ) writes:

Graphics Processing Units have always been better for FFTs and signal processing than general CPUs. I've read a journal article where machine vision was implemented on a GeForce 5200 at a 3x speedup over an AMD Athlon 3200+. The reason? This is what a GPU is made for; the small dedicated instruction set makes a GPU much more adept at signal processing than the 686's have ever been.
What's an FFT (Score:5, Informative)

by Geoffreyerffoeg ( 729040 ) writes: on Monday May 29, 2006 @01:31PM (#15425143)

Apparently nobody knows what an FFT is. Here's the best description I can give without descending into math too much.

The Fast Fourier Transform is an algorithm to turn a set of data (as amplitude vs. time) into a set of waves (as amplitude vs. frequency). Say that I have a recording of a piano playing an A at 440 Hz. If I plot the actual data that the sound card records, it'll come out something like this picture [pianoeducation.org]. There's a large fading-out, then the 440 Hz wave, then a couple of overtones at multiples of 440 Hz. The Fourier series will have a strong spike at 440 Hz, then smaller spikes at higher frequencies: something like this plot [virtualcomposer2000.com]. (Of course, that's not at 440, but you get the idea.)

The reason we like Fourier transforms is that once you have that second plot, it's extremely easy to tell what the frequency of the wave is, for example - just look for the biggest spike. It's a much more efficient way to store musical data, and it allows for, e.g., pitch transformations (compute the FFT, add your pitch change to the result, and compute the inverse FFT which uses almost the same formula). It's good for data compression because it can tell us which frequencies are important and which are imperceptible - and it's much smaller to say "Play 440 Hz, plus half an 880 Hz, plus..." than to specify each height at each sampling interval.

The FFT is a very mathematics-heavy algorithm, which makes it well suited for a GPU (a math-oriented device, because it performs a lot of vector and floating-point calculations for graphics rendering) as opposed to a general-purpose CPU (which is more suited for data transfer and processing, memory access, logic structures, integer calculations, etc.) We're starting to see a lot of use of the GPU as the modern equivalent of the old math coprocessor.

If you're looking for more information, Wikipedia's FFT article is a good technical description of the algorithm itself. This article [bu.edu] has some good diagrams and examples, but his explanation is a little non-traditional.

Share
twitter facebook
- Re:What's an FFT (Score:2)
  
  by grammar fascist ( 239789 ) writes:
  
  Bravo. Just... bravo.
  
  I was going to karma-whore this myself, but you beat me to it, and probably did a better job. :)
- - Re:What's an FFT (Score:3, Informative)
    
    by fredrikj ( 629833 ) writes:
    
    FFT is to naive FT as quicksort is to insertion sort.
    
    Be careful with the terminology; you correctly referred to "naive FT algorithm" above, but this sentence might give the impression that the Fourier transform itself is an algorithm. FT is a function whereas the FFT is an algorithm that computes the function. It would be more appropriate to say that FFT is to the Fourier transform what quicksort is to sorting.
Great for audio! (Score:3, Insightful)

by radarsat1 ( 786772 ) writes: on Monday May 29, 2006 @01:47PM (#15425206) Homepage

Awesome, this is really good news for audio people.
I want to see how I can take advantage of this... I hope the license isn't too restrictive.
It might be a good example of how to use the GPU for general purpose (vector-based) computation, something I've been wanting to explore.

Just curious, how does the use of the GPU for this kind of thing affect the graphics display?
Are you unable to draw on the screen while it's running, or something?

Share
twitter facebook
Finally.... (Score:5, Funny)

by Comboman ( 895500 ) writes: on Monday May 29, 2006 @02:11PM (#15425299)

Finally I have a good excuse to give the IT department why I need to upgrade my video card. I need to do FFTs faster (it has nothing at all to do with Doom3).

Share
twitter facebook
Unfair comparison (Score:2, Interesting)

by daveisfera ( 832409 ) writes:

The one thing that I haven't seen mentioned is that the benchmarks only show "compute timings" and not actual setup and retreval times. If the benchmarks showed the amount of time to get the data to the GPU and especially the time to get the result back to a place where a program could actually use it then it would be blown out of the water by the CPU. Future cards/drivers could speed up the process of retrieving the data, but for now there will always be lame benchmarks like this that are unfairly biased t
not worth it (Score:2)

by penguin-collective ( 932038 ) writes:

There are plenty of high-performance co-processor boards you could use. Often, they don't help at all because getting the data in and out is slower than just doing the operations on the CPU. Furthermore, the $500 is in addition to the CPU you already have. Third, the effort you invest in putting your code on the coprocessor is likely going to be a short-term investment, as these boards (whether GPU or otherwise) rapidly change, and as few other people are going to have exactly the same setup as you.

Overa
stating the obviou... (Score:3, Interesting)

by adolf ( 21054 ) writes: <flodadolf@gmail.com> on Monday May 29, 2006 @03:06PM (#15425481) Journal

Right then. So how long before they just include some weak general-purpose instructions in the GPU, add SATA and ethernet to the cards, and call it a budget PC?

Share
twitter facebook
Modular squaring (Score:2)

by Myria ( 562655 ) writes:

FFTs are also useful for squaring large numbers with a modulo. That's what http://www.mersenne.org/ [mersenne.org] uses them for.

Melissa
I wonder if this could work using MPI? (Score:2)

by ShyGuy91284 ( 701108 ) writes:

Someone in a parallel computing class I was taking did a parallel application using FFT to do processing. I wonder if they could incorporate this into MPI for parallel computations pretty easily?
Pro Audio applications? (Score:3, Interesting)

by rekoil ( 168689 ) writes: on Monday May 29, 2006 @05:01PM (#15425817)

I'm wondering whether or not the DSP latency of these libraries is sufficient to use with real-time audio processing...if folks were to write RTAS/AU/VST plugins using the library, how they would compare to other hardware-assisted DSP solutions such as the PowerCore and Pro Tools farm cards. Then again, if you have to spend $500 on a card to get this goodness, it's hardly a bargain (albeit cheaper than the above products...)

Share
twitter facebook
- Re:Uhh.. (Score:5, Informative)
  
  by Anonymous Coward writes: on Monday May 29, 2006 @12:34PM (#15424922)
  
  Fast Fourier Transform [wikipedia.org]
  
  Parent Share
  twitter facebook
  - Re:Uhh.. (Score:2, Redundant)
    
    by JourneyExpertApe ( 906162 ) writes:
    
    No, no, they're talking about playing Final Fantasy Tactics [wikipedia.org]. That's why they need a GPU.
- You don't need a translation... (Score:3, Insightful)
  
  by nick_davison ( 217681 ) writes:
  
  FFT:
  
  Some calculation which can be heavily optimized to simple but fast processing. Hence a [relatively] cheap part that does a few simple tasks very fast can out perform a more expensive part that can do a vastly greater range of tasks with more efficiency across that general range but less in a specific area when performing that optimized calculation.
  
  By capitalizing on this incredibly basic rule of computer science (the an optimized simple thing going fast is faster than a more powerful general thing that
- Re:Uhh.. (Score:3, Informative)
  
  by SpinyNorman ( 33776 ) writes:
  
  There's a somewhat non-obvious mathematical result that any continuous periodic function can be decomposed as the sum of a series of sine functions of different frequencies. This series of sine waves is referred to as the fourier series of the function. The FFT (fast fourier transform) is an efficient numeric algorithm to derive the coefficients of the fourier series for any function.
  
  One useful way to think of the FFT is as transform of signal data from the time domain (raw samples) to the frequency domain
  - Re:Uhh.. (Score:4, Interesting)
    
    by kabz ( 770151 ) writes: on Monday May 29, 2006 @01:16PM (#15425078) Homepage Journal
    
    Or in the form of a concrete example ... The little spectrum analysers in iTunes are a good example of taking some time domain data, analysing it, and displaying the low through high frequencies.
    
    As an example of how far we've come, I implemented the Cooley-Tukey FFT in assembler on an Amiga, and it was just barely out of real-time. You had to capture some audio data, then wait while it was analysed. Nowadays, you can write the same thing in Objective-C on a G4, using the standard audio capture library, and have the FFT's computed between redraw events.
    
    Parent Share
    twitter facebook
  - Re:Uhh.. (Score:2)
    
    by DarthChris ( 960471 ) writes:
    
    I feel I should point out here that the basic concept of Fourier series has both sine and cosine in, but that it is possible to rewrite them as what are referred to as sine/cosine half-series.
    
    Fourier series, as a concept, is analogous to the polynomial-based power series of a function - both allow you to construct an infinite set of coefficients. Just as Taylor's theorem can be used to construct a power series (and determine the coefficients) for any function, the FFT can be used to obtain the Fourier co
- Re:Uhh.. (Score:5, Informative)
  
  by Fordiman ( 689627 ) writes: <fordiman@g[ ]l.com ['mai' in gap]> on Monday May 29, 2006 @01:05PM (#15425039) Homepage Journal
  
  an FFT is a transform that turns a signal (like an audio file) into its frequency components (like a spectrograph). It's used for MP3 compression, sound EQs, jpeg compression, mpeg4 compression, and a number of other things (I use FFTW for tuning my guitar).
  
  FFTW is the 'Fastest Fourier Transform in the West', a cute name for the work of a number of graduate students who use several techniques to turn the FFT from 'Numerical Recipes in C' into a freaking speed daemon.
  
  GPUFFTW is much the same thing, but ported to your video card's GPU - which is generally more optimized for doing the 'apply a floating point matrix to an array' thing - thus speedin the FFTW up even more while relieving the main processor from doing the work.
  
  If you don't have a high-powered video card, this means nothing for you. If you do, it means the above operations (compression, spectrum analysis, etc) can be done faster and without eating up processes.
  
  Parent Share
  twitter facebook
  - Re:Uhh.. (Score:2)
    
    by yoz ( 3735 ) * writes:
    
    FFTW is the 'Fastest Fourier Transform in the West', a cute name for the work of a number of graduate students who use several techniques to turn the FFT from 'Numerical Recipes in C' into a freaking speed daemon. ... but not as fast as djbfft [cr.yp.to], right? (I'm asking rather than telling, as I have very little FFT experience and am intrigued to know)
  - Re:Uhh.. [little correction] (Score:2, Insightful)
    
    by faragon ( 789704 ) writes:
    
    For both MP3, JPEG, and MPEG4 the transformation used is not a Fourier transform (not even TDFT/FFT), but DCT/IDCT ([inverse]discrete cosine transform). The reason for using DCT instead of the FFT (equivalent to the time Discrete Fourier Transform) is because the DCT is computationally cheaper than the FFT (about one half, in the fundamentals is really a mutilated DFT/FFT), and it provides enough information for the band discarding approaches used in lossy data compression.
- Re:Uhh.. (Score:5, Funny)
  
  by exp(pi*sqrt(163)) ( 613870 ) writes: on Monday May 29, 2006 @01:18PM (#15425088) Journal
  
  FFT [berkeley.edu] is a data compression and encryption standard used by a wide variety of extraterrestrial civilizations. Seti@home spends most of its time running FFT code to look for signals. If we managed to communicate with any of these aliens we could ask them what it stands for.
  
  Parent Share
  twitter facebook
- - Re:As you should very well know... (Score:2)
    
    by chriso11 ( 254041 ) writes:
    
    I'm afraid we have to revoke your Geek membership... knowing how to hook up a composite video input for a TV isn't quite enough.
    - Re:As you should very well know... (Score:4, Funny)
      
      by Jesus_666 ( 702802 ) writes: on Monday May 29, 2006 @04:16PM (#15425691)
      
      Knowing how to hook up a composite video input is irrelevant. The civilised world uses SCART. You can't screw up hooking up SCART.
      
      And just you wait, France will develop a European alternative to that Fourier nonsense as well!
      
      Parent Share
      twitter facebook
      - Re:As you should very well know... (Score:3, Funny)
        
        by jawtheshark ( 198669 ) * writes:
        
        Just in case some people don't get the following statement:
        And just you wait, France will develop a European alternative to that Fourier nonsense as well!
        Read [wikipedia.org]... And notice the country....
        Mod the guy up.. He might be a troll, but he is funny :-D
- Re:Please (Score:3, Funny)
  
  by __aaclcg7560 ( 824291 ) writes:
  
  One more reason to buy an overpriced video card instead of an overpriced CPU?
  - Re:Please (Score:2)
    
    by kimvette ( 919543 ) writes:
    
    (based on the sig)
    
    But, can an Nvidia run FFT calculations faster than Chuck Norris? I doubt it!
  - Re:Please (Score:2)
    
    by @madeus ( 24818 ) writes:
    
    One more reason to buy an overpriced video card instead of an overpriced CPU?
    
    What do you mean an [slizone.com] overpriced video card, and why would you want one instead of an overpriced CPU?!
    - Re:Please (Score:2)
      
      by __aaclcg7560 ( 824291 ) writes:
      
      I wasn't aware that the FFT calculations could be performed across two GPUs. Of course, I still haven't upgraded my old 32-bit CPU to a new dual core 64-bit CPU yet. I'm still paying off the overpriced budget video card that I got recently for $50. :P
- Re:If you need gigaflops... (Score:2, Informative)
  
  by zeldor ( 180716 ) writes:
  
  not so custom really if you go with the right integrator.
  SRC computers puts out FPGA based systems that has a nice
  32bit floating point fft library already in their development
  environment. Most customers using the fft are for radar image
  processing where the best PC solution is 50 times slower
  then the fpga based solution. Think UAVs with smart
  tracking off their radar.
  http://www.srccomputers.com/ [srccomputers.com]
- Re:If you need gigaflops... (Score:2)
  
  by Hast ( 24833 ) writes:
  
  In my experience (I've dabbled with both GP-GPU and custom FPGA chips) FPGAs don't always come out on top. There certainly are things for which FPGAs are better, but there are also things for which GPUs are better.
  
  If for no other reason than that you can get a working GP-GPU program running in a day or two. Designing hardware (even on FPGA) typically takes months.
- The Windowing Problem (Score:2)
  
  by rumblin'rabbit ( 711865 ) writes:
  
  The company I work for, a seismic processing company, has a few terraflops of processing power provided by clusters of Intel/AMD chips. We do a lot of FFT's, although it's not our major computational roadblock. We're actually small potatoes. Some large seismic processing centres are on the order of 100 terraflops.
  And virtually all of this computation is single precision.
  There are enough inaccuracies in transforming to the frequency domain due to the "windowing problem" (which says that the finite length
  - Re:The Windowing Problem (Score:2)
    
    by stephentyrone ( 664894 ) writes:
    
    medical imaging. the inverse integral transforms that are used in processing are often computed using ffts. do *you* want a PET/MRI/whatever scan to miss the piece of shrapnel in your chest because it was small enough to be obscured by the single-precision noise floor?
    
    there's a bazillion other applications that need double (and higher) precision, but i'm pretty sure that's the most compelling one for the average person.
- Re:wasted processing power... (Score:2)
  
  by Jozer99 ( 693146 ) writes:
  
  What I'm worried about is heat. My actively cooled ATI Radeon X800 XL runs at about 49C on the Windows Desktop, but when I start a game or graphics benchmark, it shoots up to 75C. If I was always using the GPU, I would think that being that hot would shorten its life considerably.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Final Fantasy Tactics? (Score:4, Funny)

Re:Final Fantasy Tactics? (Score:5, Funny)

Re:Final Fantasy Tactics? (Score:3, Funny)

It's nice... (Score:5, Informative)

Re:It's nice... (Score:5, Interesting)

Re:It's nice... (Score:5, Informative)

Re:It's nice... (Score:2)

Re:It's nice... (Score:4, Insightful)

Your error analysis is totally wrong (Score:5, Informative)

erratum (Score:4, Informative)

Re:It's nice... (Score:5, Informative)

32 bit not good enough (Score:2, Interesting)

Re:It's nice... (Score:3, Interesting)

Rush hour math. (Score:3, Insightful)

That's where PCIe is useful (Score:4, Informative)

Rush hour dollars, too (Score:2)

8xx verses 2xx (Score:2, Informative)

Re:Rush hour dollars, too (Score:2)

Re:Rush hour math. (Score:5, Informative)

FFTs in GPUs, eh? (Score:3, Funny)

Any 64 bit GPU's? (Score:3, Insightful)

Re:Any 64 bit GPU's? (Score:2)

Re:Any 64 bit GPU's? (Score:3, Informative)

Re:Any 64 bit GPU's? (Score:2)

Re:Any 64 bit GPU's? (Score:2)

Re:Any 64 bit GPU's? (Score:4, Interesting)

Re:Any 64 bit GPU's? (Score:5, Interesting)

Re:Any 64 bit GPU's? (Score:4, Informative)

Re:Any 64 bit GPU's? (Score:3, Informative)

Re:Any 64 bit GPU's? (Score:3, Insightful)

Re:Any 64 bit GPU's? (Score:2)

FFT; (Score:2, Informative)

Nope! (Score:4, Informative)

Math library for sale? (Score:2)

Re:Math library for sale? (Score:3, Insightful)

Re:Math library for sale? (Score:2, Informative)

Precision limit. (Score:3, Informative)

Re:Precision limit. (Score:2)

Re:Precision limit. (Score:2)

Cryptography? (Score:3, Insightful)

IF (Score:3, Funny)

Re:Math library for sale? (Score:2)

FFT=Fast Fourier Transforms (Score:5, Interesting)

Re:FFT=Fast Fourier Transforms (Score:3, Informative)

Interesting question... (Score:3, Interesting)

Re:Interesting question... (Score:5, Informative)

This is news? (Score:2, Interesting)

Re:This is news? (Score:2)

Re:This is news? (Score:5, Funny)

$1500-$2000? (Score:3, Insightful)

Re:$1500-$2000? (Score:5, Informative)

Re:$1500-$2000? (Score:3, Informative)

Re:$1500-$2000? (Score:2)

Re:$1500-$2000? (Score:2)

Re:$1500-$2000? (Score:2)

Re:$1500-$2000? (Score:2)

Cray-1 comparison (Score:5, Interesting)

Re:Cray-1 comparison (Score:2, Insightful)

Re:Cray-1 comparison (Score:2)

Re:Cray-1 comparison (Score:3, Funny)

Good occasion (Score:2)

Based on FFTW? (Score:2)

Re:Errr, I don't want to sound skeptical... (Score:3, Informative)

(Almost) Comparison to BSD Licensed version (Score:2, Interesting)

Bytes/bits? (Score:3, Informative)

Re:Bytes/bits? (Score:2)

No surprise here... (Score:2, Insightful)

What's an FFT (Score:5, Informative)

Re:What's an FFT (Score:2)

Re:What's an FFT (Score:3, Informative)

Great for audio! (Score:3, Insightful)

Finally.... (Score:5, Funny)

Unfair comparison (Score:2, Interesting)

not worth it (Score:2)

stating the obviou... (Score:3, Interesting)

Modular squaring (Score:2)

I wonder if this could work using MPI? (Score:2)

Pro Audio applications? (Score:3, Interesting)

Re:Uhh.. (Score:5, Informative)

Re:Uhh.. (Score:2, Redundant)