Comment Re:Silly (Score 1) 127
This has been the argument that I've seen to justify getting a GeForce over a Quadro in CGI. A few points:
1) The memory system on the Tesla/Quadro is much more rigorously tested and held to a much higher quality standard than on the GeForce. There is published research backing this up, and I have plenty of anecdotal evidence of my own to the same effect. NVIDIA doesn't give a crap about the memory in a GeForce because a miscolored pixel in 1/60th of a second doesn't matter. A soft/hard error in GPU memory during a scientific calculation can be catastrophic. This is also why NVIDIA is adding ECC memory to the Tesla C2050 and C2070 GPUs.
2) Some GeForce GPUs will develop major threading errors after a few minutes of hard running. I've experienced this with a dgemm torture test on a Tesla and 2 GeForce GTX 285 GPUs in a single system. Run the test on all 3 GPUs simultaneously for about 5 minutes and the two GeForce GPUs will crash out at nearly the same time. The Tesla will run the test to completion (which can take about a day).
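The shape of that kind of torture test is simple: run the exact same computation over and over and flag any run whose result differs bit-for-bit. A minimal CPU-side sketch of the idea in plain Python (the real test hammered the GPUs with CUBLAS dgemm; the matrix size, inputs, and iteration count here are just illustrative):

```python
import hashlib
import struct

def matmul(a, b, n):
    """Naive n x n matrix multiply on flat lists of floats."""
    c = [0.0] * (n * n)
    for i in range(n):
        for k in range(n):
            aik = a[i * n + k]
            for j in range(n):
                c[i * n + j] += aik * b[k * n + j]
    return c

def digest(m):
    """Bitwise fingerprint of a result; any soft error changes it."""
    return hashlib.sha256(struct.pack(f"{len(m)}d", *m)).hexdigest()

def soak_test(n=32, iterations=10):
    # Deterministic inputs: identical runs must produce identical bits.
    a = [((i * 31 + 7) % 97) / 97.0 for i in range(n * n)]
    b = [((i * 17 + 3) % 89) / 89.0 for i in range(n * n)]
    reference = digest(matmul(a, b, n))
    for it in range(iterations):
        if digest(matmul(a, b, n)) != reference:
            return f"soft error detected on iteration {it}"
    return "all iterations matched"

print(soak_test())
```

On reliable hardware every iteration matches; on a card with flaky memory or scheduling, the fingerprints eventually diverge (or the run crashes outright, as the GTX 285s did).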
3) "Bandwidth starvation" means the cards are getting less PCIe bandwidth than they were designed for. On this FASTRA machine, only a few slots are full x16 Gen2, and each of those is shared across 2 GPUs, making it effectively x8 Gen2 per GPU. The other slots are worse: an x8 Gen2 link going in, shared between a pair of GPUs, so effectively x4 Gen2 each. Technically you could run a Tesla in an x1 Gen1 slot with the right adapter, but the time spent moving data between host memory and GPU memory would likely negate any performance benefit of using the GPU at all, unless your algorithm is almost completely compute bound.
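The back-of-envelope math on why the slot matters: a PCIe lane delivers roughly 250 MB/s usable on Gen1 and 500 MB/s on Gen2 (after 8b/10b encoding overhead; real throughput is somewhat lower still). A quick sketch with those assumed figures:

```python
# Rough PCIe host-to-GPU transfer times.
# Assumes ~250 MB/s usable per lane for Gen1 and ~500 MB/s per
# lane for Gen2; actual sustained throughput is a bit lower.

def transfer_seconds(megabytes, lanes, gen):
    per_lane = 250.0 if gen == 1 else 500.0  # MB/s, assumed
    return megabytes / (lanes * per_lane)

payload_mb = 1024  # a 1 GB working set, for illustration
for label, lanes, gen in [("x16 Gen2", 16, 2),
                          ("x8 Gen2 (shared x16)", 8, 2),
                          ("x1 Gen1", 1, 1)]:
    print(f"{label:22s} {transfer_seconds(payload_mb, lanes, gen):6.2f} s")
```

Under these assumptions, moving 1 GB takes on the order of 0.1 s over x16 Gen2 but several seconds over x1 Gen1, a 30x+ gap. That's why only a heavily compute-bound kernel can hide an x1 Gen1 link.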
A couple of years ago, I had a compute rig with 6 Tesla C870 GPUs, and even that setup was starting to get bandwidth starved: all 6 GPUs hung off a single x8 Gen1 link, fanned out to 6 x4 Gen1 slots (using adapters). I had to increase the output data frame size on an MD simulation to keep all the cards equally busy. With smaller frame sizes, the first 4 GPUs were finishing their computations before the last 2 GPUs had even received their data.
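A toy model shows why bigger frames fixed it. Frames go out over the one shared link more or less one GPU at a time, and each transfer carries a fixed setup cost; with small frames, an early GPU finishes computing before the link has finished feeding the later GPUs. All the numbers below (link rate, setup cost, compute rate) are illustrative assumptions, not measurements from that rig:

```python
# Toy model: 6 GPUs fed round-robin over one shared host link.
# Assumed numbers: ~2000 MB/s link (x8 Gen1-ish), a fixed
# per-transfer setup cost, and compute time proportional to
# frame size. Illustrative only.

def idle_gap(frame_mb, gpus=6, link_mbps=2000.0,
             setup_s=0.05, compute_s_per_mb=0.01):
    """Seconds the first GPU sits idle after finishing its frame,
    waiting while the link is still feeding the remaining GPUs."""
    t_xfer = setup_s + frame_mb / link_mbps   # one frame over the link
    t_compute = frame_mb * compute_s_per_mb
    # GPU 0 is done at t_xfer + t_compute; the last GPU only has
    # its data once all frames have gone over the link.
    last_fed = gpus * t_xfer
    return max(0.0, last_fed - (t_xfer + t_compute))

print("small frames :", idle_gap(10))    # early GPUs stall waiting
print("large frames :", idle_gap(1000))  # transfers hide behind compute
```

With small frames the fixed transfer overhead dominates and the early GPUs go idle; with large frames each GPU computes long enough that the serialized transfers disappear behind the compute, which matches what I saw on the C870 rig.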