I wonder which is faster: a box full of high-end ATI cards doing GPU processing, or 64 RPis doing GPU processing? I'm guessing the ATI cards, because the VideoCore IV on the Pi is pretty crappy. 64 Pis is roughly $2k. That would buy you 10 Radeon R9 280s, which is more than you can fit in a box. Let's assume you have 4 of them and spend the other $1200 on the rest of the machine.
Those 4 cards would get you 11,856 GFLOPS (4 * 2964) of raw performance. Those 64 Pis will crunch through roughly 1,536 GFLOPS (64 * 24). Wow, it's not even a contest. The big caveat is power consumption: the Pis will be a lot more efficient than a modern Radeon card, but that's offset by the fact that they'll have to work a lot longer to get the job done.
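For anyone who wants to check the GPU arithmetic, here's a quick sketch. The per-device GFLOPS numbers are just the ones quoted above, not anything I measured, and the variable names are made up for illustration.

```python
# Peak figures as quoted in this post (assumed, not benchmarked).
R9_280_GFLOPS = 2964   # one Radeon R9 280
PI_GPU_GFLOPS = 24     # one Pi's VideoCore IV
NUM_CARDS = 4
NUM_PIS = 64

radeon_total = NUM_CARDS * R9_280_GFLOPS
pi_gpu_total = NUM_PIS * PI_GPU_GFLOPS
gpu_ratio = radeon_total / pi_gpu_total

print(f"Radeon box: {radeon_total} GFLOPS")
print(f"Pi cluster: {pi_gpu_total} GFLOPS")
print(f"Radeon box is about {gpu_ratio:.1f}x faster")
```

So on paper the Radeon box is somewhere between 7x and 8x faster. These are peak numbers, so real workloads will land lower on both sides.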
So let's try using the CPUs instead. We'll compare this cluster against a modern medium-high-end Intel processor, an i7-2600K. The Pis use a 700 MHz ARM processor that manages 0.041 GFLOPS, for a total of 2.624 GFLOPS (64 * 0.041). The Intel chip pushes 8.5 GFLOPS.
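Same back-of-the-envelope check for the CPU side, again using only the figures quoted in this post (the names are hypothetical, and none of this is benchmarked):

```python
# CPU figures as quoted in this post (assumed, not benchmarked).
PI_CPU_GFLOPS = 0.041   # one 700 MHz ARM core
I7_GFLOPS = 8.5         # one i7-2600K
NUM_PIS = 64

pi_cpu_total = NUM_PIS * PI_CPU_GFLOPS
cpu_ratio = I7_GFLOPS / pi_cpu_total

print(f"Pi cluster CPUs: {pi_cpu_total:.3f} GFLOPS")
print(f"i7-2600K:        {I7_GFLOPS} GFLOPS")
print(f"One Intel chip is about {cpu_ratio:.1f}x the whole cluster")
```

That works out to a single desktop chip being a bit over 3x the entire cluster.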
As an efficient use of money, this Pi cluster is a total failure. As a research toy it has some value, but total performance is less than a fairly ordinary PC that costs roughly the same. This doesn't even count all of the switches, power supplies and whatnot you need for the Pi cluster. Even if you aggressively overclock all of the Pis, they just won't catch up. In general you top out at about double the CPU performance of a Pi with aggressive overclocking, and the GPU generally only overclocks about 50% or so. The power consumption figures aren't even all that different when you consider that the Pis need to be crunching for at least 4 times as long on any particular problem and that you have 64 of them to feed. Even a couple of watts adds up across that many machines.
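To put numbers on the overclocking point, here's a rough sketch that applies the best-case factors mentioned above (2x CPU, 1.5x GPU) to the cluster totals. It assumes every one of the 64 Pis hits those factors and that performance scales linearly with them, which is generous.

```python
# Stock cluster totals from the figures quoted in this post.
pi_cpu_total = 64 * 0.041   # GFLOPS, CPU
pi_gpu_total = 64 * 24      # GFLOPS, GPU

# Best-case overclock factors (assumed to apply to every Pi).
CPU_OC_FACTOR = 2.0
GPU_OC_FACTOR = 1.5

oc_cpu_total = pi_cpu_total * CPU_OC_FACTOR
oc_gpu_total = pi_gpu_total * GPU_OC_FACTOR

# Reference machine: one i7-2600K and four R9 280s.
I7_GFLOPS = 8.5
RADEON_TOTAL = 4 * 2964

print(f"Overclocked Pi CPUs: {oc_cpu_total:.3f} vs {I7_GFLOPS} GFLOPS")
print(f"Overclocked Pi GPUs: {oc_gpu_total:.0f} vs {RADEON_TOTAL} GFLOPS")
print("CPU catches up:", oc_cpu_total >= I7_GFLOPS)
print("GPU catches up:", oc_gpu_total >= RADEON_TOTAL)
```

Even with the most optimistic overclock, the cluster stays behind on both the CPU and GPU side.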