We're currently experimenting with clustered file systems. In the running to become the products we'll be selling are GlusterFS and Hadoop: the first because of its flexibility and ease of setup, the latter because large companies are already using it and on paper it looks very slick. Hadoop's one drawback is that it requires Java, which in turn means our relatively cheap nodes suddenly become more expensive, since we have to buy more RAM.
GlusterFS should be able to support striping over AFR (automatic file replication), which would give a good mix of performance and reliability, but so far I've been unable to produce a working configuration for it. The only setup I'm using right now is a 20-node cluster consisting of 10 AFR pairs clustered together with Unify. Even over GigE this performs quite well. I don't see any performance increase with the various optimizers yet; I've tried all but the 'boost' optimizer, so that's still worth a shot. Also, a few days ago 1.3.10 was released, which fixes a hang when pulling out an AFR node. I'm not sure whether it also fixes the overwrite bug, as I had to use the latest mainline 2.5 version for that fix.
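For the curious, the Unify-over-AFR layout described above looks roughly like this as a GlusterFS 1.3-style client volfile. This is a minimal sketch trimmed to two of the ten pairs; the volume names, hostnames, namespace brick, and the round-robin scheduler are illustrative assumptions, not our actual config:

```
# Remote bricks (two shown; the real setup has twenty)
volume brick1
  type protocol/client
  option transport-type tcp/client
  option remote-host node01
  option remote-subvolume brick
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host node02
  option remote-subvolume brick
end-volume

# ... brick3 through brick20 follow the same pattern ...

# Each AFR pair mirrors two bricks
volume afr1
  type cluster/afr
  subvolumes brick1 brick2
end-volume

# ... afr2 through afr10 ...

# Unify needs a dedicated namespace volume
volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host node01
  option remote-subvolume brick-ns
end-volume

# Unify glues the AFR pairs into one file system
volume unify0
  type cluster/unify
  option namespace ns
  option scheduler rr
  subvolumes afr1 afr2
end-volume
```

Striping over AFR would, in theory, just mean layering a cluster/stripe volume on top of the AFR pairs instead of (or under) Unify, but as mentioned I haven't gotten that combination working yet.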
Hadoop will be tested next week if all goes well. It's just a matter of taking the 256M RAM modules out of 10 nodes and placing them in the other half, so Java should have enough. Why one would develop such a system in Java is beyond me, but then again, I'm not a developer, so they must have had good reasons.
Why am I typing all this? I don't know. Go see icanhascheezburger.com if reading this made you feel sad.