Best Server Storage Setup?
new-black-hand asks: "We are in the process of setting up a very large storage array and we are working toward having the most cost-effective setup. Until now, we have tried a variety of different architectures, using 1U servers or 6U servers packed with drives. Our main aims are to get the best price per GB of storage that we can, while having a reliable and scalable setup at the same time. The storage array will eventually become very large (in the PB range) so saving just a few dollars on each server means a lot. What do people out there find is the most effective hardware setup? Which drives and of what size? Which motherboards, etc? I am familiar with the Petabox solution which is what the Internet Archive uses — they have made good use of Open Source software. So what are some of the architectures out there that, together with Open Source, can give us a storage array that is much better than the $3 per GB plus that the commercial vendors ask for?"
CORAID (Score:5, Interesting)
15-drive array = $4000
750GB Seagate Drive = $420
Full Array (14-drive RAID5, one hot spare) = $10,300 for 9.75 Terabytes
That's $1.06 per gigabyte RAID5 with hotspare. It doesn't get any better than this. Even with labor to assemble and set it up, and shipping, it's hard to get above $1.50 a gigabyte.
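The arithmetic works out as claimed; here's a short script that reproduces it, using the prices quoted above:

```shell
#!/bin/sh
# Cost-per-GB check for the CORAID setup above (prices from the post).
ARRAY=4000          # 15-bay chassis, USD
DRIVE=420           # 750 GB Seagate, USD
DRIVES=15           # 14 in the RAID5 set + 1 hot spare
DATA_DRIVES=13      # a 14-drive RAID5 loses one drive's worth to parity

total=$((ARRAY + DRIVES * DRIVE))
usable_gb=$((DATA_DRIVES * 750))
echo "total: \$$total"
echo "usable: ${usable_gb} GB"
# dollars per GB, two decimal places
awk -v t="$total" -v g="$usable_gb" 'BEGIN { printf "cost: $%.2f/GB\n", t/g }'
```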
I suggest CLVM and Xen on the servers. Xen makes it really easy to turn up a new box. The space is available everywhere. CLVM is flexible enough to allow you to migrate stuff across arrays (or span arrays) very easily. I actually boot off of a flash chip and pivot_root my Linux systems onto a filesystem running off of these.
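A minimal provisioning sketch for that layout, assuming the shelves are exported as ATA-over-Ethernet targets (as CORAID shelves are) and that clvmd is already running on each node; the shelf addresses and volume names here are illustrative, not from the post:

```shell
#!/bin/sh
# Load the AoE driver and find shelves on the storage network
modprobe aoe
aoe-discover
aoe-stat                      # lists targets, e.g. /dev/etherd/e0.0

# Turn each shelf's exported LUN into an LVM physical volume
pvcreate /dev/etherd/e0.0 /dev/etherd/e1.0

# One clustered volume group spanning both shelves (needs clvmd)
vgcreate --clustered y storage /dev/etherd/e0.0 /dev/etherd/e1.0

# Carve out a logical volume for a Xen guest; it can later be
# grown or migrated across shelves with lvextend / pvmove
lvcreate -L 200G -n guest01-disk storage
```

Because the volume group is clustered, every node sees the same logical volumes, which is what makes "turn up a new box" with Xen so painless.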
These numbers are roughly my cost. E-mail me if you'd like to buy one and we can talk about it.
The most effective solution... (Score:2, Interesting)
It is not cheap.
Cost cost cost (Score:3, Interesting)
Dell MD1000 SAS/SATA JBOD + 15x 500GB SATA disks: ~$10500
Three of those, daisy chained to some head end server through a single SAS RAID controller. Guessing ~$4000 for the box.
That's 22.5TB of raw storage for ~$35k, or $1.57/GB, less if you work a deal.
You'll need about 45 of these for a petabyte (raw); $1.6M
Fabric (~5x 24 port layer-2 gigabit switches): ~$12k
~600U of rack space + power: ~$25k
etc...
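The build-out math above can be checked with a short script (all prices are the rough figures from the list, so treat the output as ballpark only):

```shell
#!/bin/sh
# Rough petabyte build-out math for the MD1000 approach above.
SHELF=10500         # MD1000 + 15x 500 GB SATA, USD
SHELVES=3           # daisy-chained per head-end server
HEAD=4000           # head-end server with SAS RAID controller

unit=$((SHELVES * SHELF + HEAD))
unit_gb=$((SHELVES * 15 * 500))
units=45            # enough for ~1 PB raw

echo "per unit: \$$unit for ${unit_gb} GB raw"
awk -v u="$unit" -v g="$unit_gb" 'BEGIN { printf "raw cost: $%.2f/GB\n", u/g }'
echo "petabyte: \$$((units * unit)) for $((units * unit_gb)) GB raw"
```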
Tons of single points of failure, limited performance, and zero management. Don't expect good IOPS out of this: too few controllers, and serious throughput constraints on the 1Gb/s uplinks, etc... It's fine for light-to-moderate streaming, archival-type use. Use Linux with plenty of md and LVM for at least some measure of manageability.
Frankly, $3/GB for something with management tools, better reliability, and/or better performance looks pretty good, but if you're all about cost...