It's not about altruism - they're talking about benefits for themselves just as much as for everyone else: we can all do better. Some things can't be done alone nearly as well as they can by a collective, which is why we have government in the first place. Sure, one could say that simpler collectives like religious organizations, corporations and charities can accomplish collective goals, but even they operate at levels too small and single-minded to accomplish goals as large as an interstate highway system or an international space station as well as government does. These people want better roads, and I think they are seeing many areas in which our whole country is lacking and wondering why we are so hesitant to do something about it, considering how easily it appears we could afford it. I think people are often stuck in their ways not because they really don't want anything else, but because they don't realize the consequences of their choices and what really matters. Have you heard the stories of the areas that have discovered that housing the homeless ends up costing less than leaving them homeless? It takes someone with a vision to suggest such a counter-intuitive improvement. People who measure *everything* in dollars are missing a lot. And now some people are speaking up, pointing out, hey, we can all live in a better world. Don't you want to try? It may mean fewer dollars in individual pockets, but I think they're proposing that the benefits outweigh the cost for *everyone* affected, and asking whether we can agree to find a better balance here. Just think about it... too many people just don't think about what really matters and don't realize all the impacts of their capitalist upbringing. As the late great Paul Wellstone said, "We all do better when we all do better."
Of course the non-wealthy would be proponents of raising taxes on the wealthy, but when wealthy people themselves are saying the same thing, it's really time to ask whether the weight of consensus should shift in favor of higher taxes on the wealthy, even if there are hold-outs... seems like it's time to at least talk about it.
You're not asking the right questions:
The first correct question is why on earth someone would need to access half a petabyte. In most cases the commonly accessed data is less than 1%. That's the amount of data that realistically needs to reside on disk. It is never more than 10% on such a large dataset. Everything else would be better placed on tape. Tiered storage is the answer to the first question. You have RAM, solid-state/flash storage (PCI based), fast disks, slow high-capacity disks and tape. Choose your tiering wisely.
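To put rough numbers on that, here is a small sketch of the arithmetic for a half-petabyte dataset. The function name, the tier labels and the use of decimal terabytes are my own choices for illustration; the 1% and 10% figures are the rules of thumb from above, not measurements from your workload.

```python
TB = 10**12  # decimal terabyte, in bytes


def tier_sizes(total_bytes, hot_percent=1, disk_percent=10):
    """Split a dataset into hot, other-disk and tape portions.

    hot_percent:  share that is commonly accessed (rule of thumb: under 1%).
    disk_percent: upper bound on what needs to live on disk at all (10%).
    Integer arithmetic is used so the byte counts are exact.
    """
    if not 0 <= hot_percent <= disk_percent <= 100:
        raise ValueError("need 0 <= hot_percent <= disk_percent <= 100")
    hot = total_bytes * hot_percent // 100
    on_disk = total_bytes * disk_percent // 100
    return {
        "hot": hot,                       # flash / fast disk
        "other_disk": on_disk - hot,      # slow high-capacity disk
        "tape": total_bytes - on_disk,    # everything else
    }


if __name__ == "__main__":
    for tier, size in tier_sizes(500 * TB).items():
        print(f"{tier}: {size // TB} TB")
```

With those assumed percentages, 500 TB works out to 5 TB hot, at most 45 TB more on disk, and 450 TB on tape. Plug in your own access statistics before trusting any of it.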
The second question you need to ask is how the customer needs to access that large datastore. In most cases you need serious metadata in parallel with that data. For petabytes of data you cannot, in most cases, just use an intelligent tree structure. You need a website or an app to search that data and get the required "blob". For such an app you need a large database, since you have 5M objects with searchable metadata (at 200MB/blob).
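A toy version of that metadata catalog might look like the sketch below. It uses Python's built-in sqlite3 purely so the example runs anywhere; a real deployment would sit on a proper database server. The table names, column names and sample values are all hypothetical.

```python
import sqlite3


def build_catalog(conn):
    """Create a hypothetical schema: one row per blob, plus key/value metadata."""
    conn.executescript("""
        CREATE TABLE blobs (
            blob_id    INTEGER PRIMARY KEY,
            location   TEXT NOT NULL,      -- where the portal fetches the blob
            size_bytes INTEGER NOT NULL
        );
        CREATE TABLE blob_meta (
            blob_id INTEGER NOT NULL REFERENCES blobs(blob_id),
            key     TEXT NOT NULL,
            value   TEXT NOT NULL
        );
        -- Searches go through the index instead of walking a directory tree.
        CREATE INDEX idx_meta_kv ON blob_meta(key, value);
    """)


def add_blob(conn, location, size_bytes, metadata):
    """Register a blob and its searchable metadata; returns the new blob_id."""
    cur = conn.execute(
        "INSERT INTO blobs(location, size_bytes) VALUES (?, ?)",
        (location, size_bytes),
    )
    blob_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO blob_meta(blob_id, key, value) VALUES (?, ?, ?)",
        [(blob_id, k, v) for k, v in metadata.items()],
    )
    return blob_id


def find_blobs(conn, key, value):
    """Return the locations of all blobs whose metadata has key == value."""
    rows = conn.execute(
        "SELECT b.location FROM blobs b "
        "JOIN blob_meta m ON m.blob_id = b.blob_id "
        "WHERE m.key = ? AND m.value = ? ORDER BY b.blob_id",
        (key, value),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_catalog(conn)
    add_blob(conn, "/tier1/a.bin", 200 * 10**6, {"project": "alpha"})
    print(find_blobs(conn, "project", "alpha"))
```

The point is only that the lookup happens in the database and returns a location, so users never browse the raw filesystem to find their blob.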
The third question is why you have SAN as a premise. Do you want to run a clustered filesystem across 5-10 nodes? Probably Isilon or Oracle ZS3-2/ZS4-4 are your answer.
Fourth question: what are the requirements? (How many simultaneous clients? IOPS? Bandwidth? ACL support? Auditing? AD integration? Performance tuning?)
Fifth question: what availability do you actually need? There is no such thing as 100% availability. The word "disaster" in Disaster Recovery is there for a reason. Set reasonable SLA expectations. If you go for five-nines availability it will triple the cost of the project. Keep in mind that synchronous replication is distance-limited. Typically the performance cost is small within a radius of about 150 miles, and anything beyond that hurts a lot.
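For a feel of what those numbers mean, here is a back-of-the-envelope sketch. The function names are mine, and the fiber speed of roughly 200 km per millisecond (about two thirds of the speed of light in vacuum) is an assumption; it covers propagation delay only, not switching, protocol overhead, or the fact that real fiber routes are longer than the straight-line distance.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years
KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0  # assumption: light in fiber covers ~200 km per ms


def downtime_minutes_per_year(availability):
    """Allowed downtime per year for an availability such as 0.99999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def sync_round_trip_ms(distance_miles):
    """Propagation-only round trip a synchronous write waits for."""
    return 2 * distance_miles * KM_PER_MILE / FIBER_KM_PER_MS


if __name__ == "__main__":
    for label, a in [("three nines", 0.999), ("five nines", 0.99999)]:
        print(f"{label}: {downtime_minutes_per_year(a):.1f} min/year")
    print(f"150 miles: {sync_round_trip_ms(150):.2f} ms added per write")
```

Five nines leaves a little over five minutes of downtime a year, which is why it gets expensive. At 150 miles every synchronous write picks up about 2.4 ms of round-trip delay before the remote array has even started working, and that grows linearly with distance.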
Even if you solve the problems above, if you want to share it via NFS/CIFS or something else you're going to run into trouble. Since CIFS was not realistically designed for clustered operation, regardless of the distributed FS underneath the CIFS server, you get locking issues. Windows Explorer is a good example, since it creates Thumbs.db files and leaves them open, and when you want to delete the folder you cannot unless you magically ask the same node that was serving you when it created the Thumbs.db file. Apparently, the POSIX lock is transferred to the other server and stops you from deleting, but when Windows Explorer asks the other node who has the lock on the file you get screwed, since the other server doesn't know. POSIX locks are different from Windows locks. It affects all Likewise-based products from EMC (VNX filer, Isilon, etc.) and it also affects the CIFS product from NetApp. I'm not sure about Samba CTDB though.
I would design the storage on ZFS for the main tiers, exported via NFSv4 to the front-end nodes, and have QFS on top of the whole thing in order to push rarely accessed data to tape. The front-end nodes would be accessed via WebDAV by a portal in which you can also query the metadata, with a serious DB behind it.
I've installed Isilon storage for 6000 XenDesktop clients that all log on at 9AM, I've worked on an SL8500, Exadata, and various NetApp and Sun storage systems, and I can tell you that you need to do a study. Run simulations with commodity hardware on smaller datasets to figure out the performance requirements and the optimal access method (NAS, Web, etc.). Extrapolate the numbers, double them and ask for a POC and demos from vendors, be it IBM, EMC, Oracle, NetApp or HP. Make sure that in the future, when you need 2PB, you can expand in an affordable manner. Take care, since vendors like IBM tend to use the least upgradable solution. They will do a demo with something that can hold 0.6PB in its max configuration, and if you need to go larger you'll need a brand new solution from another vendor.
It's not worth doing it yourself, since it will be time-consuming (at least 500 man-hours until production) and will need at least one full-time employee for the storage. But if you must, look at Nexenta and the hardware that they recommend.
And remember to test DR failover scenarios.
Good luck!
Real Users know your home telephone number.