You're not asking the right questions:
The first correct question is: why on earth would someone need fast access to half a petabyte? In most cases the commonly accessed data is less than 1% of the total, and on a dataset this large it rarely exceeds 10%. That is the amount that realistically needs to reside on disk; everything else is better placed on tape. Tiered storage is the answer to this first question: you have RAM, solid-state/flash storage (PCIe-based), fast disks, slow high-capacity disks and tape. Choose your tiering wisely, as in the sizing sketch below.
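To put rough numbers on it, here's a back-of-the-envelope sizing sketch; the tier fractions are assumptions for illustration, not measurements from your workload:

```python
# Back-of-the-envelope tier sizing for a 0.5 PB dataset.
# The hot/warm fractions below are illustrative assumptions;
# measure your own access patterns before committing to a layout.

TOTAL_TB = 500  # half a petabyte, in TB

tiers = {
    "flash (PCIe)":       0.01,  # ~1% truly hot data
    "fast disk":          0.04,  # next-warmest slice
    "high-capacity disk": 0.05,  # up to the ~10% disk ceiling
    "tape":               0.90,  # everything else
}

for name, fraction in tiers.items():
    print(f"{name:>20}: {TOTAL_TB * fraction:8.1f} TB")
```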
The second question you need to ask is how the customer needs to access that large datastore. In most cases you need serious metadata managed in parallel with the data. At petabyte scale you usually cannot rely on an intelligent directory tree alone; you need a web site or an app to search the metadata and retrieve the required "blob". That app needs a large database behind it: at ~200 MB per blob, half a petabyte is roughly 2.5 million objects, each with searchable metadata.
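For scale, here is a minimal sketch of what such a metadata catalog could look like. The table and column names are made up for illustration, and at millions of rows you would want a real RDBMS (PostgreSQL, Oracle) with proper full-text indexing rather than SQLite, but the shape is the same:

```python
import sqlite3

# Illustrative metadata catalog for ~2.5M blobs. Table and column
# names are assumptions for this sketch, not any product's schema.
con = sqlite3.connect("catalog.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS blobs (
        id         INTEGER PRIMARY KEY,
        path       TEXT NOT NULL,   -- location on the current tier
        tier       TEXT NOT NULL,   -- 'flash', 'disk', 'tape'
        size_bytes INTEGER NOT NULL,
        owner      TEXT,
        created_at TEXT,
        keywords   TEXT             -- searchable metadata
    )
""")
con.execute("CREATE INDEX IF NOT EXISTS idx_keywords ON blobs(keywords)")
con.commit()

# The app searches metadata, not a directory tree:
rows = con.execute(
    "SELECT path, tier FROM blobs WHERE keywords LIKE ?", ("%survey%",)
).fetchall()
```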
The third question is: why is SAN a premise at all? Do you want to run a clustered filesystem across 5-10 nodes? Isilon or Oracle ZS3-2/ZS4-4 is probably your answer.
Fourth question: what are the requirements? (How many simultaneous clients? IOPS? Bandwidth? ACL support? Auditing? AD integration? Performance tuning?)
Fifth question: what availability is actually required? There is no such thing as 100% availability; the word "disaster" in Disaster Recovery is there for a reason, so set reasonable SLA expectations. Going for five-nines availability will triple the cost of the project. Keep in mind that synchronous replication is distance-limited: typically you can stay within a ~150-mile radius for a small performance cost, and anything beyond that hurts badly. The arithmetic below shows why.
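The fiber numbers here are the usual rule of thumb (light travels at roughly two thirds of c in fiber, about 200 km/ms one way), not vendor specs, and real links are noticeably worse:

```python
# Downtime budget per availability level, plus the raw speed-of-light
# round trip that every synchronous write must wait for.

MINUTES_PER_YEAR = 365.25 * 24 * 60

for nines in ("99.9", "99.99", "99.999"):
    allowed = MINUTES_PER_YEAR * (1 - float(nines) / 100)
    print(f"{nines}% availability -> {allowed:7.1f} min downtime/year")

# ~200 km/ms one way in fiber; a synchronous write pays the round trip.
for miles in (150, 500, 1000):
    km = miles * 1.609
    rtt_ms = 2 * km / 200
    print(f"{miles:5d} miles -> ~{rtt_ms:.1f} ms added per synchronous write")
```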
Even if you solve the problems above, sharing the data via NFS/CIFS or something else will get you into trouble. CIFS was not realistically designed for clustered operation, regardless of the distributed filesystem underneath the CIFS server, so you get locking issues. Windows Explorer is a good example: it creates Thumbs.db files and leaves them open, so when you want to delete the folder you cannot, unless you magically land on the same node that was serving you when it created the Thumbs.db. The POSIX lock is transferred to the other server and blocks your delete, but when Windows Explorer asks that other node who holds the lock, you are stuck, because it doesn't know: POSIX locks are different from Windows locks. This affects all the Likewise-based products from EMC (VNX filer, Isilon, etc.) and the CIFS product from NetApp; I'm not sure about Samba CTDB, though.
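To see what a single node actually knows, here is a minimal probe of POSIX advisory locking (Unix-only; the Thumbs.db filename is just the example from above). Each node's kernel only answers for its own locks unless the clustered filesystem propagates lock state between nodes:

```python
import errno
import fcntl

# Try to take a non-blocking POSIX (advisory) write lock.
# A CIFS gateway does something equivalent against its local kernel;
# the cluster problem is that each node sees only its own kernel's
# locks unless the distributed FS ships lock state around.

def is_posix_locked(path):
    with open(path, "a") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            if e.errno in (errno.EACCES, errno.EAGAIN):
                return True   # someone else holds a conflicting lock
            raise
        fcntl.lockf(f, fcntl.LOCK_UN)
        return False

print(is_posix_locked("Thumbs.db"))  # example filename from the anecdote
```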
I would design a storage system based on ZFS for the main tiers, exported via NFSv4 to the front-end nodes, with QFS (SAM-QFS) on top of the whole thing to push rarely accessed data out to tape. The front-end nodes would be accessed via WebDAV by a portal through which you can also query the metadata, with a serious database behind it.
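The read path of that portal would look roughly like the sketch below; the hostname, blob path and lookup function are illustrative placeholders, not parts of any product:

```python
import urllib.request

# Sketch of the portal's read path under the proposed design:
# 1) the metadata DB resolves a search to a blob path,
# 2) the blob is streamed from a front-end node over WebDAV
#    (a plain HTTP GET here). All names below are assumptions.

def lookup_blob_path(query):
    # stand-in for the real metadata DB query
    return "/export/tier1/surveys/2014/blob-000123.dat"

path = lookup_blob_path("survey 2014")
url = "http://frontend-01.example.com" + path  # WebDAV endpoint (assumed)

with urllib.request.urlopen(url) as resp, open("blob.dat", "wb") as out:
    while chunk := resp.read(1 << 20):  # stream in 1 MiB chunks
        out.write(chunk)
```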
I've installed Isilon storage for 6,000 XenDesktop clients that all log on at 9 AM, and I've worked on an SL8500, Exadata, and various NetApp and Sun storage systems, so I can tell you that you need to do a study. Run simulations with commodity hardware on smaller datasets to figure out the performance requirements and the optimal access method (NAS, web, etc.). Extrapolate the numbers, double them (see the sketch below), and ask for PoCs and demos from vendors, be it IBM, EMC, Oracle, NetApp or HP. Make sure that in the future, when you need 2 PB, you can expand in an affordable manner. Be careful: vendors like IBM tend to offer the least upgradable solution. They will do a demo with something that can hold 0.6 PB in its maximum configuration, and if you need to go larger you'll need a brand-new solution from another vendor.
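A crude extrapolation sketch, assuming performance scales linearly with capacity (it usually doesn't, which is exactly why you double the result and demand a PoC); all input figures are examples, not measurements:

```python
# Extrapolate pilot benchmark numbers to the full dataset and apply
# the 2x safety margin before talking to vendors.

pilot_tb   = 20      # size of the commodity-hardware pilot
pilot_iops = 4_000   # measured on the pilot (example figure)
pilot_mb_s = 900     # measured on the pilot (example figure)
target_tb  = 500     # half a petabyte

scale = target_tb / pilot_tb
for name, value in (("IOPS", pilot_iops), ("MB/s", pilot_mb_s)):
    required = value * scale * 2  # linear scale-up, then double it
    print(f"ask vendors to demonstrate >= {required:,.0f} {name}")
```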
It's not worth doing it yourself, since it will be time-consuming (at least 500 man-hours before production) and will require at least one full-time employee for the storage. But if you must, look at Nexenta and the hardware they recommend.
And remember to test DR failover scenarios.