Lucky (?) for you, I just went through purchasing a storage refresh for a cluster, as we're planning to move to a new building and no one trusts the current 5 year old solution to survive the move (besides which, we can only get 2nd hand replacements now). The current system is 8 shelves of Panasas ActiveStor 12, mostly 4 TB blades, but the original 2-3 shelves are 2 TB blades, giving about 270 TB raw storage, or about 235ish TB in real use. The current largest volume is about 100 TB in size, the next-largest is about 65 TB, with the remainder spread among 5-6 additional volumes including a cluster-wide scratch space. Most of the data is genomic sequences and references, either downloaded from public sources or generated in labs and sent to us for analysis.
As for the replacement...
I tried to get a quote from EMC. Aside from being contacted by someone *not* in the sector we're in, they also managed to misread their own online form and assumed that we wanted something at the opposite end of the spectrum from what I requested info on. After a bit of back and forth, and a promise to receive a call that never materialized, I never did get a quote. My assumption is they knew from our budget that we'd never be able to afford the capacities we were looking for. At a prior job, a multi-million dollar new data center and quasi-DR site went with EMC Isilon and some VPX stuff for VM storage/migration/replication between old/new DCs, and while I wasn't directly involved with it there, I had no complaints. If you can afford it, it's probably worth it.
The same prior job had briefly, before my time there, used some NetApp appliances. The reactions of the storage admins wasn't all that great, and throughout the 6 years I was there, we never could get NetApp to come in to talk to us whenever we were looking for expansion of our storage. I've had colleagues swear by NetApp though, so YMMV.
I briefly looked at the offerings from Overland Storage (where we got our current tape libraries), on the recommendation of the VAR we use for tapes & library upgrades. It looked promising, but in the end, we'd made a decision before we got most of those materials...
What we ended up going with was Panasas, again. Part of it was familiarity. Part of it was their incredible tech support even when the AS12 didn't have a support contract (we have a 1 shelf AS14 at our other location for a highly specialized cluster, so we had *some* support, and my boss has a golden tongue, talking them into a 1-time support case for the 8 shelf AS12). We also have a good relationship with the sales rep for our sector, the prior one actually hooked us up with another customer to acquire shelves 6-8 (and 3 spares), as this customer was upgrading to a newer model. Based on that, we felt comfortable going with the same vendor. We knew our budget, and got quotes for three configurations of their current models, ActiveStor 14 & 16. We ended up with the AS16, with 8 shelves of 6 TB disk (x2) and 240 GB SSD per blade (10 per, plus a "Director Blade" per). Approximate raw storage is just a bit under 1 PB (roughly 970-980 TB raw for the system).
In terms of physical specs, each shelf is 4U, have dual 10 GbE connections, and adding additional shelves is as easy as racking them and joining them to the existing array (I literally had no idea what I was doing when we added shelves on the current AS12, it just worked as they powered on). Depending on your environment, they'll support NFS, CIFS, and their own PanFS (basically pNFS) through a driver (or Linux kernel module, in our case). We're snowflakes, so we can't take advantage of their "phone home" system to report issues proactively and download updates (pretty much all vendors have this feature now). Updating manually is a little more time-consuming, but still possible.
As for backups, I honestly have no idea what I'm going to do. Most data, once written, is static in our environment, so I can probably get away with infrequent longer retention period backups for everything more than 6 months old, while doing much more frequent backups of newer data (and /home, our RPM repository, etc). People who've been doing this longer than I have strongly suggested that I make heavy use of snapshots (and we'll have the space... for now) and back those up. Our planned backup system for the fresh is a 48-tape library with a couple LTO-6 drives in it (current on the older is a 60-tape doubled-up library w/ 2 LTO-5 drives) connected over FC to one of the two cluster head nodes. Software would be a personal choice, and while I'm not exactly *thrilled* with NetBackup, it does seem to work (and was what's been in use since before I started here). We haven't racked the new stuff yet, so we haven't even gotten that far.
Hope that helped a bit.