My name's Matthew Barnson. I'm happy to talk storage and tape technologies any time, and am pretty certain I'm not a pathological liar. But, you know, I could be lying about that. I live in Utah, and work in a pretty large data center nearby. It's my job to know what I'm talking about, and I've lived and breathed this stuff for a number of years. That said, I can always be mistaken.
Nice to meet you, Anonymous Coward. Feel free to send me an email (email@example.com) and we can talk use cases where tape is the obvious and better choice, and those where disk is the obvious and better choice. I'm a storage and backup admin working in the industry for nearly twenty years, and have had discussions similar to this over coffee tables, water coolers, and in board rooms. The discussions end up being about things like performance, ROI, archival needs, reliability, typical use case, auditability, and more. Depending on which angle you look at it, some technologies win and others lose.
The point of THIS discussion was that some writer who assumed tape was dead learned otherwise. I allege tape is not dead, and never has been at any point over the past six decades, for numerous good reasons (and some bad ones). That said, I have no particular attachment to it other than that it is often the right solution for enterprise needs when other solutions -- like finicky, unreliable optical media -- will not do.
Anyway, if you want to argue about raw vs. compressed capacity, that's fine. We compress data on our ZFS storage appliances because it improves performance, not just capacity. Same with tape. I routinely shove more than 10TB of uncompressed data at my 5TB T10K T2 tapes, and seamlessly/transparently pull 10TB of uncompressed data off of them. The fact it was compressed in between is relevant, perhaps, but what's also relevant is that we usually fit in excess of 10TB of data per tape. If you're willing to play by real names, I can provide some stats to back up the claim that most modern tape drives easily and typically achieve their rated compressed capacity figures.
We see that with LZJB compression on our storage appliances as well: about 1.7 to 2.4:1 compression, on average. It varies by what you're storing, of course. Our patch repository, for instance, sees pretty terrible compression ratios as it's trying to compress gzipped and zipped data. On the other hand, general-purpose file storage can see considerably better results.
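To put rough numbers on the raw vs. compressed argument, here is a minimal sketch. It assumes you can simply multiply native capacity by the compression ratio, and it borrows the 1.7 to 2.4:1 range from our storage appliances purely for illustration; tape drives use their own compression hardware, so their real ratios will differ. The function name is made up for this example.

```python
def effective_capacity_tb(native_tb, ratio):
    """Uncompressed data that fits on media of native_tb at a given ratio."""
    return native_tb * ratio


NATIVE_TB = 5.0  # native capacity of the T10K T2 tape mentioned above

# Ratios are illustrative: 1.0 (incompressible data), the appliance range
# of 1.7 to 2.4, and the conventional 2:1 figure.
for ratio in (1.0, 1.7, 2.0, 2.4):
    capacity = effective_capacity_tb(NATIVE_TB, ratio)
    print(f"{ratio:.1f}:1 -> {capacity:.1f} TB of uncompressed data per tape")
```

Already-compressed data (like that patch repository) sits near the 1.0 row, which is why the answer always depends on what you're storing.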
I maintain that tape is a key sell for customers who audit us regularly. The fact that data is stored on tape, shipped to a secure facility for storage in an EM-resistant container and cage, and retained for a specific period is a revenue driver in the post-9/11, Sarbanes-Oxley, HIPAA era. I have to provide evidence on this to auditors regularly. Among other things, customers who care about their data often aren't satisfied with many pure on-disk solutions: they want data guarantees of timeliness, throughput, encryption and the keys for decryption, and timely windows for restoration of data in case of disaster or "oops". Yet these same customers often aren't willing to pay what it costs to have a fully redundant, disaster-tolerant environment that could weather another 9/11 and come up in an alternate location instantly. In that great land of the "in between" is one gigantic area where tape shines at a reasonable cost.
Tape has its share of problems, to be sure. But there are many cases where it is simply the best solution to common data transport and archival challenges, as it has been for the past sixty years.
The size of storage has continued doubling with surprising regularity. Not quite Moore's Law-ish, but close. For 7200RPM SAS drives:
2009: 600GB drives in common use.
2010: 1TB drives in common use.
2011: 2TB drives in common use.
2012: 3TB drives in common use.
2013: 4TB drives shipped, not quite common.
2014: 6TB drives are shipping Real Soon Now (gotta get the cash out of the new 4TB drives)
2015: 6TB drives will be common.
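If you want to check the "doubling" claim against that list, a quick back-of-the-envelope sketch follows. It assumes smooth exponential growth between two endpoints, which real product cycles obviously don't follow, and the function name is just for illustration.

```python
import math


def doubling_time_years(start_gb, end_gb, years):
    """Years per doubling, assuming smooth exponential growth."""
    return years * math.log(2) / math.log(end_gb / start_gb)


# Drive sizes in GB, taken from the list above.
sizes_gb = {2009: 600, 2010: 1000, 2011: 2000, 2012: 3000, 2013: 4000, 2015: 6000}

for first, last in ((2009, 2013), (2009, 2015)):
    t = doubling_time_years(sizes_gb[first], sizes_gb[last], last - first)
    print(f"{first}-{last}: capacity doubles roughly every {t:.1f} years")
```

By that measure the list works out to a doubling somewhere around every year and a half to two years, depending on which endpoints you pick.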
Today's average single-rack storage appliance runs a little over half a petabyte raw capacity, and three-quarter petabyte single-racks are shipping today. I think we'll see "a petabyte in 1 rack" by year-end 2014 as 6TB 7200RPM disks start arriving (looks like we'll be skipping 5TB completely). Where I work, filesystems still tend to be smaller than that, more-or-less governed by the compressed size of tape that services them. So an average filesystem runs about 2TB-17TB depending on the tape tech backing it up. To back up a 17TB filesystem on a single tape still takes about 15-16 hours; to transfer it onto another hard drive, still longer!
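For the curious, the backup window above implies a sustained rate you can work out yourself. This sketch assumes decimal units (1 TB = 1,000,000 MB) and a constant rate with no stalls or tape repositioning, so treat the result as an average rather than a drive spec.

```python
def throughput_mb_per_s(size_tb, hours):
    """Average sustained rate in MB/s, using decimal units."""
    return size_tb * 1_000_000 / (hours * 3600)


# 17TB filesystem written to a single tape in 15 to 16 hours.
for hours in (15, 16):
    rate = throughput_mb_per_s(17, hours)
    print(f"17 TB in {hours} h -> about {rate:.0f} MB/s sustained")
```

That's on the order of 300 MB/s of uncompressed data going to one tape, hour after hour.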
What's a good brand of tape drive for a home-user?
For most, the answer is "none". Use a cloud service to store your critical data, or a second hard drive with Time Machine or something like that. The cloud service provider will do tape backup of critical data (even Google does!) to cover disastrous situations which can and have occurred. If you're dead-set on tape backup at home, any recent table-top LTO5 or LTO6 drive (typical cost: $1,500-$3,000) will fit the bill. Media cost is pretty trivial after that initial investment, less than $30 for 3TB. It's this high initial-investment cost that convinces people "tape is expensive". The initial cost outlay is prohibitive for some home users. But let's say you buy ten 4TB hard drives; you've spent $4,000 for 40TB (late 2013 prices), and typically have to worry about ongoing power costs & failure rates for those drives (MTBF means you have something like a 1 in 4 chance of at least one of those drives failing each year). For a thousand bucks, you can buy about 33 LTO5 tapes for something like 100TB of capacity. Different costs depending on your needs.
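Here's a rough sketch of that cost comparison using only the figures above. Assumptions, loudly: it ignores power, enclosures, and replacement drives on the disk side; it uses the 3TB-per-tape figure, which is LTO5's compressed capacity (native is 1.5TB); and it treats drive failures as independent. The function names are made up for this example.

```python
def cost_per_tb(total_cost, total_tb):
    """Simple dollars-per-terabyte figure."""
    return total_cost / total_tb


def break_even_tb(drive_cost, disk_per_tb, tape_per_tb):
    """Capacity at which tape drive plus media costs the same as disks alone."""
    return drive_cost / (disk_per_tb - tape_per_tb)


def per_drive_failure_rate(p_any, n_drives):
    """Per-drive annual failure probability implied by P(at least one of n fails)."""
    return 1 - (1 - p_any) ** (1 / n_drives)


disk = cost_per_tb(4000, 40)  # ten 4TB drives for $4,000
tape = cost_per_tb(30, 3)     # one LTO5 tape, $30 for 3TB (compressed)
print(f"disk: ${disk:.0f}/TB, tape media: ${tape:.0f}/TB")

for drive_cost in (1500, 3000):
    tb = break_even_tb(drive_cost, disk, tape)
    print(f"${drive_cost} tape drive pays for itself past about {tb:.0f} TB")

rate = per_drive_failure_rate(0.25, 10)
print(f"1-in-4 across ten drives implies about {rate:.1%} per drive per year")
```

Below the break-even point, the up-front drive cost dominates and disks are cheaper; above it, the cheap media wins. That's the whole "tape is expensive" argument in a dozen lines.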
Where I work now I'm far away from that, and we cannot think of anything but 24x7x365 ops, so restore is a euphemism for failure.
Ditto. The ability to restore data is required for disasters and an "oops". The latter you can mostly prevent through sound policy, the former you work around after-the-fact. Even in a 24x7x365 environment, you can't totally prevent disasters -- human-caused and otherwise -- from occurring. Recovery plans that don't include tape typically have vastly higher costs, both initial and ongoing. Tape is a speedy cost-saving measure for large enterprises that provides some unique advantages, but is not a complete DR solution by itself! StorageTek has some pretty amazing products to allow tiered storage services that leverage tape for infrequently-accessed data. Check into them; they provide VERY strong support for 24x7x365 operation while dramatically reducing storage cost, and are very transparent for users.
Your first-tier storage is hard drive based, right? Tape is only a backup. How can hard disk not be fast enough?
We do massive data dumps on a regular basis, and I was typing quickly. I probably should have said, "For our needs, hard disks are extremely inconvenient and their throughput is too slow individually to suit." Good catch.
Unix will self-destruct in five seconds... 4... 3... 2... 1...