Hardware

Hard Drives as Backup Media?

rootus-rootus asks: "A funny thought struck me as I was going over the life expectancy for tape media for backups... Since the size of 3.5" hard disks is surpassing 100GB in a reasonably inexpensive package, has anyone thought of using them as backup media, as in a jukebox or autoloader? The access times and data transfer rate for data stored on them would make backing up databases, etc. MUCH more palatable (200+GB takes a LONG time to dump to tape for a full backup). Any thoughts on the matter?" Bet you've thought about this question before, haven't you? Has anyone done anything like this? If so, how well did it work?

  • It's been thought of, and rejected. The reason for this is that the data storage and mechanical parts are contained in one unit, and failure of either makes the other useless. This means that if your drive stops spinning, but your data is fine you can't get to it. This wouldn't be a problem with removable media because you can change the read/write device.
    • I used to work with tapes a lot, and the biggest problem wasn't the mechanical device but the media on which it records. Tapes break, and how! Modern tape drives (such as DDS 4mm) will carefully tension things, but I cannot trust my data to a thin ribbon of plastic.

      Also, if that drive were to happen to break, it would usually break with a tape in it. Not good at all.

      A portable firewire/i.link or similar drive would be nice to have around with a quick backup. It would get my vote. Then again, I liked my syquest portable drive cartridge. They were so darn simple and portable.
    • Re:Not good (Score:2, Insightful)

      by Liquor ( 189040 )
      It's been thought of, and rejected
      But it was rejected back when the cost of the mechanism was far higher than that of the media alone. This suggestion is that the 'Media' IS the mechanism.

      When the cost of a TB of storage on IDE hard drives is lower than the cost of a TB of tape media (at least when you include the amortized cost of the tape drive - though it's becoming true without that nowadays) then "if your drive stops spinning, but your data is fine" winds up in the same category as any other tape failure.

      And I've yet to see a tape drive mechanism failure that didn't manage to corrupt (or even destroy) a tape.

      And in my experience, bad tapes are even more likely than failed hard drives.
    • What about the swap system (I don't know an official name), where you have 3 backup hard drives and every time you do a backup you cycle in a different one, so if one fails you have a failsafe (although older) copy?
  • I purchased a couple of 80GB firewire drives for my backup needs. They ran about $275 each after shipping, though I'm sure it's cheaper now. Every day I bring one to the office with me and replace it with the one that was plugged in the previous day. This allows me to do full backups every night and data recovery takes almost no time at all.

    On the other hand, this isn't a perfect solution for most companies. First, it would be easy for me to bang the hard drives and have them not spin up. They also are a lot bigger than a tape cartridge. But they do save me lots of time--and that means a lot. I really don't expect these drives to last forever with the "trashing" that gets done to them every night; but since they aren't terribly expensive (for my company) I don't really care if I have to buy another.

    • Assuming you keep backups for hardware crash and disaster recovery, I think there is a little problem in your plan. Basically, during your day at work, your two firewire drives are at the office with you. What happens then if - say - a fire erupts in the building while you are out for lunch and destroys everything? Bye, bye, backups!

      I suggest you buy a third drive and keep the same backup habits. That way, in the case of a disaster, you would still have a one-day old backup at home.

  • I've done a reasonable quantity of backup-solution deployments, from the simple "tape drive in a server" to multi-element DLT libraries. I've had customers "invent" a version of this idea on many occasions. Typically, the customer's "invention" takes the form of one of several similar ideas.

    What it comes down to, though, is that the idea of keeping multiple media, stored _away_ from the production copy of the data, is a good one. Until recently, this has only been really convenient with tape media. With the advent of very convenient hot-swappable hard drive carriages and support for hot swapping of hard disk media in nearly every commonly used operating system, I don't see why hard drives could not be used-- but they would need to be treated with a little more physical care than tapes.

    The "problem" seems to come when the (typically small-business) customer "invents" this idea, buys one of those cruddy "centronics connector on the back" sub-consumer-grade plastic "drive bays", slaps a hard drive in it, and starts doing backups to one hard drive from another. The cycle is something like: (1) insert 2nd hard drive, (2) wipe 2nd hard drive, (3) copy contents of production hard drive(s) to 2nd hard drive, (4) remove 2nd hard drive. They don't think about what would happen if, say, between steps 2 and 3 the production hard drive(s) failed.

    If you're going to use hard disks as "tapes", I don't think there's anything fundamentally wrong-- but buy the same number of hard disks as you'd buy tapes-- and rotate them in the same manner. Treat them as large, mechanical tapes. Keep them away from the production data except when in use.
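
A minimal sketch of the "rotate them like tapes" rule, assuming you track the date of the backup each disk currently holds. The function name and the disk labels are made up for illustration; the point is only that the disk you overwrite is always the one holding the oldest copy, so the newer copies stay untouched while the new backup is being written.

```python
from datetime import date


def pick_target(last_written):
    """Return the label of the disk to overwrite next.

    last_written maps a disk label to the date of the backup it holds,
    or None if the disk has never been used.
    """
    # Never-used disks sort first; otherwise the oldest backup is chosen.
    return min(
        last_written,
        key=lambda label: (last_written[label] is not None,
                           last_written[label] or date.min),
    )


# Hypothetical three-disk rotation.
disks = {"A": date(2001, 10, 24), "B": date(2001, 10, 25), "C": date(2001, 10, 23)}
print(pick_target(disks))
```

With the three hypothetical disks above, disk C holds the oldest backup, so it is the one to bring in and overwrite; A and B stay offsite.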

    • ...but buy the same number of hard disks as you'd buy tapes...

      Yup. The exact type of the medium shouldn't be changing your backup strategy.

      Tape prices vary wildly, but figure on a buck or two per GB. Hard Drives aren't quite there yet, somewhere around four or five bucks per GB if you include a decent enclosure.

      I guess I'm old-school, but I still prefer tapes. They are also more robust to physical/temperature abuse, which I like.

  • In a mission-critical environment, it is possible to achieve a very high degree of system-redundancy by using hard drives as a backup solution by "breaking mirrors". (Of course if the system is mission-critical, I'd rather have a geographically distributed set of systems, but that's not always possible...)

    The idea is that you set up hot-swappable disks in a three-way mirror, that is, three drives containing the exact same data. When you want/need to take a backup, you simply pull the third drive out of the array. The system still has two drives with the full data set, so you don't lose your redundancy. You insert another drive to replace the one you pulled, and in a matter of hours it will be in sync with the rest of the array. That sync time will depend on the size of the hard drives you use, but should be well under 24 hours for up to at least 40 gigs.

    Of course, you need hot-swappable disk drives, and ideally a hardware RAID controller, and you might even want to consider having a disk for every day of the week, so that solution isn't exactly cheap. However, it gives you an instant snapshot of the system when you take your backup, and restoring is simply a matter of putting the backup drive in the machine (or another similar machine, if the first was damaged/lost). If you're paranoid (not to mention incredibly rich), you might even consider a four-way mirror to have two backups, and already have instant redundancy when you restore the system!

    Another interesting thing to do is to use disk-level encryption in such a scenario. Since all your drives are encrypted, so are your backups. The problem with that is that you need to provide the key (passphrase or hardware token, or combination) at boot time, which means additional downtime if there's an unscheduled reboot and you're not around with the key...

    Also, you might want to do things such as checkpointing your databases right before pulling the backup drive out in order to minimize the chance of data loss. But in the end, it can't be worse than an unexpected power down, and well-written applications and OSes deal with that pretty well. A logging filesystem will definitely help.
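
To put a rough number on the resync time mentioned above, here is a back-of-the-envelope sketch. The sustained rebuild rates are assumptions picked for illustration, not measurements from any particular controller, and 1 GB is taken as 1000 MB.

```python
def resync_hours(capacity_gb, rate_mb_per_s):
    """Hours needed to copy a whole drive at a sustained rate (1 GB = 1000 MB)."""
    seconds = capacity_gb * 1000 / rate_mb_per_s
    return seconds / 3600


# Assumed sustained rebuild rates in MB/s.
for rate in (1, 5, 20):
    print(f"{rate:>2} MB/s -> {resync_hours(40, rate):.1f} h")
```

Even at a pessimistic 1 MB/s, a 40 GB drive would resync in about 11 hours, which fits the "well under 24 hours" estimate.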

    • Many people use this for backing up to tape. You break the mirror logically, then stream it to tape, then add it back into the mirror and resync. It speeds backups because you back up from a disk which isn't otherwise busy with head seeks to other parts of the disk, and if you're doing it with software RAID, likely off a completely different SCSI controller.
  • I'm planning to use CD-RW for backup with my new box (just ordered the core parts today! :)

    Any opinions? Thanks.

    • CD-RW just isn't practical for backing up large amounts of data that may need to be restored again. Take a 40GB disk, for example: that's quite a few CDs, plus the inconvenience of changing the CD every 700 megs. It's all right, I guess, for a home user with a few files to back up, but not for a complete system backup/recovery solution. Even the smallest tape drives on the market can hold a couple of gigs.
      • Well, that's me. Small home user. I already have things set up so everything I want to backup is in one place (or mirrored with scripts). I just run the script and copy the .bz2 to a floppy. 650 MB will greatly relieve the cramped feeling :) And I want a CD-R anyway for music and data transfer, so I may as well use it for backup instead of buying and finding room for a tape drive.

        Thanks for the input.

        • Can you restore arbitrary files from within a .bz2 if part of the file becomes corrupt? I.e., if you scratch part of the CD-RW, are you going to lose the whole archive?

          I'd personally just save the raw data so that you could pop the CD-RW in and copy over the relevant data.

  • A few notes on your idea:

    1. There is no need to build a mechanical autoloader. IDE controllers and removable drive bays are cheap, less than $25 per drive, making them much cheaper than a robotic loader, with greater reliability and response time to boot. IDE drives can be spun down when they've been idle for a while, so electricity consumption should be similar.

    2. I believe that Linux IDE does not currently support hot swapping of drives, although the PCMCIA drives do support removal of an entire IDE controller, which is what happens when you remove a CompactFlash card.

    3. My understanding is that hard drives are not hermetically sealed but rather have air filters made of material similar to a cigarette filter. I believe that when hard drives are not in use, they can accumulate dust internally and are more likely to have problems. You may also have problems with their greater sensitivity to being dropped and to static electricity. So, you may want to store them in sealed conductive bags.

    4. In my humble opinion, you have a good idea. I believe that disk-based backups are much more valuable to an organization because they're easy enough to use that people will save time by doing minor recovery tasks. In comparison, with tape backups, the effort of doing a restore can be so great that people will often opt to spend an hour regenerating their previous work from scratch instead.
    • If you don't mind spending a bit, the 3ware IDE RAID cards support hot swap. I think their cheapest card is around $125 or so. Maybe some other cards do too?

      I think this is a good idea too. We bought a tape drive a while back but I never got around to figuring out the best combination of tools to back things up. This is a great solution, especially considering we have backup IDE drives sitting around waiting for a good one to go bad (RAID mirrors).
      • I read that 3ware stopped selling their Escalade controllers a few weeks ago. They now "focus" on their Palisade NASes... I just checked their web site, and even the announcement to that effect is already gone!

        This is bad news. The Escalade controller was definitely my favorite for midrange applications. It is well supported in the Linux kernel, and presents a SCSI interface to your system while still using cheap IDE drives... My favorite server has an Escalade 6800 with four 40G HDs in a RAID 1-0 configuration, which means splitting read requests on four drives -- that's quite fast!

        Has anybody tried the Adaptec AAR-2400A? It might be a good alternative...

        • Are you sure about this? I've heard the rumor too but I also read that it was false because 3ware's own products require the cards they are selling... I'm not saying you're wrong but it would be nice to have some proof.
          • FYI, I haven't seen anything on their site regarding them dropping the cards, but at the same time they aren't providing any information for them either... they are merely giving you a phone number to call or an email address to write if you have any inquiries.

            My bet is that they are not providing them any more and are concentrating on their IP products.

            Personally, I use Promise [promise.com] SuperTrakSX 6000 [promise.com] IDE RAID cards. They work quite well, and offer just about any RAID config you want to use with IDE drives. They also support hot-swappable IDE drives, but require a special drive bay in order to protect electronic components from frying (motherboards, controller cards, drives, etc.).

  • We use a separate sync server with lots of
    disk space and then do nightly dumps over nfs
    to this box. This server is located in a
    separate building with ethernet between the
    systems.

    Every night on our servers, a script runs and
    dumps the local filesystems at an appropriate
    level. We then gzip the dump and store it on
    the dump server. Since each file is uniquely
    named, we can store old dumps as long as disk
    space permits. In our case this is about 1 week
    of old dumps.

    The scripts are trivial to write (think dump |
    gzip >> /nfs/mount/on/remote/server). For a
    while we used to dump between remote locations
    and move the data via rsync, but it took forever.
    Local ethernet is a huge plus for moving gigabytes
    of data nightly.
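
A sketch of the bookkeeping around such a script, assuming dumps are named after host, filesystem, dump level and date. The naming pattern, the host name and the seven-day window are illustrative assumptions; the original scripts are not shown in the post. It only builds names and decides which old dumps have aged out; it does not run dump or delete anything.

```python
from datetime import datetime, timedelta


def dump_name(host, filesystem, level, when):
    """Build a unique name such as web1_home_L0_20011026.dump.gz."""
    fs = filesystem.strip("/").replace("/", "-") or "root"
    return f"{host}_{fs}_L{level}_{when:%Y%m%d}.dump.gz"


def expired(names, today, keep_days=7):
    """Return the names whose embedded date is more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in names:
        stamp = name.rsplit("_", 1)[1].split(".")[0]  # the YYYYMMDD field
        if datetime.strptime(stamp, "%Y%m%d") < cutoff:
            old.append(name)
    return old


today = datetime(2001, 10, 26)
names = [dump_name("web1", "/home", 0, today - timedelta(days=d)) for d in (0, 6, 8)]
print(expired(names, today))
```

Because the date is part of the file name, old dumps can be kept "as long as disk space permits" and cleaned up by name alone.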

    For more important data like our cvs repository,
    we snapshot it hourly, daily, weekly, and monthly
    as well as tarring it up hourly and daily. This
    means we have like 15 entire copies of our CVS
    tree. It's probably overkill but it helps a lot
    come panic time.

    You can easily add 400gb of disk space to a
    regular pc for about $1200. In our case we do
    it all in less than 100. The other nice thing
    about doing this is that we have instantaneous
    access to our dumps and can access them much
    quicker than tape.

    In a perfect world I'd also like to back up
    the data to tape as well, but haven't yet done
    so. I suppose if we wanted to be extra safe we
    could also mirror the drives on the sync server
    or rotate the data between physical disks so that
    it would take multiple failures to lose the
    backup data.

    --chuck
    • I'm just "rsync / backup@remote:/backup/$HOSTNAME"ing every night to a box offsite that rotates the drive mounted on /backup every day when a backup's not running. It runs overnight when the network's not real busy, and works fairly well. I back up the really important/dynamic stuff on site on a daily basis with a 7-disk DVD-RAM rotation. It's the right balance of price/simplicity vs. data safety for my organization, and is pretty idiot-proof.

      The drives in the remote backup server (which could easily be co-located at your nearest ISP) aren't "removable", but they're certainly not permanent either. :)
    • You can easily add 400gb of disk space to a regular pc for about $1200. In our case we do it all in less than 100.

      This price sounds rather high. You can add a complete server with RAID storage for less than that. I had a quick check at RaidZone, and a cube with 10x60GB drives gives you 470GB available to users with a hot spare, and it runs at a wee bit over 10K. As a backup system, it's overkill - you probably don't need Raid but I wanted a fast price snapshot.
      http://www.raidzone.com/Products___Solutions/Appliances_Overview/AppPrices/appprices.html [raidzone.com]

      Personally, we do use online storage for some of our archives, because our users need very fast restores. We create zip files on disk, give them an automated method to pull from them, but also back up the archives to tape for long-term offsite storage.
  • Ever drop a tape while taking it out of the bay and stuffing it into the tape store? I have. The tape was fine.

    Even ruggedized drives, when dropped from arm's length, are not going to hold up too well. Cheap drives will definitely not hold up.
  • For myself, working at home on a cable modem box, I don't need a lot of backup space. What I do want is the backup to be at a remote location. And I do want it to be as automated as possible because if it requires me to physically do something on a regular basis I'll just stop after a couple of months, as soon as my schedule gets tight or whatever.

    My solution is to find another person on a fast connection who has the same needs, and arrange to let him ssh into my box and have a few gigs worth of space, and give me the same.

    Right now the only scriptified part is creating the backup files. I encrypt them and scp them to his box by hand. I will eventually have it all automated, including deleting the oldest backup if space is getting tight.
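
The "delete the oldest backup if space is getting tight" step could look something like this sketch. It assumes you already have a list of the backup files on the remote box with a timestamp and size for each; the tuple layout and the budget figure are made up for illustration.

```python
def to_delete(backups, budget_bytes):
    """Pick the oldest backups to remove so the rest fit within budget_bytes.

    backups is a list of (name, timestamp, size_bytes) tuples.
    The newest backup is always kept, even if it alone exceeds the budget.
    """
    ordered = sorted(backups, key=lambda b: b[1])  # oldest first
    total = sum(size for _, _, size in ordered)
    doomed = []
    for name, _, size in ordered[:-1]:  # never touch the newest
        if total <= budget_bytes:
            break
        doomed.append(name)
        total -= size
    return doomed


# Hypothetical listing: three 400-byte backups against a 900-byte budget.
listing = [("mon.tar.gpg", 1, 400), ("tue.tar.gpg", 2, 400), ("wed.tar.gpg", 3, 400)]
print(to_delete(listing, 900))
```

Keeping the newest file unconditionally means a too-small budget can never leave you with no backup at all.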

    This probably isn't an option for a "real" backup solution, such as for a business or a network with a number of users. But all I want is my home directory, mail, etc. Hell, my bookmarks file and mail are probably most of what I want, and the rest is mainly small latex docs.

    I think there is probably a way to use freenet for this, but I didn't think that through all the way. If I inserted my backups into freenet, and a fire burned down my house, how would I know what keys to use to get my backups back out of freenet?
  • *cough* RAID *cough*
    • So...

      ...what happens when a file gets deleted accidentally?

      ...what happens when the RAID controller goes berserk and munges all your data?

      ...what happens when an airplane crashes into your server room and takes out your "*cough* RAID *cough*" solution?

      I think you'd be SOL and looking for a new job in a poor job market. :-)

      -sid
  • Okay, this is sort of an off-topic rant, but can anybody tell me what's up with ATX tower cases that have four 5.25 inch drive bays, of which only the upper two are usable for anything as long as a CD or 1.2MB floppy drive because the standard ATX motherboard is in the way? In other words, the case is high enough and wide enough, but not deep enough. Anybody else fighting this particular frustration factory?
  • The company I work for does this occasionally. Usually where we have a SLA in place that requires us to perform a backup or restore within a certain timeframe.

    Typically a backup to disk is made in order to get the backup done as fast as possible, then that backup is dumped to tape. Simple restores are quick and relatively easy because the most recent backup is always online and if we have a more serious failure, we can still restore from tape.
  • by The Mayor ( 6048 ) on Friday October 26, 2001 @11:19AM (#2483670)
    I'm reading some of the replies and thinking to myself that the /. readers don't understand what a backup system is.

    A backup system is not simply redundancy (i.e. RAID). A backup system for files typically can recreate any version of a file requested by the user (as backed up according to the backup regimen). Thus, if you have nightly backups, you might keep every night for the past month, every month end, and every year end for a given document. RAID won't give you this.
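
The retention regimen described above (every night for the past month, every month end, every year end) can be sketched as a simple classification. The function name and the 31-day nightly window are assumptions for illustration; a real backup product will have its own policy language.

```python
from datetime import date, timedelta


def retention_class(backup_day, today, nightly_days=31):
    """Say why a backup taken on backup_day is still kept on today, or None if it can go."""
    age = (today - backup_day).days
    next_day = backup_day + timedelta(days=1)
    if next_day.year != backup_day.year:
        return "year-end"    # 31 December
    if next_day.month != backup_day.month:
        return "month-end"   # last day of any other month
    if 0 <= age <= nightly_days:
        return "nightly"     # inside the rolling nightly window
    return None


today = date(2001, 10, 26)
for day in (date(2001, 10, 20), date(2001, 8, 31), date(2000, 12, 31), date(2001, 8, 15)):
    print(day, retention_class(day, today))
```

RAID keeps exactly one state of the data, the current one, which is why it cannot stand in for a schedule like this.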

    I'm familiar with some IBM products that do this. However, they're expensive. Basically, ADSM (ADSTAR Data Storage Manager, or something) is a product that allows regular backups of products, and access to every incremental version of the documents. On the backend, it can be hooked up to a huge disk cache and a robotic tape library. The end result is terabytes of near-online access data, with automatic versioning. Pretty nice. And if your disk cache was large enough, it would never hit the tapes. It seems to me that this could be modified to remove the tapes and present what the user requires.

    I'm not aware of anything open source or free (as in beer) that does this. It would be really nice, though.

    Hell, I've always dreamed about an automatic versioning filesystem. Documents would be automatically versioned. You could use CVS to handle this. Perhaps you could do something as simple as have some code executed upon every file close for files that are opened with write access. When these files are closed, they are added as new versions of the document within CVS.

    When the disk reaches some capacity watermark, a disk cleanup agent would be invoked. Its goal would be to remove redundant versions of old binary files from CVS. Rules could be attached to the agent to perform tasks such as retaining specific versions of binary files (i.e. retaining the first version, the latest version, and all versions from the last named version).

    Users could tag specific versions of files. These versions would always be retained.
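
One way the cleanup rule could be expressed, as a sketch only: keep the first version, the latest version, every tagged version, and everything from the most recent tagged version onward. The function and the revision numbers are hypothetical; nothing here talks to CVS.

```python
def versions_to_keep(versions, tagged):
    """Return the set of version ids the cleanup agent must retain.

    versions: list of version ids, oldest first.
    tagged:   set of ids that users have tagged (named).
    """
    if not versions:
        return set()
    keep = {versions[0], versions[-1]} | (set(versions) & set(tagged))
    # Everything from the most recent tagged version onward is kept as well.
    tagged_positions = [i for i, v in enumerate(versions) if v in tagged]
    if tagged_positions:
        keep.update(versions[tagged_positions[-1]:])
    return keep


history = ["1.1", "1.2", "1.3", "1.4", "1.5", "1.6"]
print(sorted(versions_to_keep(history, {"1.2", "1.4"})))
```

In this hypothetical history only revision 1.3 would be eligible for removal: it is neither first, latest, tagged, nor newer than the last tag.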

    I know this would incur a significant performance hit for disk access. Perhaps I could limit such disk access to specific directories or mount points. In this manner, I could have a mount point for documents, all of which would be automatically versioned.

    Plugins for Explorer could be built to allow users to tag versions of documents and retrieve specific old versions of files. I'm thinking something like TortoiseCVS, a beautiful piece of software. In fact, for prototyping, TortoiseCVS would be enough.

    Now, is anything like that available? No? Perhaps I should do something about that.

    Cheers.
    • ADSM is now known as TSM, the "Tivoli Storage Manager". While it's nice in principle, in practice it has many flaws.
      • It's almost useless for a complete backup, since the client can't handle restoration of system-critical files (at least on platforms I've used)
      • Platform support isn't very good, and it's getting worse. You can only get the clients in binary, and the list isn't very long. Want Linux/PPC? Tough.
      • Authentication/xfer is entirely clear-text. This makes the system pretty much useless for backing up sensitive files.
      • Reliability seems dubious, and getting worse. I've seen both backups and restores mysteriously time-out or fail for some other reason, and it's NOT the network. Maybe it's just our setup here, but I sure don't trust it.
      So, for certain limited uses I'm sure it's swell, but it's hardly a panacea. It's also absurdly expensive, but I suppose that's par for the course.
      • I'll second that opinion. We use TSM 3.7 on a Sun E4500 with a STK L700 and an HDS 9960. We have an ELA with IBM so we get it for "free"; the problem is that IBM support sucks rocks. IBM, like most "big" companies, is really made up of a bunch of subgroups, i.e. Tivoli, Lotus, Storage, Mainframe, AIX, Netfinity, etc. So when you have a problem, they all point at each other and the net result is the customer is screwed. I have an RS/6000 with ADSM/HSM and an IBM 3995-C64 optical library, and it is barely usable because it is so *damn* slow and unstable. The only platform where TSM works well, I hear, is the mainframe.
  • Some grad student from China came here with a 6 gig IDE disk with all his data on it. I thought it was kind of weird myself, but I guess it worked out OK...

    Though drives will often die if left to their own devices [ie, off] (we say they get lonely and kill themselves). Which would really suck if that was your backup, wouldn't it?
  • Yes, RAID is for redundancy, not back-up, but the difference is really about how you configure it and if you need offsite storage. Let me explain our systems.

    We build large (600MB-1TB) systems, either on W2k or RHL (Honestly our customers almost never prefer RHL and I'm waiting for WINE to get to 1.0 to be able to convince them to switch, but that's a tangent...) for digital storage of security video. We have specialized hardware to capture and record onto a PC's EIDE disk drives, and use 3ware cards to expand the EIDE array. Tape back-up of such large systems is useless, not only because of the time to write the data, but because the recovery speed is not viable for video. It takes hours to review the damn tapes. Might as well spend less and stick to VHS! So here is our method.

    Normally in security video data, the cost to mirror or back up is beyond the means of the customer, but when they ask for it we set up a three-way back-up. We dump the data to one of three disk array sets, either locally or across a high speed line. This way we only erase the oldest version before laying down the new copy. This prevents catastrophic loss in between erasure and new back-up (mentioned above). We do have customers using pull-out drives, and we have very little trouble with them. We use the expensive trays (not the plastic crap), and they hold up fine. And the drives are pretty hardy also. Don't drop them, that's true, but don't freak either.

    We normally don't mirror: you don't get 2 versions (2 months of recorded video), and it is generally a very expensive way to "back up" (TCO and maintenance). And I have found that the best way to keep hard drive back-ups is not to turn them off. Keep the system spinning and the drives last longer.

    Another point is that if the data is not hyper-critical, then the hard disk MTBF should be sufficient. Keep in mind that there are "no quibble" warranties for most drives. They'll ship you a new one before you send them the kaput one. That improves uptime, and generally you don't offer a guarantee you expect to have to honor ;)

    Overall, ignore the nay-sayers. Hard disks work for back-up, they're cheap, and you can't get the same low latency and record/review time with any other low cost back-up.

    It's worth the extra care!

    btw, here's a shameless plug of our site [digitalcctv.com], and we have bought up the remaining stock of 3ware cards to build systems. drop a line if you want one for a deal.
    • I don't think you described this, but I've always wondered:

      If you have a RAID System (I'm sure only certain levels, and think RAID 1 is one of them) which stores duplicate data on drives, the idea is that a certain percentage of the drives could fail, and your data would not be lost. So if I had a 2 drive RAID 1 system, couldn't I just rip out a drive, send it somewhere and call it a backup? Then, I can put in a blank drive, and the RAID controller will reload all the data?

      If I need some data off of the back up, I could stick the disk in an identical machine (or the same machine), power it on, copy stuff off to a network drive, etc?
