Here are some valuable thoughts; I'll write up a more condensed version later...
From digitalhermit (113459):
I deal with some aggregate 2 terabytes of storage on my home file servers. What works for me won't work for an enterprise corporate data center, but maybe some things are applicable.
I think the article does a good job of explaining how to back up, but maybe just as important is "why?". Some posts say to put everything on a RAID, or to use mirroring or dd. What they fail to address is one important reason to back up: human error. You may wipe a file and then need to recover it a week later. If all you're doing is mirroring or RAID, no matter how reliable, the deletion propagates instantly and your "backup" is worthless.
There are also different classes of data. I have gigabytes of videos. Some are transcoded DVDs, some are raw footage. Losing all my transcoded DVDs isn't as critical as losing raw footage. Why? The DVDs can be re-ripped. It will take a long time, but the data can be recreated. The raw footage is different, even though I keep the original Mini-DV tapes, because re-recording the video from tape won't guarantee that the file is identical. If the file is different, then the edits will be different. And then there are mail spools, CVS repositories, personal files, etc.
What I've found is that I archive my DVD rips once every few months. Other stuff is backed up once a week to another file server.
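That "archive every few months, back up weekly" routine can be as simple as a dated tarball per run. A minimal sketch, using made-up demo paths under /tmp rather than real media directories:

```shell
#!/bin/sh
# Sketch of a dated-archive backup run. The source and destination
# paths here are demo values; point SRC/DEST at real data in practice.
SRC=/tmp/demo-src
DEST=/tmp/demo-backup
mkdir -p "$SRC" "$DEST"
echo "sample" > "$SRC/notes.txt"     # stand-in for real content

# One tarball per run, stamped with the date, so old runs are kept
# and a file deleted last week can still be pulled from last week's archive.
STAMP=$(date +%Y%m%d)
tar -czf "$DEST/src-$STAMP.tar.gz" -C /tmp demo-src

# List the archive contents to confirm the run captured the files
tar -tzf "$DEST/src-$STAMP.tar.gz"
```

Dropping a script like this into cron (weekly for working data, a longer interval for the DVD rips) gives exactly the two-tier schedule described above.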
I couldn't care less about the OS. The file server runs Fedora Core 5. The only thing I keep is the Kickstart file, so that I can rebuild the machine within a matter of minutes and then restore the data from archives. After that it's just a matter of copying back the Samba configuration and restarting the service.
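The Kickstart file that makes this possible is just a short text file fed to the installer; something of roughly this shape (every value below is illustrative, not taken from the post):

```
# Minimal kickstart sketch -- all values are illustrative
lang en_US.UTF-8
keyboard us
rootpw --lock
clearpart --all
autopart

%packages
samba
%end
```

Because the whole OS is reproducible from a file this small, there is nothing on the system disk worth backing up; only the data and the Samba config matter.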
For the web server, all content is kept within CVS. If the web server fails, it's just a matter of rebuilding the image and pulling the latest copy from CVS. Fifteen minutes to re-image the OS. Five minutes to pull down the latest content.
For DNS, initial configuration for 8 domains is done by a perl script that auto-creates the named.conf and all zone files. Then I just append the host list to the primary domain. Ten minutes at most.
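The post describes a perl script; the same zone-stanza generation can be sketched in shell (the domain names and file paths below are made up for illustration):

```shell
#!/bin/sh
# Sketch: generate named.conf zone stanzas from a domain list,
# in the spirit of the generator script described above.
# Domains and paths are hypothetical examples.
DOMAINS="example.org example.net"
CONF=/tmp/demo-named.conf

: > "$CONF"   # start with an empty config
for d in $DOMAINS; do
  cat >> "$CONF" <<EOF
zone "$d" {
    type master;
    file "/var/named/$d.zone";
};
EOF
done

cat "$CONF"
```

With the configuration generated rather than hand-written, "restoring" DNS means re-running the script and appending the host list, which is why it takes ten minutes at most.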
Home directories are centralized on a file server using OpenLDAP and automounts. One filesystem to back up makes it easy, and because it's easy, it actually gets done automatically.
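An automount setup like that boils down to a couple of small map entries; roughly like the following (hostnames, paths, and the LDAP base are all illustrative, not from the post):

```
# /etc/auto.master (illustrative): hand /home to an LDAP-backed map
/home   ldap:ou=auto.home,dc=example,dc=com

# Equivalent flat-file map entry for one user (/etc/auto.home):
alice   -fstype=nfs,rw   fileserver:/export/home/alice
```

Every client mounts homes from the one file server, so backing up that single exported filesystem covers every user on every machine.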
Other "machines" are virtual and these are copied to DVD whenever something drastic changes (e.g., major upgrade).
From swordgeek (112599):
When you work in a large
environment, you start to develop a different idea about backups.
Strangely enough, most of these ideas work remarkably well on a small
scale as well.
tar, gtar, dd, cp, etc. are not backup programs. These are file or filesystem copy programs. Backups are a different kettle of fish entirely.
Amanda is a pretty good option. There are many others. The tool really isn't that important other than that (a) it maintains a catalog, and (b) it provides comprehensive enough scheduling for your needs.
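The catalog requirement is the key difference from plain `tar`: the tool must be able to answer "which archive holds this file?" without reading every tape. A toy sketch of the idea, using demo paths under /tmp (real tools like Amanda keep far richer catalogs):

```shell
#!/bin/sh
# Toy illustration of a backup catalog: alongside each archive,
# record which archive every member file went into.
# All paths here are demo values.
DEST=/tmp/demo-catalog
mkdir -p "$DEST/src"
echo "a" > "$DEST/src/a.txt"        # stand-in for real data

STAMP=$(date +%Y%m%d%H%M%S)
tar -czf "$DEST/run-$STAMP.tar.gz" -C "$DEST" src

# Catalog line format: "<archive-name> <member-path>"
tar -tzf "$DEST/run-$STAMP.tar.gz" \
  | sed "s|^|run-$STAMP.tar.gz |" >> "$DEST/catalog.txt"

# Finding a lost file is now a catalog lookup, not a scan of every archive
grep "a.txt" "$DEST/catalog.txt"
```

This is why `tar` alone isn't a backup program: it writes archives but keeps no memory of them.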
The schedule is key. Deciding what needs to get backed up, when it needs to get backed up, how big of a failure window you can tolerate, and such is the real trick. It can be insanely difficult when you have a hundred machines with different needs, but fundamentally, a few rules apply to backups:
For backups:
1) Back up the OS routinely.
2) Back up the data obsessively.
3) Document your systems carefully.
4) TEST your backups!!!
And for restores:
1) Don't restore machines--rebuild.
2) Restore necessary config files.
3) Restore data.
4) TEST your restoration.
All machines should have their basic network and system configuration documented. If a machine is a web server, that fact should be added to the documentation, but the actual web configuration should be restored from the OS backups. Build the machine, create the basic configuration, restore the specific configuration, recover the data, verify everything. It's not the backups, it's not a tool, it's not just spinning tape; it's the process, the documentation, and the testing.
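The "TEST your restoration" rule can be made concrete even at the smallest scale: actually extract an archive to scratch space and compare it against the live data, rather than trusting that the archive is good. A minimal sketch with demo paths under /tmp:

```shell
#!/bin/sh
# Sketch of a restore test: archive, restore to scratch space,
# and compare byte-for-byte. Paths are demo values.
SRC=/tmp/verify-src
SCRATCH=/tmp/verify-restore
mkdir -p "$SRC" "$SCRATCH"
echo "payload" > "$SRC/data.txt"    # stand-in for real data

# Take the backup...
tar -czf /tmp/verify.tar.gz -C /tmp verify-src

# ...then prove it restores: extract somewhere else and compare
tar -xzf /tmp/verify.tar.gz -C "$SCRATCH"
cmp "$SRC/data.txt" "$SCRATCH/verify-src/data.txt" && echo "restore verified"
```

An untested backup is only a hope; a test like this, run routinely, turns it into a known-good restore path.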