PhillC's Journal: The Search for a Simple Online Backup Service

Before starting this entry, I should note that Kapital Moto TV is currently just a two-man operation. I'm the business and online video brains; my partner Antoine is the developer and server admin. Sometimes, depending on time and availability, the lines are blurred. Any simple Linux errors in the processes described below are down to my own ignorance and lack of knowledge.

It's probably gross negligence, but we haven't really had any sort of automated backup schedule in place for the Kapital Moto TV website. Video files served via the website are "backed up" by a local copy on my home PC. Website code is "backed up" by local copies on Antoine's and my home PCs, which may or may not be the latest versions. The rule we have in place is that before working on any live code, we download the file from the website first, so we're always working on the most recent version. MySQL database "backups" are handled by me remembering to download an extract via phpMyAdmin whenever I think of it.

Overall, a pretty crap backup regime. So, I decided to do something about it. I wanted something automated, online and easy to restore. Perhaps easier said than done, but it didn't need to be.

After much research, and after reading this Slashdot article from about a year ago, I decided to use Amazon's S3 service. It seemed to be the cheapest storage available, they've just launched the service in Europe, and to my mind it was a pretty safe bet, not likely to disappear anytime soon.

The only problem I could see with S3 is that it's difficult to connect to the service directly: either you write your own application against their APIs, or you connect through a third-party tool. I wasn't about to write my own application, so the two most likely third-party options were JungleDisk or S3Sync.

I decided to try JungleDisk first. It all seemed pretty straightforward to begin with. I followed what appeared to be an easy online tutorial, installed davfs2, mounted S3 as a new disk locally, then attempted to use rsync to copy data across. Initially it seemed to work fine on my couple of small test files. However, when running a "live" test on the 10GB of data that needs backing up, rsync threw errors. At first I suspected a JungleDisk install problem, but that didn't seem to be the case. Unfortunately, the JungleDisk support forums provided very little help. One post from "JungleDave" helped troubleshoot a JungleDisk startup issue, but at the time of writing there have been no further responses for four days. Another user with similar problems also doesn't seem to be getting the answers they need. For what is ultimately a commercial tool, the level of support provided is pretty minimal and disappointing.
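For anyone attempting the same route, the general shape of what I was trying looked something like the below. Treat it as a sketch only: the mount point is arbitrary, and the localhost port is what I understand JungleDisk's local WebDAV listener defaults to, so check your own install.

# Mount JungleDisk's local WebDAV interface as a filesystem via davfs2
# (the port is an assumption - confirm it against your JungleDisk config)
sudo mount -t davfs http://localhost:2667/ /mnt/jungledisk

# Copy data into the mounted "disk" with rsync
rsync -av /www/kmtv /mnt/jungledisk/backup/

# Unmount when finished
sudo umount /mnt/jungledisk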

So, unmount, uninstall and off to S3Sync I went. S3Sync is a Ruby script that works much like the standard rsync but uses Amazon's S3 as the storage medium. I didn't have Ruby installed on the server, so that was the first step. Following another online tutorial for getting S3Sync up and running, all seemed to be going well. Appearances can be deceptive, though, and problems did strike: at the time of writing, S3Sync hasn't been updated to work with S3's European storage. The KMTV server is located in the UK, and while I wanted the backups held remotely, for a decent transfer speed I was only really considering European-based storage options. It looked like I was stuck again. I can't criticise the guys at S3Sync, but I needed a backup regime in place. The more I worked on getting one up and running, the more paranoid I became about disaster striking. I had to get something sorted as soon as possible.
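For the record, if S3Sync had supported European buckets, my understanding is that usage would have looked roughly like this. The bucket name is made up, and the flags are from memory of the s3sync docs, so double-check before relying on it.

# s3sync reads the AWS credentials from environment variables
export AWS_ACCESS_KEY_ID=yourkeyid
export AWS_SECRET_ACCESS_KEY=yoursecretkey

# Recursively sync the site directory up to an S3 bucket (names are illustrative)
ruby s3sync.rb -r --ssl /www/kmtv/ kmtv-backups:kmtv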

What's left? Perhaps what I should have done to start with: find a service that allows direct rsync connections from my server. A good few hours of online research narrowed the options down to the following contenders:

  • Strongspace - part of Joyent, so not likely to go away anytime soon. However, at US$15 for 5GB they aren't the cheapest.
  • Gigaserver - European-based, but again the price isn't right at EUR25 for 5GB.
  • Rsync.net - a favourite with the Slashdot crowd in the earlier discussion I linked to. The price is right and they offer European storage in Switzerland. Sadly, a much more recent blog post elsewhere didn't have a lot of good things to say about this service.

I'm sure there are more options out there, and I could have spent days researching all the relevant services, but in the end I needed to get something up and running quickly. I went with Rsync.net and signed up for 12GB of storage; at US$19.20 per month it's not as cheap as S3, but it won't break the bank either.

After signing up, it took a couple of hours for my login details to come through, but this was flagged during the sign-up process. The instructions in the initial email were extremely helpful: connecting via SSH using keys, and thus not needing to input a password each time, was explained in straightforward terms. A few simple rsync tests worked a treat, and everything was looking pretty good. Time for the big test: using rsync, I started to copy across the 10GB of video files. Left to run overnight, the morning logs showed no errors. A few more rsync tests with small test files dumped in the video directory also worked correctly, with only the new files being copied across. It looks like Rsync.net was ultimately the right choice.
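For reference, the passwordless SSH setup boils down to something like this. The username and servername are placeholders, same as in the scripts below, and it assumes the .ssh directory already exists on the remote side.

# Generate a key pair locally; leave the passphrase empty for unattended backups
ssh-keygen -t rsa

# Copy the public key across (note: this overwrites any existing authorized_keys)
scp ~/.ssh/id_rsa.pub rsyncdotnetuser@servername.rsync.net:.ssh/authorized_keys

# Test: this should run without prompting for a password
ssh rsyncdotnetuser@servername.rsync.net ls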

My backup regime now looks something like this:

Step 1.

Use mysqldump to extract copies of the relevant databases:

mysqldump -u user --password=XXXXXXXXXX --opt --databases kmtv scottish | gzip > /user/mysqldumps/mysqlbackup.sql.gz

Insert your own username in place of "user", along with the relevant password to access the databases on your server.

This command extracts dumps of the kmtv and scottish (another project) databases to the folder /user/mysqldumps as a gzipped SQL file.
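Since the whole point is being able to restore, it's worth noting the reverse operation too. Because the dump was made with --databases, the file includes the CREATE DATABASE statements, so it can be piped straight back into mysql:

gunzip < /user/mysqldumps/mysqlbackup.sql.gz | mysql -u user --password=XXXXXXXXXX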

I created this as a simple shell script:

#!/bin/bash

mysqldump -u user --password=XXXXXXXXXX --opt --databases kmtv scottish | gzip > /user/mysqldumps/mysqlbackup.sql.gz

I saved this in a folder under my user's home directory.

Step 2.

Use rsync to copy the relevant data across to the rsync.net server. Again, I created these as a shell script after testing them on the command line.

#!/bin/bash

rsync -aHvz /user/mysqldumps/mysqlbackup.sql.gz rsyncdotnetuser@servername.rsync.net:mysqldumps

rsync -aHvz /www/kmtv rsyncdotnetuser@servername.rsync.net:kmtv

Again, you'll need to insert your own rsync.net username and allocated servername in the lines above.
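A tip I picked up while testing: rsync's -n (dry-run) flag shows what would be transferred without actually copying anything, which is handy before letting a script loose on 10GB of video:

# Same command as above with -n added; prints the file list and stops there
rsync -aHvzn /www/kmtv rsyncdotnetuser@servername.rsync.net:kmtv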

If you can't get your shell scripts to work correctly, check the permissions; chmod 755 worked for me.
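In other words, something like this, assuming your scripts live in the same place my crontab entries below expect them:

chmod 755 /user/scripts/mysqldump.sh /user/scripts/rsync.sh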

Step 3.

Set up crontab to run the scripts automatically. I use crontab -e to make the necessary changes.

# Every morning at 3.00am run mysqldump backups
0 3 * * * /user/scripts/mysqldump.sh

# Every morning at 3.15am run rsync backups to rsync.net
15 3 * * * /user/scripts/rsync.sh

Insert the correct path to wherever your shell scripts are saved.
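To double-check the entries were saved, crontab -l prints the active table:

crontab -l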

And that's it! There are plenty of rsync, mysqldump and crontab tutorials elsewhere on the Net if you need more help with those.

I'll probably add to and fine tune the above over time.

Feedback from those more knowledgeable than myself is always appreciated.

And now, with this post, I have a record of what I've done for future reference.
