From http://www.triatechnology.com/encrypted-incremental-backups-to-amazon-glacier-in-linux/ with some corrections and improvements.
Amazon’s Glacier storage service provides very cheap storage; as of this writing it is $0.011 per GB. However, transfer fees apply, so make sure you know what it will cost to move a large number of files. Glacier is designed as offline storage, and a restore request can take 3 to 5 hours to process. It is perfect for backups that you expect to rarely touch.
I wanted to use this but I had 3 main requirements:
- I wanted it encrypted with open standards and I wanted nobody but me to have the keys: Trust No One!
- I wanted the ability to do incremental backups because I have many GB of data I want to upload
- The solution must work in Linux since that is where my files are stored
Incremental backups with Glacier can be difficult: to do an incremental backup you have to know what is already backed up, and you can’t know what is in Glacier without waiting 3 to 5 hours. And if the backed-up files are encrypted, you can’t easily determine whether they are the same files or not. Alternatively, you can record what you have backed up on the local computer in a flat file or database. My backup script uses a flat file to record what has been backed up, along with each file’s modify time and size. The script is pasted at the bottom of this post. It uploads your files to S3, and you then set the Lifecycle rule on your bucket to archive them to Glacier after 1 day. If you need to restore from Glacier to S3 so you can restore to your computer, you can right-click the object in the web interface and choose Initiate Restore. I was not able to do that on a folder, but I found a Windows program called S3 Browser that lets you initiate a restore on a whole folder. Since restoring will be a rare occurrence, that will work for now. I still need to test whether it works in Wine.
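The lifecycle rule can be set through the S3 web console, or (in newer versions of s3cmd) with the setlifecycle command. A minimal sketch of the rule document, transitioning everything in the bucket to Glacier after 1 day (the rule ID is a placeholder):

```xml
<LifecycleConfiguration>
  <Rule>
    <ID>archive-to-glacier</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>1</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
```

Assuming your s3cmd is recent enough to have the command, you would apply it with something like s3cmd setlifecycle lifecycle.xml s3://yourbucket; otherwise the console works fine.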
The only thing you should need to install is s3tools, which provides the s3cmd command; it is in the repositories of most major distros. You will need to run s3cmd --configure to generate a .s3cfg file that stores your Amazon keys as well as your encryption key. You can move the .s3cfg file to a safer place if you want to protect your encryption key; you will just have to specify its location in the script. Also, record your encryption key in a different place. It can’t be recovered for you, so don’t lose it!
You need to use the SOURCE array to list the directories you want backed up. The find command will then go through each directory and pull out the files. A SHA-1 hash is made of the file name, date, and file size; once a file is uploaded to S3, that info is written to the log file. The log file needs to always stay on the computer, because it is what knows what has been backed up. The script searches for the hash instead of a file name because I didn’t want collisions between similar names such as /path/to/file and /another/path/to/file. I suppose I could have a collision with the hash, but that would be extremely rare.
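The dedup check can be sketched like this (a minimal sketch: the "hash  path" log format, the variable names, and the commented s3cmd call are my assumptions about the script, not the script itself):

```shell
#!/bin/sh
# Sketch of the dedup check: hash the file's name, mtime and size,
# then consult the flat-file log before uploading.
logFile=backup.log
touch "$logFile"

f=$(mktemp)                                # stand-in for a real file found under SOURCE
meta="$f $(stat -c '%Y %s' "$f")"          # path, mtime (epoch seconds), size in bytes
hash=$(printf '%s' "$meta" | sha1sum | awk '{print $1}')

if grep -q "^$hash" "$logFile"; then
    echo "skip: $f"                        # already uploaded with this name/date/size
else
    echo "upload: $f"
    # s3cmd -e -c "$s3cfg" put "$f" "s3://$bucket$f" \
    #     && printf '%s  %s\n' "$hash" "$f" >> "$logFile"
fi
```

Note that a touched file re-hashes to a new value, which is exactly what makes the backup incremental: changed files get uploaded again, unchanged ones are skipped.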
Besides the SOURCE array, you will also need to set $logFile, the flat file holding your uploaded-file info; $bucket, your S3 bucket name; and $s3cfg, the location of your s3cmd config file.
s3cmd does not give any exit status, so scripting around it is difficult. I capture any output to stderr and save it to a variable; if the variable is non-empty, the upload is considered failed. However, I then check whether the file actually exists in the bucket with a timestamp within two minutes of the current time (an allowance for variance between remote and local clocks); if it does, the upload is considered successful and the event is logged in the error log. If there is no file, or its timestamp is not within the last two minutes, stderr is written to the error log and the upload is considered failed.
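The success test can be sketched like this (the recent_enough helper name, the 120-second window, and the s3cmd listing columns are assumptions for illustration):

```shell
#!/bin/sh
# Capture only stderr from the upload; stdout is discarded:
# err=$(s3cmd -e put "$f" "s3://$bucket$f" 2>&1 >/dev/null)

# Succeeds if a "YYYY-MM-DD HH:MM" timestamp is within 120 seconds of now
# (allowance for variance between the remote and local clocks).
recent_enough() {
    listed=$(date -d "$1" +%s) || return 1   # parse the listed timestamp (GNU date)
    now=$(date +%s)
    diff=$(( now - listed ))
    [ "${diff#-}" -le 120 ]                  # absolute difference <= 2 minutes
}

# if [ -n "$err" ]; then
#     ts=$(s3cmd ls "s3://$bucket$f" | awk '{print $1" "$2}')
#     recent_enough "$ts" || echo "$err" >> "$errorLog"
# fi
```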
s3cmd uses the -e switch to encrypt your files. It uses gpg’s (or pgp’s) symmetric-key encryption with CAST5 (CAST-128), the cipher specified in RFC 2144. If for some reason s3cmd will not work on my system, I am still using an open standard for encryption, and I know I can use other programs to decrypt my wedding and baby photos — I have tested this.
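Because the files are ordinary GnuPG symmetric encryption, they can be recovered without s3cmd at all. A round-trip sketch (the passphrase and file names are placeholders; --pinentry-mode loopback is needed for batch passphrases on GnuPG 2.1 and later):

```shell
#!/bin/sh
# Encrypt the way s3cmd's -e does (symmetric CAST5), then decrypt with plain gpg.
pass=mysecret                 # placeholder: the gpg_passphrase stored in .s3cfg
plain=$(mktemp); enc=$(mktemp); dec=$(mktemp)
echo "wedding photo bytes" > "$plain"

gpg --batch --yes --pinentry-mode loopback --passphrase "$pass" \
    --symmetric --cipher-algo CAST5 -o "$enc" "$plain"

gpg --batch --yes --pinentry-mode loopback --passphrase "$pass" \
    -d -o "$dec" "$enc"

cmp "$plain" "$dec"           # exits 0 if the round trip preserved the file
```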
The script also uploads the backup log to S3 for safekeeping, as well as the backup script itself, in case it is not in your list of files to back up.
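The end-of-run housekeeping amounts to something like this (the object names are examples; whether to add -e to encrypt the log and script too is up to you):

```shell
# Keep a copy of the state in the bucket itself (example object names).
s3cmd -c "$s3cfg" put "$logFile" "s3://$bucket/backup.log"
s3cmd -c "$s3cfg" put "$0"       "s3://$bucket/backup.sh"
```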