Saving Space With RSYNC and Hard Links

You’re supposed to back up what you don’t want to lose. But the process can consume both time and storage space. That is, if you’re not using Linux.

With my Windows background still showing through, I thought the best way to do backups was the traditional full backup followed by several incremental backups. The drawback to this method is that a restoration will include files that were deleted after the last full backup and multiple copies of any file that has since been moved. But hey, at least you got your files back.

Linux has a nice little trick up its sleeve called hard links. A hard link gives a single file a second name without storing its contents a second time (yes, that is a very short description of how it works), so you can save a lot of space.
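To make the "two names, one file" idea concrete, here is a minimal sketch you can run in a scratch directory. The file names are made up for illustration. It creates a file, hard-links it under a second name, and compares the inode numbers that ls -i reports.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
echo "important data" > "$demo/original.txt"

# ln without -s creates a hard link: a second name for the same file.
ln "$demo/original.txt" "$demo/second-name.txt"

# ls -i prints the inode number first; both names should report the same one.
inode_a=$(ls -i "$demo/original.txt" | awk '{print $1}')
inode_b=$(ls -i "$demo/second-name.txt" | awk '{print $1}')
echo "original:    inode $inode_a"
echo "second name: inode $inode_b"

# Removing one name leaves the data reachable through the other.
rm "$demo/original.txt"
cat "$demo/second-name.txt"
```

Because both names point at the same data, the contents take up disk space only once, which is what makes the backup scheme in this post cheap.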

Rsync includes a handy option called --link-dest (link destination). Normally rsync only compares files between the source and target directories. With --link-dest, it also compares the source files to those in the directory you name, and when it finds an unchanged file there it creates a hard link to it in the target directory instead of copying it again.

Thus you can use:

cp -av "$source" "$(date +%F)"

on the initial backup, which will give you a backup in a directory named in the yyyy-mm-dd format. The reason for insisting on this date format is simple: alphabetical order is now also chronological order.
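If you want to convince yourself of the ordering claim, the following sketch creates a few empty directories with made-up dates, deliberately out of order, and lets ls sort them.

```shell
#!/bin/sh
# Hypothetical snapshot directories, created out of order in a scratch location.
backups=$(mktemp -d)
mkdir "$backups/2011-04-19" "$backups/2010-12-31" "$backups/2011-01-05"

# Alphabetical listing: the oldest date comes first, the newest last.
ls -1 "$backups"

oldest=$(ls -1 "$backups" | head -1)
newest=$(ls -1 "$backups" | tail -1)
echo "oldest: $oldest"
echo "newest: $newest"
```

This only works because the year comes first and the month and day are zero-padded. A format like dd-mm-yyyy would sort by day of the month instead.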

From there on, you can use

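/usr/bin/rsync -a --delete --link-dest="../$(ls -1r | head -1)" "$source" "$(date +\%F)"

on all subsequent backups.

The ls -1r part lists one entry per line in reverse alphabetical order, and piping that output to head -1 keeps only the first line. Run from the backup directory, before today's directory exists, this combination gives you the directory used for your last backup. The leading ../ is there because rsync resolves a relative --link-dest path against the destination directory, not the current directory. Also give $source a trailing slash, so that rsync copies the contents of the directory and the layout matches what cp produced on the first run.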
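The whole routine can be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name snapshot and its optional third argument (a snapshot name, defaulting to today's date) are hypothetical, and it uses rsync for the first run as well instead of cp so that both cases go through the same code path.

```shell
#!/bin/sh
# Hypothetical helper: snapshot SRC into ROOT/NAME.
# NAME defaults to today's date in yyyy-mm-dd format.
snapshot() {
    src=$1
    root=$2
    name=${3:-$(date +%F)}

    mkdir -p "$root" || return 1

    # Newest existing snapshot other than the one being written:
    # reverse alphabetical order puts the latest date first.
    prev=$(ls -1r "$root" | grep -Fxv -- "$name" | head -1)

    if [ -n "$prev" ]; then
        # A relative --link-dest is resolved against the destination
        # directory, so ../$prev points at the sibling snapshot.
        rsync -a --delete --link-dest="../$prev" "$src/" "$root/$name/"
    else
        # First run: nothing to link against, so make a plain full copy.
        rsync -a "$src/" "$root/$name/"
    fi
}
```

You would call it as snapshot /home/user /backup (both paths are placeholders). Files that have not changed since the previous snapshot end up as hard links, so each dated directory looks like a full backup while only changed files use new space.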

UPDATE APR 20 2011: I missed a couple of key points in my initial post, which are worth mentioning here.

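  1. Cron needs full paths. I got the part about specifying /backup correct but failed on /usr/bin/rsync, thinking cron would have something useful in $PATH. Cron runs jobs with a very limited environment, so it is safest to spell out full paths.
  2. In a crontab, % is a special character: an unescaped % is turned into a newline, and everything after the first one is passed to the command as standard input. If you need a literal % then you have to escape it using \%.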

Also worth mentioning:

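  1. You can append > [file] 2>&1 to have the output and any error messages logged. A plain > [file] captures only standard output, and error messages go to standard error.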
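Putting these notes together, a crontab entry might look something like the sketch below. The schedule, the /home/user/ source, and the log file location are assumptions for illustration, not my actual setup. The log is kept outside /backup so it does not show up in the ls listing.

```crontab
# m  h  dom mon dow  command
# Full path to rsync, \% instead of %, and stdout plus stderr appended to a log.
# The ../ is needed because a relative --link-dest is resolved against the destination directory.
30 2 * * * cd /backup && /usr/bin/rsync -a --delete --link-dest="../$(ls -1r | head -1)" /home/user/ "$(date +\%F)" >> /var/log/backup.log 2>&1
```

The cd /backup at the start matters: both the ls and the relative destination directory depend on the job running from the backup directory.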