My Backup Strategy
Details about the backup system for all of my data.
Thu 08 April 2021
Regularly backing up all the data I care about is very important to me. This article outlines my strategy to make sure I never lose essential data.
Backups should be as automatic as possible. This ensures laziness and forgetfulness won't interfere with the regularity.
All software used to create and store the backups should be free and open source so I'm not depending on the survival of a company.
Backups need to be tested to ensure they are correct and happening regularly. Multiple copies of the backups should exist, including at least one offsite to protect against my building burning down.
Backups should also be incremental when possible (rather than mirror copies) so an accidental deletion isn't propagated into the backups, making the file irrecoverable.
I have one backup folder, /mnt/backup, on my media server at home that serves as the destination for all my backup sources. All scheduled automatic backups write to their own subfolder inside of it.
This backup folder is then synced to encrypted 2.5" 1 TB hard drives which I rotate between my bag, offsite, and my parents' house.
I use the tool rdiff-backup extensively because it allows me to take incremental backups locally or over SSH. It behaves much like rsync and needs no configuration.
I have every email since 2010 backed up continuously in case my email provider disappears. I use offlineimap to sync my mail to a local Maildir directory, which is then backed up with rdiff-backup from a weekly cron job:
I'll explain what backup_check.txt does below
*/15 * * * * offlineimap > /var/log/offlineimap.log 2>&1
00 12 * * 1 date -Iseconds > /home/email/email/backup_check.txt
20 12 * * 1 rdiff-backup /home/email/email /mnt/backup/local/email/
40 12 * * 1 rdiff-backup --remove-older-than 12B --force /mnt/backup/local/email/
.offlineimaprc for reference:
[general]
accounts = main

[Account main]
localrepository = Local
remoterepository = Remote

[Repository Local]
type = Maildir
localfolders = ~/email

[Repository Remote]
type = IMAP
readonly = True
folderfilter = lambda foldername: foldername not in ['Trash', 'Spam', 'Drafts']
remotehost = example.com
remoteuser = email@example.com
remotepass = supersecret
sslcacertfile = /etc/ssl/certs/ca-certificates.crt
I use Standard Notes to take notes and wrote the tool standardnotes-fs to mount my notes as a file system to view and edit them as plain text files.
I take weekly backups of the mounted file system on my media server with cron:
00 12 * * 1 date -Iseconds > /home/notes/notes/backup_check.txt
15 12 * * 1 rdiff-backup /home/notes/notes /mnt/backup/local/notes/
I self-host a Nextcloud instance to store all my personal documents (non-code projects, tax forms, spreadsheets, etc.). Since it's only syncing software, the files need to be copied elsewhere to be backed up.
I take weekly backups of the Nextcloud data folder with cron:
00 12 * * 1 rdiff-backup /var/www/nextcloud/data/tanner/files /mnt/backup/local/nextcloud/
30 12 * * 1 rdiff-backup --remove-older-than 12B --force /mnt/backup/local/nextcloud/
I self-host a Gitea instance to store all my git repositories for code-based projects. My home folder is also a git repo so I can easily sync my config files and password database between servers and machines.
I take weekly backups of the Gitea data folder with cron:
00 12 * * 1 date -Iseconds > /home/gitea/gitea/data/backup_check.txt
10 12 * * 1 rdiff-backup --exclude **data/indexers --exclude **data/sessions /home/gitea/gitea/data /mnt/backup/local/gitea/
35 12 * * 1 rdiff-backup --remove-older-than 12B --force /mnt/backup/local/gitea/
Telegram Messenger is my main app for communication. My parents, most of my friends, and friend groups are on there so I don't want to lose those messages in case Telegram disappears or my account gets banned.
Telegram includes a data export feature, but it can't be automated. Instead I run the deprecated software telegram-export, which saves the messages to a SQLite database, hourly with cron:
0 * * * * bash -c 'timeout 50m /home/tanner/opt/telegram-export/env/bin/python -m telegram_export' > /var/log/telegramexport.log 2>&1
It likes to hang, so timeout kills it if it's still running after 50 minutes. It hasn't corrupted the database yet.
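For readers who would rather wrap a flaky job from Python than from the shell, here is a minimal sketch of the same idea. The function name run_with_timeout is made up for illustration. One difference to be aware of: GNU timeout sends SIGTERM by default, while subprocess.run() kills the child outright when the limit is reached.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd and return its exit code, or None if it ran past the limit."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return None


if __name__ == "__main__":
    # Stand-in for the real export command: a child that sleeps too long.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hung, 1))
```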
I mount my phone's internal storage as a file system on my desktop using adbfs-rootless. I then rsync the files over to my media server:
$ ./adbfs ~/mntphone
$ time rsync -Wav \
    --exclude '*cache' --exclude nobackup \
    --exclude '*thumb*' --exclude 'Telegram *' \
    --exclude 'collection.media' \
    --exclude 'org.thunderdog.challegram' \
    --exclude '.trashed-*' --exclude '.pending-*' \
    ~/mntphone/storage/emulated/0/ \
    localmediaserver:/mnt/backup/files/phone/
Unfortunately this is a manual process because I need to plug my phone in each time. Ideally it would happen automatically while I'm asleep and the phone is charging.
/backup/files is a repository for any kind of files I want to keep forever: my phone data, old archives, computer files, Minecraft worlds, files from previous jobs, and so on.
All the files will be included in the 1 TB hard drive backup rotations.
Data that lives on remote servers is pushed to the backup server by running rdiff-backup on the remote server with cron:
00 14 * * * date -Iseconds > /home/tanner/tbot/t0txt/data/backup_check.txt
04 14 * * * rdiff-backup /home/tanner/tbot/t0txt/data tbotbak@remotebackup::/mnt/backup/remote/tbotbak/daily/t0txt/
14 14 * * * rdiff-backup --remove-older-than 12B --force tbotbak@remotebackup::/mnt/backup/remote/tbotbak/daily/t0txt/
24 14 * * 1 rdiff-backup /home/tanner/tbot/t0txt/data tbotbak@remotebackup::/mnt/backup/remote/tbotbak/weekly/t0txt/
34 14 * * 1 rdiff-backup --remove-older-than 12B --force tbotbak@remotebackup::/mnt/backup/remote/tbotbak/weekly/t0txt/
44 14 1 * * rdiff-backup /home/tanner/tbot/t0txt/data tbotbak@remotebackup::/mnt/backup/remote/tbotbak/monthly/t0txt/
55 14 1 * * rdiff-backup --remove-older-than 12B --force tbotbak@remotebackup::/mnt/backup/remote/tbotbak/monthly/t0txt/
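The day fields in the crontab above decide which tier runs when: "* * *" is every day, "* * 1" is every Monday, and "1 * *" is the first day of each month. As a rough illustration only (the function name tiers_due is hypothetical and not part of my setup), the same rules can be written out in Python:

```python
from datetime import date


def tiers_due(day):
    """Return the backup tiers the crontab above would run on a given date."""
    tiers = ["daily"]            # "* * *": every day
    if day.weekday() == 0:       # cron day-of-week 1 is Monday; Python's Monday is 0
        tiers.append("weekly")
    if day.day == 1:             # cron day-of-month 1
        tiers.append("monthly")
    return tiers


if __name__ == "__main__":
    print(tiers_due(date.today()))
```

Each tier keeps its own rdiff-backup repository, so a first-of-the-month Monday writes to all three.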
The tbotbak user has write access to the /mnt/backup/remote/tbotbak directory only. It has its own passwordless SSH key that's only permitted to run the rdiff-backup --server command for security.
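One common way to pin a key to a single command is OpenSSH's command="..." option in the backup user's authorized_keys file. The sketch below only renders such a line. The helper name and the sample key are placeholders, and adding the restrict option is an assumption rather than a description of my exact setup.

```python
def restricted_key_line(public_key, command="rdiff-backup --server"):
    """Build an authorized_keys entry that can only run one forced command."""
    # "restrict" turns off port, agent and X11 forwarding and PTY allocation;
    # command="..." makes sshd run this command whatever the client requests.
    return f'restrict,command="{command}" {public_key.strip()}'


if __name__ == "__main__":
    # Placeholder key material, not a real key.
    print(restricted_key_line("ssh-ed25519 AAAAexample tbotbak@server"))
```

The printed line would go in ~/.ssh/authorized_keys of the backup user on the receiving server.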
I run a lot of services for Protospace, my city's makerspace.
The member portal I wrote called Spaceport creates an archive I download daily:
40 10 * * * wget --content-disposition \
    --header="Authorization: secretkeygoeshere" \
    --directory-prefix /mnt/backup/remote/portalbak/ \
    --no-verbose --append-output=/var/log/portalbackup.log \
    https://api.my.protospace.ca/backup/
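If wget isn't available, roughly the same download can be sketched with Python's standard library. The function names and the backup.bin fallback file name are made up for illustration. The Authorization header and the Content-Disposition handling mirror the wget flags above.

```python
import shutil
import urllib.request
from pathlib import Path


def build_request(url, secret_key):
    """Create a GET request carrying the Authorization header, like wget --header."""
    return urllib.request.Request(url, headers={"Authorization": secret_key})


def download_backup(url, secret_key, dest_dir):
    """Fetch the archive and save it under the file name the server suggests."""
    with urllib.request.urlopen(build_request(url, secret_key)) as resp:
        # get_filename() reads the Content-Disposition header, like
        # wget --content-disposition; the fallback name is hypothetical.
        name = Path(resp.headers.get_filename() or "backup.bin").name
        target = Path(dest_dir) / name
        with open(target, "wb") as out:
            shutil.copyfileobj(resp, out)  # stream to disk in chunks
    return target
```

Taking only the final path component of the suggested name keeps a misbehaving server from writing outside the destination directory.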
The website and wiki that I sysadmin both get backed up weekly:
0 12 * * 1 mysqldump --all-databases > /var/www/dump.sql
15 12 * * 1 date -Iseconds > /var/www/backup_check.txt
20 12 * * 1 rdiff-backup /var/www pshostbak@remotebackup::/mnt/backup/remote/pshostbak/weekly/www/
The Protospace Minecraft server I run gets backed up daily:
00 15 * * * date -Iseconds > /home/tanner/minecraft/backup_check.txt
00 15 * * * rdiff-backup --exclude **CoreProtect --exclude **dynmap /home/tanner/minecraft psminebak@remotebackup::/mnt/backup/remote/psminebak/
30 15 * * * rdiff-backup --remove-older-than 12B --force psminebak@remotebackup::/mnt/backup/remote/psminebak/
I also back up our Google Drive with rclone:
45 12 * * 1 rclone copy -v protospace: /mnt/backup/files/protospace/google-drive/
My backup folder /mnt/backup now looks like this:
/mnt/backup/
├── files
│   ├── docs
│   ├── phone
│   ├── protospace
│   ├── telegram
│   ├── usbsticks
│   └── ... and so on
├── local
│   ├── email
│   ├── gitea
│   ├── nextcloud
│   └── notes
└── remote
    ├── portalbak
    ├── pshostbak
    ├── psminebak
    ├── tbotbak
    └── telebak
This directory tree is the master backup and I make a copy of the entire tree every Saturday to a hard drive.
The directory is copied over with the following script:
#!/bin/bash

cryptsetup luksOpen /dev/sdf external
mount /dev/mapper/external /mnt/external

time rsync -av --delete /mnt/backup/local/ /mnt/external/backup/local/
time rsync -av --delete /mnt/backup/remote/ /mnt/external/backup/remote/
time rdiff-backup --force -v5 /mnt/backup/files/ /mnt/external/backup/files/

python3 /home/tanner/scripts/checkbackup.py

umount /mnt/external
cryptsetup luksClose external
I wrote a Python script, checkbackup.py, that goes through each backup and compares the timestamp in its backup_check.txt file to the current time. This makes sure that the cron job ran, the backups were taken, and they were transferred over correctly.
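The script itself isn't reproduced here, but a minimal sketch of the idea might look like the following. It assumes the files contain a single line of `date -Iseconds` output (which includes a UTC offset). The paths in CHECKS and the eight-day allowance are illustrative guesses, not the real configuration.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def backup_age(check_file, now=None):
    """Return how long ago the timestamp in check_file was written."""
    text = Path(check_file).read_text().strip()
    stamp = datetime.fromisoformat(text)  # parses e.g. 2021-04-05T12:00:01-06:00
    if stamp.tzinfo is None:
        raise ValueError("timestamp has no UTC offset")
    if now is None:
        now = datetime.now(timezone.utc)
    return now - stamp


def is_fresh(check_file, max_age, now=None):
    """True if the backup_check.txt timestamp is no older than max_age."""
    try:
        return backup_age(check_file, now) <= max_age
    except (OSError, ValueError):
        return False  # a missing or unreadable file counts as a failed backup


# Hypothetical (path, allowed age) pairs for two of the weekly backups.
CHECKS = [
    ("/mnt/external/backup/local/email/backup_check.txt", timedelta(days=8)),
    ("/mnt/external/backup/local/notes/backup_check.txt", timedelta(days=8)),
]

if __name__ == "__main__":
    for path, max_age in CHECKS:
        print("OK   " if is_fresh(path, max_age) else "STALE", path)
```

Because the timestamp file is written on the source machine just before the backup runs, a fresh timestamp on the external drive shows that every hop in the chain happened recently.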
I rotate through 2.5" 1 TB hard drives each Saturday when I do a backup. They are quite cheap at $65 CAD each so I can have a bunch floating around.
I keep one connected to the server, one in my bag, one offsite, one at my mother's house, and one at my dad's house. Every Saturday I run the script above to take a copy and then swap the drive with the one in my bag. The drive in my bag then gets swapped when I visit my offsite location, and likewise when I visit my parents (I go back home about twice per year). This means that all hard drives eventually get rotated through with new data and don't sit too long unpowered.
The drives are all encrypted with full-disk LUKS encryption using a password I'm unlikely to forget.
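I run the checksumming btrfs file system on them with the dup profile, which keeps two copies of all data and metadata on the same disk (RAID-1-style redundancy on a single drive) to protect against bitrot. This means I can only use 0.5 TB of storage for my backups, but the data is stored redundantly.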
Here's how I set up new hard drives to do this:
$ sudo cryptsetup luksOpen /dev/sdf external
$ sudo mkfs.btrfs -f -m dup -d dup /dev/mapper/external
$ sudo mount /dev/mapper/external /mnt/external/
$ sudo mkdir /mnt/external/backup
$ sudo chown -R tanner:tanner /mnt/external/backup
$ sudo umount /mnt/external
$ sudo cryptsetup luksClose external
I'm working on a system to automatically back up all my home directories to my media server. I need this to grab Bash histories and code that's work-in-progress. I've been burned by not having this once when a server died.
I'd like to automate backing up my phone by connecting it to a Raspberry Pi when I go to sleep.
I need to get better at fully testing my backups by restoring them on a blank machine.
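As a starting point for restore testing, one option is to restore a backup into a scratch directory and compare it file by file against the live source. The sketch below is an assumption about how that could be done, not something I run today. It hashes every regular file under two directory trees and reports what is missing or different, and the function names are made up.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path relative to root to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def compare_trees(source, restored):
    """Return (missing, changed): files absent from or different in restored."""
    src, dst = tree_digests(source), tree_digests(restored)
    missing = sorted(p for p in src if p not in dst)
    changed = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, changed
```

Files that changed on the source after the backup ran will show up as changed, so this works best right after a fresh backup or on data that rarely changes.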