How I Use Borg: Scripts & systemd - 19 July 2017

System backup is probably the most important of the "first things" a Linux user will learn how to do for themselves. Autodidactism is the basic condition of the Linux enthusiast, and when it comes to making good system backups there is a lot to learn, or at least to look at and forget. There are dozens of 'easy to use' systems to choose from, with user communities and project activity of varying sizes. There are also unlimited 'roll-your-own' choices, since few of the simpler tools (which often have a GUI) are featureful, flexible, or performant enough to satisfy most Linux users. In this post I'm going to run through my setup, which uses borg and some systemd units to get a backup system that is fast, featureful, and unobtrusive.

For many years I used Deja-Dup, the program which ships as `Backups` in the Gnome and Unity desktop environments. It's a pretty solid program, basically a graphical front-end for duplicity plus a wizard to get your backup storage set up. It's very straightforward to configure, mount your backup, access files from a certain point in the past, or perform a whole-system restore. For most people in most use cases, it's fast enough, configurable enough, and robust enough. Obviously, I eventually ran into some limitations. Most of the data I care about is extremely incompressible (and I like it that way), and most of the files are very large. Deja-Dup uses xz compression, which is quite slow. System scans to detect file changes began to take a while, and backing up those changes (calculating the file deltas and writing them to disk) took a very long time. At some point my backup became corrupted and had to be rebuilt from scratch (Deja-Dup also does this on its own every so often, just to freshen the bits). The rebuild wouldn't complete, even when left running for days. When I tried to blacklist the large lxd-zfs backing file (100+ GiB) where it would hang, I discovered that Deja-Dup can only blacklist directories. Moving the file into its own folder broke my LXD setup, and I didn't like that.

My borg Config

Then began my search for a superior backup solution, which eventually landed me on borg. It has a lot of advantages, the first (for me) being that exclusions support full globbing and regular expressions. It also has deduplication, consistency checking, and encryption. It is configurable through environment variables, and recovering files is easy by FUSE-mounting the repository, among other things. All the nice features, with none of the hassle, and it's very fast. Using borg prune I can take frequent backups and thin them into a cascade of coarser granularity that keeps repository growth in check: 6 hourly, 7 daily, 4 weekly, and 6 monthly. I wrote a script to roll all of this up and automate everything. And yes, I know that putting plaintext passwords in scripts isn't safe, but I'm not concerned about encrypting my backups.
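The following is a rough simulation of the cascade, based on my understanding of the rules in the borg prune documentation. Each rule keeps the newest archive in each hour, day, ISO week, or month. The rules run from the shortest period to the longest, and an archive already kept by an earlier rule doesn't count toward a later one. prune_sim is a hypothetical helper, not part of borg. It assumes bash 4+ and GNU date, and borg prune itself remains the authority on what actually gets deleted.

```shell
#!/usr/bin/env bash
# Rough simulation of borg prune's keep rules, for illustration only.
# Assumes bash 4+ (associative arrays, mapfile) and GNU date (date -d).
# prune_sim is a hypothetical helper, not part of borg.
#
# Usage: prune_sim HOURLY DAILY WEEKLY MONTHLY < timestamps
#   stdin:  one "YYYY-MM-DD HH:MM" timestamp per line, newest first
#   stdout: the timestamps that would be kept, newest first
prune_sim() {
    local -a counts=("$1" "$2" "$3" "$4")
    # Period of each rule: hour, day, ISO week, month
    local -a formats=('%Y-%m-%d %H' '%Y-%m-%d' '%G-%V' '%Y-%m')
    local -a archives
    local -A kept=()
    local i ts period last found
    mapfile -t archives

    for i in 0 1 2 3; do
        last=""
        found=0
        if (( counts[i] == 0 )); then
            continue
        fi
        for ts in "${archives[@]}"; do
            period=$(date -d "$ts" +"${formats[i]}")
            # Only the newest archive in each period is a candidate
            if [[ $period == "$last" ]]; then
                continue
            fi
            last=$period
            # An archive kept by an earlier rule doesn't count toward this one
            if [[ -n ${kept[$ts]:-} ]]; then
                continue
            fi
            kept[$ts]=1
            found=$((found + 1))
            if (( found == counts[i] )); then
                break
            fi
        done
    done

    # Print survivors in their original (newest-first) order
    for ts in "${archives[@]}"; do
        if [[ -n ${kept[$ts]:-} ]]; then
            printf '%s\n' "$ts"
        fi
    done
    return 0
}
```

Suppose prune_sim 2 2 0 0 receives 2017-07-19 15:38, 2017-07-19 14:07, 2017-07-18 20:00, 2017-07-18 10:00, and 2017-07-17 09:00. The hourly rule keeps the two newest archives. The daily rule skips the 19th, which is already covered, and keeps the newest archive of the 18th and of the 17th. Only 2017-07-18 10:00 is dropped.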

borg-backup.sh

#!/usr/bin/env bash
# the envvar $REPONAME is something you should just hardcode
export REPOSITORY="/media/$USER/$REPONAME/borg/" 

# Fill in your password here, borg picks it up automatically
export BORG_PASSPHRASE="" 

# Back up all of /home except a few excluded directories and files.
# Patterns containing $USER need double quotes so the shell expands them;
# glob patterns stay single-quoted so borg, not the shell, matches them.
# No 2>&1 is needed: under systemd, stdout and stderr already go to the journal.
borg create -v --stats --compression lz4                 \
    "$REPOSITORY::{hostname}-{now:%Y-%m-%d@%H:%M}" /home \
    --exclude '/home/*/.cache'                           \
    --exclude '/home/*/.ccache'                          \
    --exclude "/home/$USER/.local/include"               \
    --exclude "/home/$USER/.local/installppa.sh"         \
    --exclude "/home/$USER/.local/listppa"               \
    --exclude "/home/$USER/Downloads"                    \
    --exclude "/home/$USER/VirtualBox VMs"               \
    --exclude "/home/$USER/lxd-zfs.img"                  \
    --exclude '/home/lost+found'                         \
    --exclude '*.img'                                    \
    --exclude '*.iso'
rc=$?

# borg exits 0 on success, 1 with warnings, 2 on errors.
# On an error, clear the password envvar and stop before pruning.
if [ "$rc" -ge 2 ] ; then
    unset BORG_PASSPHRASE
    exit "$rc"
fi
 
# Prune the repo of extra backups
borg prune -v "$REPOSITORY" --prefix '{hostname}-'       \
    --keep-hourly=6                                      \
    --keep-daily=7                                       \
    --keep-weekly=4                                      \
    --keep-monthly=6
 
# Include the remaining device capacity in the log
df -hl | grep --color=never /dev/sdc
 
borg list "$REPOSITORY"

# Unset the password
unset BORG_PASSPHRASE
exit 0

As you can see from this file, configuration is very simple, excludes are easy to enumerate with whatever patterns you please, and the world is a wonderful place. I stash scripts and other tools like this in my $HOME/.local/bin. The observant reader will have noticed that I also specify LZ4 compression for cheap-as-free space savings, relying principally on deduplication for most of the storage savings.
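Quoting in an exclude list matters more than it looks. Bash expands $USER only inside double quotes, so a single-quoted '/home/$USER/Downloads' reaches borg with a literal $USER and never matches a real path. The minimal demonstration below uses a hypothetical show_args helper that prints each argument exactly as the receiving program gets it. It sets USER to an example value.

```shell
#!/usr/bin/env bash
# Print each argument in brackets, exactly as a program like borg receives it.
# show_args is a hypothetical helper used only for this demonstration.
show_args() {
    printf '[%s]\n' "$@"
}

USER=andy   # example value; normally set by the login session or systemd

# Single quotes: '$USER' is passed through literally, so the pattern is useless
show_args --exclude '/home/$USER/Downloads'

# Double quotes: the shell substitutes the user name before borg sees it
show_args --exclude "/home/$USER/Downloads"

# Double quotes also keep a space inside one argument; no backslash needed
show_args --exclude "/home/$USER/VirtualBox VMs"

# Wildcards must be quoted (either style) so borg, not the shell, expands them
show_args --exclude '/home/*/.cache'
```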

My systemd Setup

Automation then: cron, right? Please, this is 2017 and you're not one of those ultra-nerds who uses a distro without systemd; you're just a regular nerd who uses Linux. To automate basically anything, you create an anything.service file and put it in the /etc/systemd/system/ folder. Because it's not a daemon process, your anything also needs an anything.timer file in the same folder to run on a schedule. The filename before the suffix (".service" and ".timer") needs to match, because by default a timer activates the service with the same base name. systemd units can do way, way, way, waaay more than this, with unit types for sockets, paths, mounts, and other things, but I haven't had the patience to play with them yet. These are the unit files I use to run borg-backup.sh:

borg-backup.service

[Unit]
Description=Borg User Backup
 
[Service]
Type=simple
Nice=19
IOSchedulingClass=2
IOSchedulingPriority=7
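# $REPOSITORY, $USER and $GROUP are placeholders: replace them with literal
# values, since systemd does not expand them here
ExecStartPre=/usr/bin/borg break-lock $REPOSITORY
ExecStart=/home/$USER/.local/bin/borg-backup.sh
User=$USER
Group=$GROUP

borg-backup.timer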

[Unit]
Description=Borg User Backup Timer
 
[Timer]
WakeSystem=false
OnCalendar=*-*-* 0/1:00:00
RandomizedDelaySec=10min
 
[Install]
WantedBy=timers.target

I hardcode my $USER and $GROUP values (ownership is written user:group, as with chown) into the borg-backup.service file. This ensures that my normal user owns the backup and the files inside it. A system service runs as root by default, which would put root:root ownership on the backup and make a mess of things. Put both files into /etc/systemd/system/ along with everything else, then enable and start the timer with $ sudo systemctl enable --now borg-backup.timer to get things going! The timer is the unit to enable; borg-backup.service has no [Install] section, so enabling it does nothing. You can check when your next backup will fire using $ systemctl list-timers, and the status of your backup system with $ systemctl status borg-backup.
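systemd won't fill in $USER or $GROUP in User= and Group=, and $REPOSITORY is only defined inside borg-backup.sh. The unit files therefore need literal values. One way to avoid hand-editing is a small generator. write_borg_units is a hypothetical helper, not part of systemd or borg, that writes both files into a directory with the values substituted. It quotes the repository path for systemd's command-line parsing and assumes the script lives in ~/.local/bin as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write borg-backup.service and borg-backup.timer into DIR
# with the user, group and repository path filled in as literal values.
# Usage: write_borg_units DIR USER GROUP REPOSITORY
write_borg_units() {
    local dir=$1 user=$2 group=$3 repo=$4

    # Unquoted EOF: $user, $group and $repo are expanded while writing
    cat > "$dir/borg-backup.service" <<EOF
[Unit]
Description=Borg User Backup

[Service]
Type=simple
Nice=19
IOSchedulingClass=2
IOSchedulingPriority=7
ExecStartPre=/usr/bin/borg break-lock "$repo"
ExecStart=/home/$user/.local/bin/borg-backup.sh
User=$user
Group=$group
EOF

    # Quoted 'EOF': the timer has nothing to substitute, so write it verbatim
    cat > "$dir/borg-backup.timer" <<'EOF'
[Unit]
Description=Borg User Backup Timer

[Timer]
WakeSystem=false
OnCalendar=*-*-* 0/1:00:00
RandomizedDelaySec=10min

[Install]
WantedBy=timers.target
EOF
}
```

After generating the files somewhere temporary, you would copy them into /etc/systemd/system/ and enable the timer as described above.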

The observant will again have noticed the ExecStartPre= directive, which fires before the backup script. It's there because if I rudely interrupt the backup by rebooting my computer while it runs silently in the background, it will leave the write lock in place. All it takes to run a new backup after that is a quick $ borg break-lock $REPOSITORY. So rather than an external script that handles the error, I include the directive with that one-liner. Breaking the lock is safe as long as the repository has only one client backing up to it (me) and no other borg process is using it at that moment. Obviously, this inelegant solution will not suit all situations. Now, let's see what the output (and performance) of this backup system is like!
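Breaking the lock blindly is riskier with more than one client, or if a borg mount might be open. A more cautious pre-start step could first check that no borg process is running. The minimal sketch below assumes process command lines come from ps (for example ps -u "$USER" -o args=). no_borg_running is a hypothetical helper, and the list of borg subcommands it looks for is my own guess at which ones matter.

```shell
#!/usr/bin/env bash
# Hypothetical pre-start check: succeeds only when none of the command lines
# on stdin looks like a running borg operation, so breaking the lock won't
# pull it out from under a live borg process. The subcommand list is an
# assumption; adjust it to whatever you actually run against the repository.
no_borg_running() {
    ! grep -Eq '(^|/)borg (create|prune|list|mount|check|extract|info|delete)( |$)'
}

# Example pre-start step (REPOSITORY as defined in borg-backup.sh):
#   if ps -u "$USER" -o args= | no_borg_running; then
#       borg break-lock "$REPOSITORY"
#   fi
```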

$ journalctl -u borg-backup

Archive name: home-2017-07-19 15:38
Archive fingerprint: 5b2e54cf3be1879113a6aac24ef7123ec96b9e2ce386b11bf4e8b5b41fca9424
Time (start): Wed, 2017-07-19 15:38:40
Time (end):   Wed, 2017-07-19 15:40:09
Duration: 1 minutes 28.87 seconds
Number of files: 710113
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              776.95 GB            769.53 GB             85.40 MB
All archives:               18.94 TB             18.79 TB            757.03 GB
                       Unique chunks         Total chunks
Chunk index:                  743011             24877995
------------------------------------------------------------------------------
/dev/sdc1       2.7T  706G  1.9T  28% /media/andy/Backup'16
home-folder                          Tue, 2016-11-15 00:16:18
home-2016-11-21 16:40.checkpoint     Mon, 2016-11-21 16:40:25
home-2016-12-31 18:06                Sat, 2016-12-31 18:06:44
home-2017-01-31 20:19                Tue, 2017-01-31 20:19:59
home-2017-02-28 11:04                Tue, 2017-02-28 11:04:19
home-2017-03-15 14:07                Wed, 2017-03-15 14:07:51
home-2017-04-30 23:08                Sun, 2017-04-30 23:08:20
home-2017-05-31 23:04                Wed, 2017-05-31 23:04:51
home-2017-06-11 20:09                Sun, 2017-06-11 20:09:28
home-2017-06-14 09:06                Wed, 2017-06-14 09:06:59
home-2017-06-22 18:00                Thu, 2017-06-22 18:00:31
home-2017-06-29 20:02                Thu, 2017-06-29 20:02:55
home-2017-07-03 23:03                Mon, 2017-07-03 23:03:47
home-2017-07-04 21:07                Tue, 2017-07-04 21:07:35
home-2017-07-06 20:06                Thu, 2017-07-06 20:06:04
home-2017-07-08 18:07                Sat, 2017-07-08 18:07:21
home-2017-07-10 23:09                Mon, 2017-07-10 23:09:16
home-2017-07-11 18:03                Tue, 2017-07-11 18:03:24
home-2017-07-12 19:06                Wed, 2017-07-12 19:06:40
home-2017-07-13 18:05                Thu, 2017-07-13 18:05:46
home-2017-07-13 19:07                Thu, 2017-07-13 19:07:19
home-2017-07-17 14:06                Mon, 2017-07-17 14:06:30
home-2017-07-17 15:07                Mon, 2017-07-17 15:07:59
home-2017-07-17 16:07                Mon, 2017-07-17 16:07:03
home-2017-07-19 15:38                Wed, 2017-07-19 15:38:40

The compression performance is pretty decent. The roughly 50G of supposedly extra savings (757.03 GB deduplicated according to borg versus 706G used according to df) is actually just the accounting difference between gigabytes and gibibytes. It's a bit annoying that the borg list output shows every date twice, but it's actually very nice to have all of the backups sorted in date order when you're browsing the top-level directory looking for something. I have to say I've been pretty happy with how smooth this system has been: I don't even notice it run anymore, but I know it's there keeping my work safe.
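The gigabyte/gibibyte arithmetic is easy to check. borg reports sizes in decimal units (10^9 bytes per GB), while df -h uses binary ones (2^30 bytes per GiB, printed as G). gb_to_gib below is a small, hypothetical awk-based conversion helper.

```shell
#!/usr/bin/env bash
# Convert a size in decimal gigabytes (10^9 bytes), as borg prints it, into
# binary gibibytes (2^30 bytes), as df -h prints it, to one decimal place.
# gb_to_gib is a hypothetical helper for this check.
gb_to_gib() {
    awk -v gb="$1" 'BEGIN { printf "%.1f\n", gb * 1e9 / (1024 * 1024 * 1024) }'
}
```

gb_to_gib 757.03 prints 705.0, which accounts for nearly all of the gap to the 706G that df reports.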