Figure 1: John from USA - CC-BY-2.0

Watch out, things break, stuff catches fire. Let’s talk about backups.

In my last post, I said I was going to shift focus away from NixOS commentary. That is still the plan. For now, though, I remain committed to NixOS thanks to the technical debt I’ve built up - migrations aren’t free. Until I move on, enjoy my NixOS posting :).

Last fall, I wanted to reformat my laptop’s NixOS deployment from BTRFS (inside LVM2, itself inside LUKS) to a ZFS partition plus a separate swap partition. My Nix install consists of a few artifacts:

  1. My git repository with the flake.nix and flake.lock files
  2. The workstation’s /secrets folder, which holds sensitive data for service accounts
  3. The workstation’s /home folder

Both /secrets and /home are backed up nightly via borgmatic (using borg), driven by a crufty old NixOS module that I wrote (example of usage). Both folders were also snapshotted by BTRBK every 15 minutes (via this NixOS configuration). This frequent snapshotting policy will continue on the ZFS reinstall, powered by zfs-auto-snapshot.

The first test was to verify the integrity of the backed-up artifacts. I was able to execute a full restore from backup inside a virtual machine. This included adapting my laptop’s flake configuration to the VM, rebuilding, then running the borg extract commands.
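One way to check a test restore like this is to compare the restored tree against the original. The following is a minimal sketch, not the exact check I ran: compare_trees is a hypothetical helper name, and it only compares file contents (not ownership or permissions), so treat it as a first sanity check.

```shell
#!/bin/sh
# Sketch: compare an original directory tree with a restored copy.
# compare_trees is a hypothetical helper, not part of borg or borgmatic.
#
# Usage: compare_trees ORIGINAL RESTORED
# Returns 0 when every file matches, 1 when anything differs or is missing.
compare_trees() {
  original="$1"
  restored="$2"
  # diff -r walks both trees recursively; -q only names the files that
  # differ (or exist on one side only) instead of printing their contents.
  if diff -rq "$original" "$restored"; then
    echo "OK: restored tree matches the original"
  else
    echo "MISMATCH: see the differences listed above" >&2
    return 1
  fi
}
```

For example, after extracting into a scratch directory, something like compare_trees /home /mnt/restore/home would list any file whose contents changed or went missing.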

Fun fact: restoring via borg mount plus rsync is several times slower than running borg extract (in my case, against BorgBase). Keep that in mind when executing restores: if you need a full restore or a restore of a whole subdirectory, consider borg extract. If you need to pick and choose files, consider borg mount.
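The two restore styles can be sketched roughly as follows. The function names, the run helper, and the repository, archive, and mount point values are placeholders for illustration, not my real setup. The helper only prints commands unless DRY_RUN=0 is set, so the sketch is safe to paste.

```shell
#!/bin/sh
# run: print the command by default; execute it only when DRY_RUN=0.
# (Hypothetical helper so the sketch does nothing until you opt in.)
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Bulk restore: borg extract writes into the current directory.
# $1 = repository, $2 = archive name, $3 = optional path inside the archive.
restore_bulk() {
  if [ -n "${3:-}" ]; then
    # Paths inside a borg archive have no leading slash, so strip one.
    run borg extract --progress "$1::$2" "${3#/}"
  else
    run borg extract --progress "$1::$2"
  fi
}

# Pick-and-choose restore: mount the archive via FUSE and browse it.
# $1 = repository, $2 = archive name, $3 = mount point.
restore_browse() {
  run borg mount "$1::$2" "$3"
}

# Unmount the archive again when done copying files out.
# $1 = mount point.
restore_browse_done() {
  run borg umount "$1"
}
```

Calling restore_bulk with a path such as /home restores just that subtree, while restore_browse leaves the copying to cp or rsync from the mount point.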

After the successful test restore, it was time to run a final backup. On my setup that’s as simple as systemctl start backup. Then I booted a NixOS installer. Using parted /dev/nvme0n1, I came up with the following partition layout:

Model: INTEL SSDPEKNU010TZ (nvme)
Disk /dev/nvme0n1: 1024GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  1000MB  999MB   fat32        boot  boot, esp
 2      1000MB  1007GB  1006GB               zfs
 3      1007GB  1023GB  16.2GB               swap
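
For reference, a layout like the one above could be created non-interactively along these lines. This is a sketch reconstructed from the table, not the exact commands I typed: the boundaries are copied from the output, and run and partition_disk are hypothetical helpers. Partitioning destroys data, so the sketch only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# run: print the command by default; execute it only when DRY_RUN=0.
# (Hypothetical helper; partitioning is destructive, so default to printing.)
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# partition_disk DISK
# Creates a GPT label with three partitions named boot, zfs and swap.
# Start/end values are taken from the parted table in the post.
partition_disk() {
  disk="$1"
  run parted --script "$disk" \
    mklabel gpt \
    mkpart boot fat32 1MiB 1000MB \
    set 1 esp on \
    mkpart zfs 1000MB 1007GB \
    mkpart swap 1007GB 1023GB
}
```

Running partition_disk /dev/nvme0n1 prints the full parted invocation for review before anything touches the disk.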

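The swap partition is used together with NixOS’s swapDevices.*.randomEncryption.enable setting. With that setting, the partition is encrypted with a fresh random key on every boot (to my understanding via plain dm-crypt rather than a LUKS header), and the resulting device mapper device is used for swap.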

Then I followed my standard install instructions here on this blog.

§Recovery strategy

As part of this reinstall, I tested my recovery strategy. It turned out that the thumb drive and the QR code holding its full disk encryption (FDE) key were useless. They simply didn’t work - the QR code was of a different key.

Had I needed to recover my setup in a data loss scenario, I would likely have lost data because I had no access to working recovery material. Oops!

Figure 2: Alex E. Proimos - CC-BY-2.0

Next I created new recovery material. It consists of two components: a passphrase and an encrypted thumb drive. They live together; the passphrase is more of an “are you sure you want to open this?” than a security measure. The encrypted thumb drive contains my PGP private keys (each encrypted with its own passphrase) and my password database, encrypted against my “private use only” key.

To restore from this media, I first open the LUKS container via cryptsetup luksOpen /dev/disk/by-id/usb-...-part0 usbCrypt, then mount it via mount /dev/mapper/usbCrypt /mnt/usb. I can then import the GPG keys into my keyring and run gpg --decrypt --output - /mnt/usb/password-store/backups/stargate/passphrase to get the borg backup storage passphrase. With that, I can set up borg to access my backup using the details from my backup service’s dashboard.
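
Put together, the recovery steps look roughly like this. It is a sketch under assumptions: the function names and the run helper are made up for illustration, the device and key file are passed in as arguments because their real paths are not spelled out here, and nothing executes unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# run: print the command by default; execute it only when DRY_RUN=0.
# (Hypothetical helper so the sketch is safe to read through first.)
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# open_recovery_media DEVICE MOUNTPOINT
# Unlocks the LUKS container on the thumb drive and mounts it.
open_recovery_media() {
  run cryptsetup luksOpen "$1" usbCrypt
  run mount /dev/mapper/usbCrypt "$2"
}

# import_keys KEYFILE
# KEYFILE is a placeholder: wherever the exported private keys live.
import_keys() {
  run gpg --import "$1"
}

# read_borg_passphrase MOUNTPOINT
# Decrypts the stored borg passphrase to standard output.
read_borg_passphrase() {
  run gpg --decrypt --output - "$1/password-store/backups/stargate/passphrase"
}

# close_recovery_media MOUNTPOINT
# Unmounts the drive and locks the container again.
close_recovery_media() {
  run umount "$1"
  run cryptsetup luksClose usbCrypt
}
```

Writing the steps down as functions also doubles as documentation for the day I need them in a hurry.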

Finally I was able to run borg extract ... to kick off the restore on the laptop.

From start to finish, the redeploy took 3 hours for the data restore plus another hour for the various manual steps the procedure involves. It’s not super automatic, but hey, it’s tested and it works!

§Well I guess the restore worked!

Figure 3: Data loss is possible with a sledge hammer and no backups

Test your backups. Until you do, they are but a speculative investment; you can’t be sure they work. In theory they should, but who really knows? Nobody. Go test your backups. Haven’t done it yet? Well then, buy this sledge hammer and apply it to your storage devices, because your data is worthless without tested backups - it could disappear at any time. Theft, fire, PEBKAC, software bugs (like Steam’s infamous rm -rf script bug)… anything is possible.

§Want a T-shirt?

I’m selling T-shirts with that sledge hammer fellow on the back and “COMPUTERS WERE A MISTAKE” on the front. Of course it’s supposed to be cheeky and not too serious - we must laugh at technology before it destroys our human identity. And embrace the good parts. Computers are fun, if you let ’em. Have fun with ’em, but don’t let ’em control every aspect of your being.