Figure 1: John from USA - CC-BY-2.0

Watch out, things break, stuff catches fire. Let’s talk about backups.

Last post, I stated that I’m going to switch focus away from NixOS commentary. This is still the plan. For now, I remain committed to NixOS because of the technical debt I’ve accumulated; migrations aren’t free. Until then, enjoy my NixOS posting :).

Last fall, I wanted to reformat my laptop’s NixOS deployment from BTRFS (inside LVM2, itself inside LUKS) to a ZFS partition plus a separate swap partition. My Nix install consists of a few artifacts:

  1. My git repository with the flake.nix and flake.lock files
  2. The workstation’s /secrets folder, sensitive data for service accounts.
  3. The workstation’s /home folder

Both /secrets and /home are backed up via borgmatic (using borg) on a nightly basis via a crufty old nixos module that I wrote (example of usage). Both folders were also snapshotted by BTRBK every 15 minutes (via this nixos configuration). This frequent snapshotting policy will continue on the ZFS reinstall powered by zfs-autosnapshot.

The first test was to verify the integrity of the backed up artifacts. I was able to execute a full restore from backup from within a virtual machine. This included adapting my laptop’s flake configuration to the VM, rebuilding, then executing the borg extract commands.
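One way to check a test restore is to compare the restored tree against the live copy with diff. This is a minimal sketch under assumptions: the function name verify_restore and the example paths are made up for illustration, and files that changed after the backup ran will legitimately show up as differences.

```shell
# verify_restore LIVE_DIR RESTORED_DIR
# Hypothetical helper: returns 0 when both trees have identical contents,
# non-zero otherwise. diff prints the names of the files that differ.
verify_restore() {
  diff -r -q -- "$1" "$2"
}

# Example (paths are placeholders):
#   verify_restore /home/alice /mnt/restore-test/home/alice
```

A clean exit status is a stronger signal than "the extract command finished", because it proves the restored bytes match.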

Fun fact: using borg mount plus rsync is several times slower than running borg extract (at least against BorgBase). Keep that in mind when executing restores: if you need a full restore or a restore of a subdirectory, consider borg extract. If you need to pick and choose among many files, consider borg mount.
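The two approaches might look roughly like this. It is a sketch assuming borg 1.x REPO::ARCHIVE syntax; the function names are hypothetical and every argument value is a placeholder for your own repository, archive name, and paths.

```shell
# restore_subtree REPO ARCHIVE PATH_IN_ARCHIVE TARGET_DIR
# Bulk restore: borg extract writes into the current directory,
# so change into the target first (inside a subshell).
restore_subtree() {
  ( cd "$4" && borg extract "$1::$2" "$3" )
}

# browse_archive REPO ARCHIVE MOUNTPOINT
# Cherry-picking: mount the archive via FUSE, copy what is needed,
# then release it with: borg umount MOUNTPOINT
browse_archive() {
  borg mount "$1::$2" "$3"
}
```

Paths inside a borg archive are stored without a leading slash, so PATH_IN_ARCHIVE would be something like home/alice rather than /home/alice.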

After the successful test restore, it was time to execute a final backup. On my setup that’s as simple as systemctl start backup. Then boot a NixOS installer. Invoking parted /dev/nvme0n1, I came up with the following partition layout:

Model: INTEL SSDPEKNU010TZ (nvme)
Disk /dev/nvme0n1: 1024GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  1000MB  999MB   fat32        boot  boot, esp
 2      1000MB  1007GB  1006GB               zfs
 3      1007GB  1023GB  16.2GB               swap

The swap partition is used in conjunction with NixOS’s swapDevices.*.randomEncryption.enable setting. With it enabled, the partition is encrypted with a fresh random key on every boot, and the resulting device mapper device is used for swap.

Then I followed my standard install instructions here on this blog.

Recovery strategy

As part of this reinstall, I tested my recovery strategy. It turned out the thumb drive and the QR code holding the thumb drive’s full disk encryption (FDE) key were useless. They simply didn’t work: the QR code encoded a different key.

Had I needed to recover my setup in a data loss scenario, I likely would have lost data because I had no working recovery material. Oops!
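A cheap way to catch this earlier might be to check every so often that the recovery passphrase still unlocks the drive, without mapping or mounting anything. The sketch below assumes a LUKS-formatted device and uses cryptsetup’s --test-passphrase option; the function name and the device path in the example are placeholders.

```shell
# check_recovery_drive DEVICE
# Prompts for the passphrase and only verifies that it unlocks a keyslot;
# no device-mapper device is created. Typically needs root.
check_recovery_drive() {
  if cryptsetup open --test-passphrase --type luks "$1"; then
    echo "recovery passphrase OK for $1"
  else
    echo "recovery passphrase FAILED for $1" >&2
    return 1
  fi
}

# Example (device path is a placeholder):
#   check_recovery_drive /dev/sdX1
```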

Figure 2: Alex E. Proimos - CC-BY-2.0

Next I created new recovery material. It consists of two components: a passphrase and an encrypted thumb drive. They live together; the passphrase is more of an “are you sure you want to open this?” than a security measure. The encrypted thumb drive contains my PGP private keys (encrypted with each key’s own passphrase) and my password database, encrypted against my “private use only” key.

To restore from this media, I first open the LUKS container via cryptsetup luksOpen /dev/disk/by-id/usb-...-part0 usbCrypt, then mount it via mount /dev/mapper/usbCrypt /mnt/usb. I can then import the GPG keys into my keyring and run gpg --decrypt --output - /mnt/usb/password-store/backups/stargate/passphrase to get the borg backup passphrase. With that, I can set up borg to access my backups using the connection details from my backup service’s dashboard.

Finally I was able to run borg extract ... to kick off the restore on the laptop.

From start to finish, the redeploy and restore took about four hours: three for the data restore and another for the various manual steps this procedure involves. It’s not super automatic, but hey, it’s tested and it works!

Well I guess the restore worked!

Figure 3: Data loss is possible with a sledge hammer and no backups

Test your backups. Until you do, they are but a speculative investment; you’re not sure if they work. In theory they should; in practice, nobody really knows. Go test your backups. Haven’t done it yet? Well then, buy this sledge hammer and apply it to your storage devices, because your data is worthless without tested backups: it could disappear at any time. Theft, fire, PEBKAC, a software bug (like Steam’s infamous rm -rf script bug)… anything is possible.

Want a T-shirt?

I’m selling T-shirts with that sledge hammer fellow on the back and “COMPUTERS WERE A MISTAKE” on the front. Of course it’s supposed to be cheeky and not too serious - we must laugh at technology before it destroys our human identity. And embrace the good parts. Computers are fun, if you let ’em. Have fun with ’em, but don’t let ’em control every aspect of your being.

Figure 1: The laptop that was having a bad day with NixOS 23.11

More upgrade gotchas. Shucks. If everything goes well, this will be my last NixOS post. Read on to understand my frustration just a little bit more.

My main laptop is a Lenovo Ideapad Flex 5, a simple and cheap device. The keyboard stopped working in early boot after upgrading to 23.11. The impact: I need a USB keyboard around to unlock the device from a cold boot. Unfortunately for me, I forgot to bring one when I took the train to Chicago the other day. Instead, I had to spend a bunch of time troubleshooting.

NixOS not discovering keyboard driver?

Figure 2: User typing on keyboard to unlock disk, but no effect

Whenever I was typing at the console to unlock the storage’s full disk encryption, the cursor didn’t seem to blink. This suggested perhaps the keyboard wasn’t being recognized.

I discovered the cause after booting a GRML live USB that was in my bag for a different project. I spent time looking at its lsmod output and checking various details in /sys/class/input/input*/device/, including the description file. I found one description file containing i8042 KBD port. This description seemingly correlated to an i8042 module loaded in the kernel.

Then I looked at the lsmod output again, grepping for lines containing i8042. I noted there is an ideapad_laptop module which depends on i8042. Bingo, this seemed like the module to load at boot. It might be fine to load i8042 directly; however, ideapad_laptop seems like the driver for this laptop’s specific hardware, so I figured why not load it all.
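The hunt above can be condensed into a small helper. This is a sketch, not the exact commands I typed: it assumes the description files live under /sys/class/input on the live system, and the function name and its optional directory argument are made up for illustration.

```shell
# list_input_descriptions [SYSFS_INPUT_DIR]
# Prints "inputN: <description>" for every input device whose parent
# exposes a description file. The directory argument exists only so the
# helper can be pointed at a copy of the tree; it defaults to sysfs.
list_input_descriptions() {
  dir="${1:-/sys/class/input}"
  for f in "$dir"/input*/device/description; do
    [ -r "$f" ] || continue
    name=${f#"$dir"/}          # strip the leading directory
    name=${name%%/*}           # keep only "inputN"
    printf '%s: %s\n' "$name" "$(cat "$f")"
  done
}

# Then look for loaded modules that reference the port driver:
#   lsmod | grep i8042
```

Not every input device has a description file, so the helper silently skips the ones that don’t.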

I got lucky: an older NixOS generation still worked, so I didn’t have to figure out GRML networking, somehow get ZFS set up on it, and unlock the disks from GRML. Instead I just booted the old NixOS generation, fired up Emacs, and ignored all the errors caused by Emacs trying to load compiled elisp objects from a different major release, or something of that nature. nix-mode didn’t work either, but that’s fine; fundamental-mode works too, I guess. I added "ideapad_laptop" to boot.kernelModules in my Nix flake (commit), then ran nixos-rebuild boot --flake .#.

Rebooted and voilà! I could boot into 23.11 and use my keyboard!

Lessons learned

To get into a broken NixOS install faster, I could carry a live USB with me. That way I can reach my NixOS installation without further shenanigans, and avoid the friction of booting an older generation that doesn’t get along with my user data (such as Emacs failing to load nix-mode in the older generation). GRML doesn’t ship with ZFS, so it doesn’t quite meet this need.

nixos-generate-config is not infallible. It doesn’t include this i8042 driver for my laptop’s configuration and the initramfs doesn’t seem to load the driver without the above fix.

It appears I need to run NixOS unstable and test it on my machines in order to validate my installs prior to each release. This is a lot of work and I’m not up for it. It doesn’t bring me value or joy. As of Jan 15, 2024, there are over 7k open issues and 5k open pull requests. That’s a lot of work blocked or waiting to be done. How do I know that my particular breakage will be addressed in any reasonable timeline? How do I know that my pull requests will be commented on in a timely manner? There are no promises, and therefore I am not motivated to implement further release testing on my end anyway.

Next steps for my computing needs

Figure 3: My unpaid part-time job trying to write blog posts

It’s becoming more and more apparent I don’t want to play computer janitor every time I go to write an email or load a webpage. To this end I don’t think I’ll be using NixOS in the future. I need something with a committed testing culture and mechanisms in place to hear community feedback and take appropriate actions to correct direction / maintain focus on a singular vision. I could write a book about all my grievances… but that’s beside the point. It’s bad energy, and it’s hubris to think I could even change anything about the community. Time to move on.

Debian Testing combined with Ansible and Nix might be sufficient. Or Gentoo combined with Ansible and Nix could also work. Gentoo now offers pre-built binpkgs, so the installation experience might be about as fast as deploying a NixOS host from scratch. Bonus: while Nix packages offer some build-input parameters, there isn’t a standard for what those are the way there is with Gentoo USE flags, so Gentoo wins at customizing packages in a consistent way.

Upgrading is a time commitment, and NixOS can’t fix this. I dread the thought of upgrading to the next release, so my migration deadline is the date 23.11 goes EOL (30 Jun 2024, about 5 months away… yet another frustration with NixOS).

What about you?

Does NixOS work for you? If you were to calculate the time spent per week learning how to use Nix/NixOS, fixing trivial issues that one takes for granted as fixed on most other distros, trying not to burn yourself out in nixpkgs, and asking for help but not getting any assistance… was it worth it?

Figure 1: Jamian · CC BY 3.0 Deed (link)

A frequent quip of the unix-beard is that shebangs cannot contain multiple command-line arguments. Let’s break it down and see where this assumption no longer holds true.

What is a Shebang?

The shebang is the line at the beginning of scripts, such as Python and shell scripts, that instructs the OS how to execute the script. It looks something like #!/bin/sh or #!/usr/bin/env python. The program specified on that line is known as the interpreter. Without the line, the OS wouldn’t know what to do with the file. In practice, a shebang-less script usually ends up being interpreted by the program that launched it. One might experience this after saving a Perl script without a shebang and then executing it: your shell will likely try to run it as a shell script.

This is why a shebang is essential to most scripts. Without one, there is no guarantee the script will run with the correct interpreter, short of having the parent process invoke a specific interpreter explicitly.

Wikipedia has an excellent page on the shebang. The execve(2) manpage (try running man 2 execve) explains the shebang in further detail.

An example… in Perl?

Consider this short Perl script. It prints the first line and last line, or just the first line, or no lines. This script uses perl -n to wrap the body of the program in while(<>) { ... } loop. In short, perl -n reads each line, then executes the body. Since I need the -n argument on the command line, I tried this first:

#!/usr/bin/env perl -n
# Why won't this work :-o
print if ++$lines == 1;
print if eof() && $lines > 1;

If I save it and run it, I get the following error output:

$ /tmp/a.pl  < /etc/passwd
/usr/bin/env: ‘perl -n’: No such file or directory
/usr/bin/env: use -[v]S to pass options in shebang lines

GNU coreutils env documentation offers insight:

The -S/--split-string option enables use of multiple arguments on the first line of scripts (the shebang line, ‘#!’). … Most operating systems (e.g. GNU/Linux, BSDs) treat all text after the first space as a single argument. When using env in a script it is thus not possible to specify multiple arguments.

Alright, ok. TIL about this -S flag, neat! If I change the shebang line to #!/usr/bin/env -Sperl -n, it works:

$ /tmp/a.pl  < /etc/passwd
root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash
nobody:x:65534:65534:Unprivileged account (don't use!):/var/empty:/run/current-system/sw/bin/nologin

Voilà! Passing arguments in the shebang without any funny hacks!

If I add the suggested -v flag (#!/usr/bin/env -vSperl -n), env prints helpful debug information before executing the script:

split -S:  ‘perl -n’
 into:    ‘perl’
     &    ‘-n’
executing: perl
   arg[0]= ‘perl’
   arg[1]= ‘-n’
   arg[2]= ‘/tmp/a.pl’

It’s all documented in the coreutils documentation available online or via info coreutils env in your terminal.
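The splitting can also be tried straight from a terminal, without editing a script. This is a small sketch assuming an env that supports -S; the function name split_demo is made up for illustration. It hands env a single argument, much like the kernel does for a shebang line.

```shell
# split_demo 'COMMAND AND ARGS'
# Passes one string to env, first with -S (split into words) and then
# without it (treated as a single program name, which normally fails).
split_demo() {
  echo "with -S:"
  env "-S$1"
  echo "without -S:"
  env "$1" 2>/dev/null || echo "env could not find a program named '$1'"
}

# Example:
#   split_demo 'echo hello world'
```

The second call fails for the same reason the original shebang did: env looks for one program whose name is the entire string.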

Where can I use this feature?

Any modern OS that ships env from a recent coreutils (-S arrived in version 8.30), which includes the majority of popular desktop/server Linux distros. If you’re a Mac user you’re also in luck: macOS’s env derives from FreeBSD’s implementation, which supports -S too.

POSIX env does not offer this -S feature. OpenBSD, NetBSD, Busybox, and Toybox are also without -S. Busybox ships on some Linux distros such as Alpine. My first step on these hosts is to install software that reduces friction, including coreutils. Coreutils is my portability strategy for env -S.
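If a script needs to know up front whether the host’s env understands -S, a quick probe might look like the following. It is a sketch: the function name is hypothetical, and it relies only on env rejecting an option it doesn’t recognize.

```shell
# env_supports_split
# Returns 0 when the env found in PATH accepts -S, non-zero otherwise.
env_supports_split() {
  env -S 'true' >/dev/null 2>&1
}

if env_supports_split; then
  echo "env -S is available"
else
  echo "env -S is missing; install coreutils or use a wrapper script"
fi
```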

Can’t use env -S or need POSIX compliance?

If one cannot use env -S, I believe the time-tested approach is to write a wrapper program for each tool that needs specified arguments on the shebang line. With the above example, I could put this script in my $PATH and name it perlloop:

#!/bin/sh
exec perl -n "$@"

Next, I can update the shebang from the above example to #!/usr/bin/env perlloop. Then it’ll work the same.

Why not just use #!/usr/bin/perlloop?

Works too if that’s where perlloop lives. You probably don’t care where it lives though. Just want it to find the matching program in your PATH. That is why I prefer using #!/usr/bin/env .... See the coreutils env documentation for additional discussion on the tradeoffs.

Emacs and Shebangs

My Emacs setup checks for a shebang, and ensures the file is executable on save. Check it out here. I suggest adding a hook to your text editor to ensure your scripts are executable on save. It saves a lot of headache.

I also wrote an experimental feature that automatically changes the syntax highlighting and major mode when the shebang changes in the Emacs buffer. The result: no fiddling about with distracting stuff like re-opening the file or using M-: (auto-mode) RET.

Shebangs are pretty cool.