What’s worse than a fire on a boat? A fire aboard an air balloon. RIP my Fly.io app.

Affected Apps:

  • sillypaste-db

A server hosting some of your apps has suffered irreparable hardware damage. Please migrate your Fly Machines to other hosts and restore volumes from any backups.

All good things come to an end, including this pastebin project. If I find myself needing it again, I may spin up a fresh database.

I wouldn’t recommend Fly.io at this time. I have been erroneously billed (then quickly refunded) a couple of times, fought with their V1 stack and data corruption issues, and the list goes on. At one point the PostgreSQL database container kept crashing with Out of Memory errors despite my issuing countless scale commands. Eventually I paid for the $20/mo email support, only for the support staff to tell me flatly: you must migrate, and yes, V1 is still considered robust for some workloads. I’m pretty sure the reliability claim is a white lie, but indeed, I had to upgrade. That turned out to be tricky due to inconsistent documentation on V1 vs V2 features, so I had to ask another question.

It should be noted that support was quick to reply to my email because I paid for the $20 email support plan; support on the community forums is best effort and will not lead to a fruitful support conversation.

What went well with Fly.io?

Now that the critical review of Fly.io has been written, let’s go over some of Fly.io’s wins.

  • The flyctl command line tool compares favorably against Heroku’s. Simpler operation, fewer features to get confused by.
  • Its web design is modern and nice… but like many UIs designed for touch input, it suffers from low density… which pushed me to use the CLI wherever possible.
  • Support forum uses Discourse, a wonderful no-nonsense support forum experience.
  • When it works, it works really well.
  • It was a cheap experiment. You do get what you pay for (free hosting = it can burn down at any time and that’s ok).

Alternatives for hosting Sillypaste

I have no first-hand endorsements, but next I’d assess AWS Lambda, DigitalOcean Apps, or a simple VPS for hosting this modest Django webapp. A service that gets billing right, maintains consistent documentation, and has a working, reliable PaaS offering is ideal.

For now, Sillypaste shall remain offline, but the code is always available on GitHub. All I need to do to remove the paste.winny.tech DNS entry is delete it from my Terraform project, then run terraform apply.

Figure 1: John from USA - CC-BY-2.0

Watch out, things break, stuff catches fire. Let’s talk about backups.

Last post, I stated that I’m going to switch focus away from NixOS commentary. This is still the plan. For now, I am still committed to NixOS thanks to the technical debt I created; migrations aren’t free. Until then, enjoy my NixOS posting :).

Last fall, I wanted to reformat my laptop’s NixOS deployment from BTRFS (encased within LVM2, itself encased in LUKS) to a ZFS partition plus a separate swap partition. My Nix install consists of a few artifacts:

  1. My git repository with the flake.nix and flake.lock files
  2. The workstation’s /secrets folder, sensitive data for service accounts.
  3. The workstation’s /home folder

Both /secrets and /home are backed up nightly via borgmatic (using borg) through a crufty old NixOS module that I wrote (example of usage). Both folders were also snapshotted by btrbk every 15 minutes (via this NixOS configuration). This frequent snapshotting policy will continue on the ZFS reinstall, powered by zfs-auto-snapshot.

The first test was to verify the integrity of the backed up artifacts. I was able to execute a full restore from backup from within a virtual machine. This included adapting my laptop’s flake configuration to the VM, rebuilding, then executing the borg extract commands.
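
A minimal sketch of how a test restore like this could be checked, assuming the original tree (or a snapshot of it) is still available next to the restored copy. The helper name verify_restore and the example paths are hypothetical, and this is not necessarily how the restore above was verified. It only compares file contents and names; permissions and ownership are not checked, and files modified after the backup ran will be reported as differences.

```shell
#!/bin/sh
# Compare a restored directory tree against the tree it was backed up from.
# verify_restore is a hypothetical helper name, not part of borg or borgmatic.
verify_restore() {
    original="$1"   # e.g. a read-only snapshot of /home
    restored="$2"   # e.g. the directory the restore was extracted into

    # diff -r walks both trees; -q prints only the names of files that
    # differ or exist on one side only. Exit status is 0 when they match.
    if diff -rq "$original" "$restored"; then
        echo "restore matches original"
    else
        echo "restore differs from original" >&2
        return 1
    fi
}
```

Usage would look like verify_restore /path/to/snapshot/home /path/to/restore/home, with both paths replaced by real locations.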

Fun fact: combining borg mount with rsync is several times slower than running borg extract (at least when using BorgBase). Keep that in mind when executing restores: if you need a full restore or a restore of a subdirectory, consider borg extract. If you need to pick and choose many files, consider borg mount.
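
The two restore styles can be sketched as follows. The repository URL, the archive name, and both function names are made-up placeholders rather than values from my setup; the borg subcommands themselves (extract, mount, umount) are standard. Note that borg extract writes into the current working directory, which is why the sketch changes directory first.

```shell
#!/bin/sh
# Hypothetical repository URL and archive name; substitute your own.
REPO="ssh://user@backup.example.com/./repo"
ARCHIVE="laptop-latest"

# Full restore or subdirectory restore: borg extract writes the requested
# paths into the current working directory.
restore_subtree() {
    target_dir="$1"    # where the files should land
    archive_path="$2"  # path inside the archive, no leading slash
    ( cd "$target_dir" && borg extract --list "${REPO}::${ARCHIVE}" "$archive_path" )
}

# Picking individual files: expose the archive as a FUSE filesystem,
# copy what is needed, then unmount with: borg umount "$mount_point"
browse_archive() {
    mount_point="$1"   # an existing empty directory
    borg mount "${REPO}::${ARCHIVE}" "$mount_point"
}
```

For example, restore_subtree /mnt/restore home would extract everything under home/ from the archive into /mnt/restore.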

After the successful test restore, it was time to execute a final backup. On my setup that’s as simple as systemctl start backup. Then I booted a NixOS installer. Invoking parted /dev/nvme0n1, I came up with the following partition layout:

Model: INTEL SSDPEKNU010TZ (nvme)
Disk /dev/nvme0n1: 1024GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  1000MB  999MB   fat32        boot  boot, esp
 2      1000MB  1007GB  1006GB               zfs
 3      1007GB  1023GB  16.2GB               swap

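The swap partition is used in conjunction with NixOS’s swapDevices.*.randomEncryption.enable setting. With it, the partition is encrypted at every boot with a freshly generated random key (as far as I can tell this is plain dm-crypt rather than LUKS), and the resulting device mapper device is used for swap.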

Then I followed my standard install instructions here on this blog.

Recovery strategy

As part of this reinstall, I tested my recovery strategy. It turned out my recovery material, a thumb drive plus a QR code holding the thumb drive’s full disk encryption (FDE) key, was unusable. It simply didn’t work: the QR code was of a different key.

Had I needed to recover my setup in a data loss scenario, it is likely I would have lost data due to not having access to working recovery material. Ooop!

Figure 2: Alex E. Proimos - CC-BY-2.0

Next I created new recovery material. It consists of two components: a passphrase and an encrypted thumb drive. They live together; the passphrase is more of an “are you sure you want to open this?” than a security measure. The encrypted thumb drive contains my PGP private keys (each encrypted with its own passphrase) and my password database, encrypted against my “private use only” key.

To restore from this media, I first open the LUKS container via cryptsetup luksOpen /dev/disk/by-id/usb-...-part0 usbCrypt, then mount it via mount /dev/mapper/usbCrypt /mnt/usb. I can then import the GPG keys into my gpg keyring and run gpg --decrypt --output - /mnt/usb/password-store/backups/stargate/passphrase to get the borg backup storage passphrase. With that, I can set up borg to access my backup using the details from my backup service’s dashboard.
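
The same steps, sketched as small shell functions so they can be rehearsed before they are needed. The function names are hypothetical, the device path is passed in as an argument rather than spelled out, and the location of the exported keys on the thumb drive is an assumption. The close step is an addition for tidiness and is not part of the procedure described above.

```shell
#!/bin/sh
# Sketch of the recovery steps as functions. Function names and arguments
# are hypothetical; run as root on the machine being restored.

open_recovery_media() {
    luks_partition="$1"   # the thumb drive's LUKS partition (a /dev/disk/by-id/ path)
    cryptsetup luksOpen "$luks_partition" usbCrypt &&
        mkdir -p /mnt/usb &&
        mount /dev/mapper/usbCrypt /mnt/usb
}

import_gpg_keys() {
    key_file="$1"         # exported private keys on the thumb drive (location is an assumption)
    gpg --import "$key_file"
}

show_borg_passphrase() {
    # Decrypts to stdout; each key is still protected by its own passphrase.
    gpg --decrypt --output - /mnt/usb/password-store/backups/stargate/passphrase
}

close_recovery_media() {
    umount /mnt/usb && cryptsetup close usbCrypt
}
```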

Finally I was able to run borg extract ... to kick off the restore on the laptop.

From start to finish, the redeploy and restore took about four hours: 3 hours for the data restore and another hour for the various manual steps this procedure involves. It’s not super automatic, but hey, it’s tested and it works!

Well I guess the restore worked!

Figure 3: Data loss is possible with a sledge hammer and no backups

Test your backups. Until you do, they are but a speculative investment; you’re not sure if they work. In theory they should, but who really knows? Nobody does. Go test your backups. Haven’t done it yet? Well then, buy this sledge hammer and apply it to your storage devices, because your data is worthless without tested backups: it could disappear at any time. Theft, fire, PEBKAC, a software bug (like Steam’s infamous rm -rf script bug)… anything is possible.

Want a T-shirt?

I’m selling T-shirts with that sledge hammer fellow on the back and “COMPUTERS WERE A MISTAKE” on the front. Of course it’s supposed to be cheeky and not too serious - we must laugh at technology before it destroys our human identity. And embrace the good parts. Computers are fun, if you let ’em. Have fun with ’em, but don’t let ’em control every aspect of your being.

Figure 1: The laptop that was having a bad day with NixOS 23.11

More upgrade gotchas. Shucks. If everything goes well, this will be my last NixOS post. Read on to understand my frustration just a little bit more.

My main laptop is a Lenovo Ideapad Flex 5, a simple and cheap device. The keyboard stopped working in early boot after upgrading to 23.11. The impact: I need to keep a USB keyboard around to unlock the device from a cold boot. Unfortunately for me, I forgot to bring one when I took the train to Chicago the other day. Instead, I had to spend a bunch of time troubleshooting.

NixOS not discovering keyboard driver?

Figure 2: User typing on keyboard to unlock disk, but no effect

Whenever I was typing at the console to unlock the storage’s full disk encryption, the cursor didn’t seem to blink. This suggested perhaps the keyboard wasn’t being recognized.

I discovered the cause after booting a GRML live USB that was in my bag for a different project. I spent time looking at its lsmod output and checking various details under /sys/class/input/input*/device/, including the description file. One description file contained i8042 KBD port, which seemingly correlated to an i8042 module loaded in the kernel.

Then I looked at the lsmod output again, grepping for lines containing i8042. I noted there is an ideapad_laptop module which depends on i8042. Bingo, this seemed like the module to load at boot. It might be fine to load i8042 directly, but ideapad_laptop seems like the driver for this laptop’s specific hardware, so I figured why not load it all.
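
The grep above can be tightened into a small filter that reads the "Used by" column directly. This is a sketch, assuming the usual lsmod layout of module name, size, use count, then a comma-separated list of dependents; dependents_of is a hypothetical helper name.

```shell
#!/bin/sh
# Print the modules listed in the "Used by" column for a given module.
# Reads lsmod-formatted text on stdin; dependents_of is a hypothetical helper.
dependents_of() {
    awk -v mod="$1" '
        # Columns: Module, Size, use count, comma-separated "Used by" list.
        $1 == mod && NF >= 4 {
            n = split($4, users, ",")
            for (i = 1; i <= n; i++) if (users[i] != "") print users[i]
        }'
}
```

On the live system this would be run as lsmod | dependents_of i8042, printing one dependent module per line.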

I got lucky: it turned out an older NixOS generation still worked, so I didn’t have to figure out GRML networking, somehow get ZFS set up on it, and unlock the disks in GRML. Instead I just booted the old NixOS generation, fired up Emacs, and ignored all the errors caused by Emacs trying to load elisp objects compiled for a different major release, or something of that nature. nix-mode didn’t work either, but that’s fine; fundamental-mode works too, I guess. I added "ideapad_laptop" to boot.kernelModules in my Nix flake (commit), then ran nixos-rebuild boot --flake .#.

Rebooted and voilà! I could boot into 23.11 and use my keyboard!

Lessons learned

To get into a broken NixOS install faster, I could carry a live USB on my person. This way I can access my NixOS installation without further shenanigans, and avoid the friction of booting an older generation that trips over user data (such as Emacs failing to load nix-mode in the older generation). GRML doesn’t ship with ZFS, so it doesn’t quite meet this need.

nixos-generate-config is not infallible. It doesn’t include this i8042 driver for my laptop’s configuration and the initramfs doesn’t seem to load the driver without the above fix.

It appears I need to run NixOS unstable and test it on my machines in order to validate my installs prior to release. This is a lot of work and I’m not up for it. It doesn’t bring me value or joy. As of Jan 15, 2024, there are over 7k open issues and 5k open pull requests. That’s a lot of work blocked or waiting to be done. How do I know that my particular breakage will be addressed in any reasonable timeline? How do I know that my pull requests will be commented on in a timely manner? There are no promises, and therefore I am too demoralized to implement further release testing on my end anyway.

Next steps for my computing needs

Figure 3: My unpaid part-time job trying to write blog posts

It’s becoming more and more apparent I don’t want to play computer janitor every time I go to write an email or load a webpage. To this end I don’t think I’ll be using NixOS in the future. I need something with a committed testing culture and mechanisms in place to hear community feedback and take appropriate action to correct direction and maintain focus on a singular vision. I could write a book about all my grievances… but it’s beside the point… it’s bad energy, and it’s hubris to think I could even change anything about the community. Time to move on.

Debian Testing combined with Ansible and Nix might be sufficient. Or Gentoo combined with Ansible and Nix could also work. Gentoo now offers pre-built binpkgs, so the installation experience might be about as fast as deploying a NixOS host from scratch. As a bonus, while Nix packages offer some build-input parameters, there is no standardization on what those are, unlike Gentoo’s USE flags, so Gentoo wins at customizing packages in a consistent way.

Upgrading is a time commitment, and NixOS can’t fix this. I dread the thought of upgrading to the next release, so my migration deadline is the day 23.11 goes EOL (in about 5 months, on 30 Jun 2024… yet another frustration with NixOS).

What about you?

Does NixOS work for you? If you were to calculate the time spent per week learning how to use Nix/NixOS, fixing trivial issues that one takes for granted as fixed on most other distros, trying not to burn yourself out in nixpkgs, and asking for help but not getting any assistance… was it worth it?