Update the NAS to 24.05

Updated Wednesday, Jul 24, 2024

It turns out my NAS is vulnerable to regreSSHion (CVE-2024-6387), a race condition in OpenSSH’s sshd that, given enough time and connection attempts, can let an unauthenticated remote attacker run code as root on glibc-based Linux hosts. It affects OpenSSH releases older than 4.4p1, and 8.5p1 up to (but not including) 9.8p1. If one has OpenSSH 9.8p1 or later, one is totally fine.

Unfortunately, the NAS is still on NixOS 23.11. The NAS remains on NixOS, but all my other devices have been migrated to Debian Testing. In this brief post, I want to describe the toil involved in upgrading this NixOS-powered NAS. Some of it is my own PEBKAC fault. Some of it stems from NixOS’s well-meaning defaults. Some of it is just not enough attention to detail. Or, as I’ve recently come to discern, NixOS is too complicated or temperamental for my liking. Maybe this will be entertaining or maybe it will be cringe. My hope is that this post captures the process of upgrading a NixOS host, warts and all. Sorry for my snide remarks; I’m over the promises of unfettered complexity.

§Step 1: nix flake update

I haven’t updated the flake inputs pointing at stable and unstable nixpkgs in some time, so that’s my first step before considering the upgrade to 24.05 in earnest. To start, I need to git pull to ensure my repository is up to date. Whoops, a bit of PEBKAC broke git pull on my setup.

§ssh agent mess

I’ve been using keychain for some time. Here’s the relevant code from my dotfiles:

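# Load agent variables saved by a previous keychain run, if present.
# shellcheck disable=SC1090
[[ -r ${HOME}/.keychain/${HOSTNAME}-sh ]] && . "${HOME}/.keychain/${HOSTNAME}-sh"
# shellcheck disable=SC1090
[[ -r ${HOME}/.keychain/${HOSTNAME}-sh-gpg ]] && . "${HOME}/.keychain/${HOSTNAME}-sh-gpg"

# Start or reuse ssh-agent/gpg-agent and load the keys listed in $keys
# (defined elsewhere in the dotfiles), expanding every ~ to $HOME.
# shellcheck disable=SC2086
eval "$(keychain --eval -q --inherit any --agents ssh,gpg ${keys//\~/$HOME})"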

Unfortunately, it seems to break ssh agent forwarding, so I can’t git pull on my NAS without disabling this module. That’s the first friction of the day: getting the dang ole’ thing to git pull correctly. After removing this file from my .bashrc.d/, ssh agent forwarding worked again. That’s a problem for another day.

Okay! Next contention.

§nixos-rebuild boot fails with no space left on device

I think this is a super frustrating issue, and I’ve run into it on most hosts where I’ve deployed NixOS. Because nix-build’s error reporting won’t tell you which directory it’s building in, it’s hard to pin down the root cause from either the nix log ... incantation or the terminal output. It comes down to the particular invocations in the derivation’s (package’s) build phase: some print an absolute path, some don’t. This loosens the feedback loop… Now I have to monitor indirect metrics, such as disk usage, or even strace something to see where it’s writing to.

Today, this cost me an hour of iterating on nixos-rebuild incantations to coax Grafana to compile. For some reason, I assumed my ZFS dataset named rpool (which contains /root, /home, and /var) was the culprit, so I invested an exorbitant amount of time reconfiguring Loki to clean up old log data and moving a handful of Syncthing folders into another ZFS dataset. As it turned out, that wasn’t the problem at all. A tighter feedback loop would have saved me that time.

The fix is to override the TMPDIR environment variable for the failing build, or to set it in your environment.variables attribute set to make it permanent. If you choose TMPDIR=/tmp, you might also need to resize /tmp: when it’s a tmpfs (as on this host), it resides in memory (and swap), and its maximum size is configured as a percentage of physical RAM. One can increase this size or move /tmp to a filesystem partition on disk. On low-memory hosts, it’s best to eschew tmpfs for /tmp and use an on-disk filesystem instead. Filesystem caching is pretty good nowadays, so it’s not as serious a slowdown as one might be tricked into believing.
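On NixOS, both knobs can live in configuration rather than on the command line. Here’s a minimal sketch with illustrative values (not copied from this host’s config). The `boot.tmp.*` options decide whether /tmp is a tmpfs and how large it may grow; pick one of the two strategies. The last line reflects the fact that builds performed by nix-daemon run in the daemon’s environment, not with the TMPDIR of the shell that requested them.

```nix
# Sketch only: illustrative values, not this host's actual configuration.
{ ... }:
{
  # Option A: keep /tmp on tmpfs, but let it grow larger
  # (the NixOS default for tmpfsSize is "50%" of RAM).
  boot.tmp.useTmpfs = true;
  boot.tmp.tmpfsSize = "75%";

  # Option B: keep /tmp on disk instead, and wipe it at boot.
  # boot.tmp.useTmpfs = false;
  # boot.tmp.cleanOnBoot = true;

  # Builds run by nix-daemon use the daemon's environment rather than
  # the caller's, so point the daemon's TMPDIR at the roomier /var/tmp.
  systemd.services.nix-daemon.environment.TMPDIR = "/var/tmp";
}
```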

Back to it: I tried sudo env TMPDIR=/tmp nixos-rebuild boot and, lo and behold, the 4GiB tmpfs was insufficient to build Grafana. Shoot. I retried for maybe the fifth time with sudo env TMPDIR=/var/tmp nixos-rebuild boot. Thirty minutes later, the build finished successfully. /var/tmp typically lives on your /var or / filesystem, which tends to have plenty of space.

Now to reboot and ssh in.

§ssh then zfs load-key funkiness

I keep notes on how to boot my NixOS-powered NAS, because the process is neither natural nor intuitive. First I reboot, then wait until I can ssh into the initrd on port 2222, then issue a zfs load-key command to unlock rpool, which contains NixOS. Finally, I kill the other zfs load-key command kicked off by the boot sequence, which resumes booting.

winston@silo ~ $ sudo reboot
[sudo] password for winston:
(7s) winston@silo ~ $ # oh yeah, reboot doesn't work :(
winston@silo ~ $ sudo systemctl reboot

Broadcast message from root@silo on pts/8 (Mon 2024-07-08 17:29:16 CDT):

The system will reboot now!

winston@silo ~ $ Connection to silo closed by remote host.
Connection to silo closed.
(5m50s) 255 winston@quasit ~ $ sleep 120;ssh -p2222 root@silo
~ # zfs load-key rpool
Enter passphrase for 'rpool':
~ # pkill zfs  # Next type Return followed by ~. to kill the ssh session!
~ # Connection to silo closed.
(56s) 255 winston@quasit ~ $ sleep 60; ssh -A silo

                           +&-
     Welcome to           _.-^-._    .--.
        silo.          .-'   _   '-. |__|
                      /     |_|     \|  |
                     /               \  |
                    /|     _____     |\ |
                     |    |==|==|    |  |
 |---|---|---|---|---|    |--|--|    |  |
 |---|---|---|---|---|    |==|==|    |  |
^jgs^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Last login: Mon Jul  8 17:23:33 2024 from 10.20.1.44
winston@silo ~ $ systemctl status

After pkill zfs runs, the ssh connection becomes unresponsive. Type Return followed by ~. to tell OpenSSH to disconnect.

systemd’s status is “still starting”, so I’ll come back in a minute or two and see if it’s happy.

Okay, now it says systemd state is “running” and not “degraded”. We’re in business!
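Part of that unlock dance can be scripted. A commonly shared NixOS pattern appends the unlock commands to root’s login profile inside the initrd, so logging in prompts for the passphrase and then kills the waiting zfs process. The sketch below mirrors the manual steps above; it assumes the default scripted (non-systemd) initrd, and the NIC driver, host key path, and public key are placeholders. The session still goes unresponsive after pkill, so the ~. escape is still needed.

```nix
# Sketch: NIC module, host key path, and public key are placeholders;
# assumes the default scripted (non-systemd) initrd.
{ ... }:
{
  # Hypothetical NIC driver; the initrd needs it to bring up networking.
  boot.initrd.availableKernelModules = [ "e1000e" ];

  boot.initrd.network = {
    enable = true;
    ssh = {
      enable = true;
      port = 2222; # the port used by `ssh -p2222 root@silo`
      hostKeys = [ "/etc/secrets/initrd/ssh_host_ed25519_key" ]; # placeholder
      authorizedKeys = [ "ssh-ed25519 AAAA...placeholder" ];
    };
    # On login: prompt for the rpool passphrase, then kill the zfs
    # process the boot sequence is blocked on so booting resumes.
    postCommands = ''
      echo 'zfs load-key rpool && pkill zfs' >> /root/.profile
    '';
  };
}
```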

§Commit the changes

Now to commit the flake changes:

winston@silo ~/p/nixos-configs $ git commit -m 'flake: refresh'
.git/hooks/pre-commit: line 13: /nix/store/4zqn5lajki1z3a2avia658l1wacpi8v0-pre-commit-3.3.3/bin/pre-commit: No such file or directory
1 winston@silo ~/p/nixos-configs $ pre
precat      preunzip    prezip      prezip-bin
1 winston@silo ~/p/nixos-configs $ nix run nixpkgs#pre-commit install
pre-commit installed at .git/hooks/pre-commit
winston@silo ~/p/nixos-configs $ git commit -m 'flake: refresh'
Trim Trailing Whitespace.................................................Passed
Fix End of Files.........................................................Passed
Check Yaml...........................................(no files to check)Skipped
Check for added large files..............................................Passed
[master f0be2ea] flake: refresh
 2 files changed, 9 insertions(+), 8 deletions(-)

Yes, I know pre-commit should exist in my environment.systemPackages. I forgot why I didn’t add it. Maybe pre-commit is supposed to be in this folder’s shell.nix? Complicated. You can see why I’m shrinking away from nix-powered workflows! I just want the thing to do the thing.
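For what it’s worth, the hook broke because `pre-commit install` writes an absolute /nix/store path into .git/hooks/pre-commit, and that store path no longer existed (presumably garbage-collected). One way to keep the tool tied to the repository is a flake devShell. A stripped-down flake.nix sketch, assuming an x86_64-linux host, with the existing host definitions elided:

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/release-24.05";

  outputs = { self, nixpkgs, ... }:
    let
      # Assumption: the NAS is x86_64-linux.
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in
    {
      # `nix develop` in the repo drops into a shell with pre-commit on PATH.
      devShells.x86_64-linux.default = pkgs.mkShell {
        packages = [ pkgs.pre-commit ];
      };

      # nixosConfigurations = { ... };  # existing host definitions elided
    };
}
```

Running `pre-commit install` from inside that shell would still bake in a store path, so the hook can break again after a garbage collection; putting pre-commit in environment.systemPackages is the sturdier option.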

Now, on to the upgrade to fix the OpenSSH vulnerability. A couple hours later…

§Re-read the release notes

NixOS has a stance on upgrades not dissimilar to that of classical stable-release distros. Things will break. Things will break in fantastic ways. Just don’t upgrade, and things won’t break. It’s a white lie of NixOS: that somehow you enjoy stability between upgrades. You aren’t guaranteed stability between upgrades. You must exercise the same due diligence and best judgment that is expected on any other distro. Be sure to read the release notes; there is no time-efficient alternative. A poorly aged adage of NixOS is “roll back if an upgrade breaks stuff”. No need to ensure most upgrades go well on the first try if that’s the community spirit of releasing upgrades! And don’t get me started on what happens if you need security updates. This blog post happens: you must upgrade everything or suture nixpkgs in order to recoup security updates.

Back to it. As promised, I read the release notes, and my biggest takeaway is that Loki was upgraded to 3.0.0. I’m okay with Loki breaking, so I won’t bother with it until I experience breakage. It appears that my Nextcloud version is still supported, so I won’t struggle through a Nextcloud upgrade (yet).

§Edit the flake inputs

All I did was modify inputs.nixpkgs.url to point to github:NixOS/nixpkgs/release-24.05. Then I ran nix flake update.
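For reference, the relevant part of flake.nix might look like this. Only the nixpkgs URL change comes from this post; the unstable URL is a guess, since it isn’t shown. The commented nixos-24.05 alternative is a suggestion rather than what I ran: that branch only advances after Hydra has built and tested the release, so more packages should come prebuilt from the binary cache. I haven’t checked whether that would have spared me the local Grafana build.

```nix
{
  inputs = {
    # The change described above: track the 24.05 release branch.
    nixpkgs.url = "github:NixOS/nixpkgs/release-24.05";
    # Alternative: only moves once Hydra has built and tested it.
    # nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

    # Hypothetical: the real URL of this input isn't shown in the post.
    unstable.url = "github:NixOS/nixpkgs/nixos-unstable";
  };

  outputs = { self, nixpkgs, unstable, ... }: {
    # nixosConfigurations elided
  };
}
```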

winston@silo ~/p/nixos-configs $ nix flake update
warning: Git tree '/home/winston/p/nixos-configs' is dirty
warning: updating lock file '/home/winston/p/nixos-configs/flake.lock':
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/7144d6241f02d171d25fba3edeaf15e0f2592105' (2024-07-02)
  → 'github:NixOS/nixpkgs/de429c2a20520e0f81a1fd9d2677686a68cae739' (2024-07-08)
• Updated input 'unstable':
    'github:NixOS/nixpkgs/9f4128e00b0ae8ec65918efeba59db998750ead6' (2024-07-03)
  → 'github:NixOS/nixpkgs/655a58a72a6601292512670343087c2d75d859c1' (2024-07-08)
warning: Git tree '/home/winston/p/nixos-configs' is dirty

Next it’s time to try to build it, remembering to point TMPDIR to the spacious /var/tmp.

§Breaking pinentry change

winston@silo ~/p/nixos-configs $ sudo env TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
[sudo] password for winston:
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error:
       … while calling the 'head' builtin

         at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/attrsets.nix:1575:11:

         1574|         || pred here (elemAt values 1) (head values) then
         1575|           head values
             |           ^
         1576|         else

       … while evaluating the attribute 'value'

         at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/modules.nix:809:9:

          808|     in warnDeprecation opt //
          809|       { value = builtins.addErrorContext "while evaluating the option `${showOption loc}':" value;
             |         ^
          810|         inherit (res.defsFinal') highestPrio;

       (stack trace truncated; use '--show-trace' to show the full trace)

       error:
       Failed assertions:
       - The option definition `programs.gnupg.agent.pinentryFlavor' in `/nix/store/g4py9g5mqznfzgn7nrz1glg811k9xpll-source/common/base.nix' no longer has any effect; please remove it.
       Use programs.gnupg.agent.pinentryPackage instead
(28s) 1 winston@silo ~/p/nixos-configs $

Balls. Legacy code shims be damned, this is NixOS! I need to edit some configuration, because the powers that be decided it must be done. But where? Cool errors, bro. No problem: I can search my source tree and, with any luck, it’s within my own Nix configuration and not in another input (e.g. the nixpkgs flake):

winston@silo ~/p/nixos-configs $ rg pinentryFlavor
common/base.nix
54:    pinentryFlavor = "gtk2";

Oh, maybe I could have taken the common/base.nix suffix of the /nix/store/not-the-path-that-im-working-out-of-prefix-source/common/base.nix path and opened that relative path in my checkout. It’s icky to manually fix paths in error messages, but that should suffice next time!

Okay, I deleted the offending line. I don’t know why I’m still using the gtk2 flavor, so maybe it’s not important… let’s pray that it wasn’t important!
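If the GTK2 pinentry turns out to matter after all, the 24.05 replacement option takes a package instead of a flavor string. A minimal sketch, assuming the agent is enabled in common/base.nix the same way it was before:

```nix
{ pkgs, ... }:
{
  programs.gnupg.agent = {
    enable = true;
    # 23.11 style, now rejected: pinentryFlavor = "gtk2";
    # 24.05 style: pass the pinentry package itself.
    pinentryPackage = pkgs.pinentry-gtk2;
  };
}
```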

winston@silo ~/p/nixos-configs $ sudo env TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
[sudo] password for winston:
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error:
       … while calling the 'head' builtin

         at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/attrsets.nix:1575:11:

         1574|         || pred here (elemAt values 1) (head values) then
         1575|           head values
             |           ^
         1576|         else

       … while evaluating the attribute 'value'

         at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/modules.nix:809:9:

          808|     in warnDeprecation opt //
          809|       { value = builtins.addErrorContext "while evaluating the option `${showOption loc}':" value;
             |         ^
          810|         inherit (res.defsFinal') highestPrio;

       (stack trace truncated; use '--show-trace' to show the full trace)

       error: Package ‘nextcloud-27.1.11’ in /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/pkgs/servers/nextcloud/default.nix:35 is marked as insecure, refusing to evaluate.


       Known issues:
        - Nextcloud version 27.1.11 is EOL

       You can install it anyway by allowing this package, using the
       following methods:

       a) To temporarily allow all insecure packages, you can use an environment
          variable for a single invocation of the nix tools:

            $ export NIXPKGS_ALLOW_INSECURE=1

          Note: When using `nix shell`, `nix build`, `nix develop`, etc with a flake,
                then pass `--impure` in order to allow use of environment variables.

       b) for `nixos-rebuild` you can add ‘nextcloud-27.1.11’ to
          `nixpkgs.config.permittedInsecurePackages` in the configuration.nix,
          like so:

            {
              nixpkgs.config.permittedInsecurePackages = [
                "nextcloud-27.1.11"
              ];
            }

       c) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
          ‘nextcloud-27.1.11’ to `permittedInsecurePackages` in
          ~/.config/nixpkgs/config.nix, like so:

            {
              permittedInsecurePackages = [
                "nextcloud-27.1.11"
              ];
            }
(28s) 1 winston@silo ~/p/nixos-configs $

Oh crud, apparently I’m running an insecure Nextcloud? Who knew! It looks like one can set an environment variable (NIXPKGS_ALLOW_INSECURE=1) to tell nixos-rebuild to calm down, if only a smidgen. That works here only because I’m already passing --impure, as the error message notes.
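A declarative alternative to the environment variable is option b) from the error message, which allows just this one package, paired with pinning the Nextcloud major version explicitly so that a later nixpkgs bump can’t skip a major. A sketch, assuming the 24.05 package set’s nextcloud27 is the 27.1.11 build named in the error:

```nix
{ pkgs, ... }:
{
  # Allow exactly this EOL build instead of every insecure package.
  # The string must match the error message, so it needs updating if
  # nixpkgs ships a different 27.x point release.
  nixpkgs.config.permittedInsecurePackages = [ "nextcloud-27.1.11" ];

  # Pin the major version; upgrade one major at a time (27 -> 28 -> 29).
  services.nextcloud.package = pkgs.nextcloud27;
}
```

With this in place, NIXPKGS_ALLOW_INSECURE=1 is no longer needed, although --impure may still be required for other reasons.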

§Loki changes

The release notes did mention that Loki’s configuration changed, and advised users to read the upstream Loki release notes too. It looks like I can’t put off the Loki upgrade and must handle it now:

winston@silo ~/p/nixos-configs $ sudo env NIXPKGS_ALLOW_INSECURE=1 TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error: builder for '/nix/store/v6g5b5c383a4h6i8bl210h91cp54qpz6-validate-loki-conf.drv' failed with exit code 1;
       last 5 log lines:
       > failed parsing config: /nix/store/3i7y6y5nwjqz8mhg1kakhmyd1cv7cy3i-loki-config.json: yaml: unmarshal errors:
       >   line 4: field max_look_back_period not found in type config.ChunkStoreConfig
       >   line 13: field shared_store not found in type compactor.Config
       >   line 31: field max_transfer_retries not found in type ingester.Config
       >   line 59: field shared_store not found in type boltdb.IndexCfg. Use `-config.expand-env=true` flag if you want to expand environment variables in your config file
       For full logs, run 'nix log /nix/store/v6g5b5c383a4h6i8bl210h91cp54qpz6-validate-loki-conf.drv'.
error: 1 dependencies of derivation '/nix/store/cxpdszil18wj6hb3az3jwqbyfy43wmj6-unit-loki.service.drv' failed to build
error: 1 dependencies of derivation '/nix/store/gyh8mr8sqk1gk20qldp062zym2mdy06c-system-units.drv' failed to build
error: 1 dependencies of derivation '/nix/store/qr88n0970ji783bkdwhk8ig7wazksa9v-etc.drv' failed to build
error: 1 dependencies of derivation '/nix/store/p36ja5jhi7wr2ayzxgf7ykkxhss599n2-nixos-system-silo-24.05.20240708.de429c2.drv' failed to build
(3m46s) 1 winston@silo ~/p/nixos-configs $

That list of Loki-specific configuration errors is extremely helpful. Well done. The NixOS Loki module validates the configuration at build time (the validate-loki-conf derivation), so nixos-rebuild printed the problematic configuration fields straight to the terminal. After trying to make heads or tails of the upgrade guide, I instead just removed every offending line. Let’s see what works (shrug).

error: builder for '/nix/store/rkmm8wa8vz576bhwpz0wwmv8ck31653j-validate-loki-conf.drv' failed with exit code 1;
       last 1 log lines:
       > level=error ts=2024-07-08T23:14:46.302638345Z caller=main.go:66 msg="validating config" err="MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY\nCONFIG ERROR: invalid compactor config: compactor.delete-request-store should be configured when retention is enabled\nCONFIG ERROR: schema v13 is required to store Structured Metadata and use native OTLP ingestion, your schema version is v11. Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update to schema v13 or newer before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure\nCONFIG ERROR: `tsdb` index type is required to store Structured Metadata and use native OTLP ingestion, your index type is `boltdb-shipper` (defined in the `store` parameter of the schema_config). Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update the schema to use index type `tsdb` before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure"
       For full logs, run 'nix log /nix/store/rkmm8wa8vz576bhwpz0wwmv8ck31653j-validate-loki-conf.drv'.

Here’s the error, reformatted:

MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY
CONFIG ERROR: invalid compactor config: compactor.delete-request-store should
              be configured when retention is enabled
CONFIG ERROR: schema v13 is required to store Structured Metadata and use
              native OTLP ingestion, your schema version is v11. Set
              `allow_structured_metadata: false` in the `limits_config`
              section or set the command line argument
              `-validation.allow-structured-metadata=false` and restart Loki.
              Then proceed to update to schema v13 or newer before re-enabling
              this config, search for 'Storage Schema' in the docs for the
              schema update procedure
CONFIG ERROR: `tsdb` index type is required to store Structured Metadata and
              use native OTLP ingestion, your index type is `boltdb-shipper`
              (defined in the `store` parameter of the schema_config). Set
              `allow_structured_metadata: false` in the `limits_config` section
              or set the command line argument
              `-validation.allow-structured-metadata=false` and restart Loki.
              Then proceed to update the schema to use index type `tsdb`
              before re-enabling this config, search for 'Storage Schema'
              in the docs for the schema update procedure

Okay, I’ve added allow_structured_metadata = false; to limits_config. Now what will break next?

CONFIG ERROR: invalid compactor config: compactor.delete-request-store should
              be configured when retention is enabled

OK, I’ve added delete_request_store = "filesystem"; to the compactor section, as suggested in a GitHub issue.
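Put together, the two additions look roughly like this in the Loki module’s configuration attribute set. This is a sketch: it assumes the rest of `services.loki.configuration` is defined elsewhere, and it sidesteps the schema v13/tsdb migration that the error recommends rather than performing it.

```nix
{ ... }:
{
  services.loki.configuration = {
    # Opt out of structured metadata until the schema is migrated
    # to v13 with the tsdb index, as the error message suggests.
    limits_config.allow_structured_metadata = false;

    compactor = {
      # Retention was already on, which is what triggers the check.
      retention_enabled = true;
      # Loki 3.x requires this whenever retention is enabled.
      delete_request_store = "filesystem";
    };
  };
}
```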

§It built!

Hooray! It built!

winston@silo ~/p/nixos-configs $ sudo env NIXPKGS_ALLOW_INSECURE=1 TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
trace: warning: The option `services.nextcloud.extraOptions' defined in `/nix/store/d99kz3ifvz1hqg8wni0bi2j08n3rdisr-source/hosts/silo' has been renamed to `services.nextcloud.settings'.
trace: warning: The option `services.nextcloud.config.defaultPhoneRegion' defined in `/nix/store/d99kz3ifvz1hqg8wni0bi2j08n3rdisr-source/hosts/silo' has been renamed to `services.nextcloud.settings.default_phone_region'.
trace: warning: A legacy Nextcloud install (from before NixOS 24.05) may be installed.

After nextcloud27 is installed successfully, you can safely upgrade
to 28. The latest version available is Nextcloud29.

Please note that Nextcloud doesn't support upgrades across multiple major versions
(i.e. an upgrade from 16 is possible to 17, but not 16 to 18).

The package can be upgraded by explicitly declaring the service-option
`services.nextcloud.package`.

updating GRUB 2 menu...
installing the GRUB 2 boot loader on /dev/disk/by-id/ata-CT120BX500SSD1_1943E3D1AC4B...
Installing for i386-pc platform.
Installation finished. No error reported.
updating GRUB 2 menu...
installing the GRUB 2 boot loader on /dev/disk/by-id/ata-CT120BX500SSD1_1943E3D1AC45...
Installing for i386-pc platform.
Installation finished. No error reported.
(51s) winston@silo ~/p/nixos-configs $
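The two rename warnings map onto `services.nextcloud.settings`, and the legacy-install note implies stepping through one major version per rebuild. A sketch of what the pending next step might look like; the phone region value is a placeholder, and the contents of my old extraOptions aren’t shown here:

```nix
{ pkgs, ... }:
{
  services.nextcloud = {
    # Only after nextcloud27 is installed and migrated successfully;
    # then nextcloud29 in a later rebuild.
    package = pkgs.nextcloud28;

    # Formerly services.nextcloud.extraOptions.
    settings = {
      # Formerly services.nextcloud.config.defaultPhoneRegion (placeholder value).
      default_phone_region = "US";
    };
  };
}
```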

Now to snapshot everything real quick…

winston@silo ~/p/nixos-configs $ sudo zfs snapshot -r naspool@2024-08-07_pre-reboot
winston@silo ~/p/nixos-configs $ sudo zfs snapshot -r rpool@2024-08-07_pre-reboot

Now let’s reboot and hope for the best.

§degraded - libvirt-guests.service failure

systemctl status says the state is “degraded”. Let’s see what broke:

winston@silo ~ $ systemctl list-units --failed
  UNIT                   LOAD   ACTIVE SUB    DESCRIPTION
● libvirt-guests.service loaded failed failed libvirt guests suspend/resume service

Legend: LOAD   → Reflects whether the unit definition was properly loaded.
        ACTIVE → The high-level unit activation state, i.e. generalization of SUB.
        SUB    → The low-level unit activation state, values depend on unit type.

1 loaded units listed.
(34s) 3 winston@silo ~ $

OK let’s check the specific service:

winston@silo ~ $ systemctl status libvirt-guests.service
× libvirt-guests.service - libvirt guests suspend/resume service
     Loaded: loaded (/etc/systemd/system/libvirt-guests.service; enabled; preset: enabled)
    Drop-In: /nix/store/70x3p9hhrm202n3lfl1p79bv0h2c59zi-system-units/libvirt-guests.service.d
             └─overrides.conf
     Active: failed (Result: exit-code) since Mon 2024-07-08 18:38:25 CDT; 3min 11s ago
       Docs: man:libvirt-guests(8)
             https://libvirt.org/
    Process: 2340 ExecStart=/nix/store/6a5lmp5p08n9qsfd0l9aqc7jhigm82j9-libvirt-10.0.0/libexec/libvirt-guests.sh start (code=exited, status=1/FAILURE)
   Main PID: 2340 (code=exited, status=1/FAILURE)
         IP: 0B in, 0B out
        CPU: 184ms

Jul 08 18:38:17 silo systemd[1]: Starting libvirt guests suspend/resume service...
Jul 08 18:38:24 silo libvirt-guests.sh[3307]: Resuming guests on default URI...
Jul 08 18:38:25 silo libvirt-guests.sh[3313]: Resuming guest seedbox:
Jul 08 18:38:25 silo libvirt-guests.sh[3318]: error: Failed to start domain 'seedbox'
Jul 08 18:38:25 silo libvirt-guests.sh[3318]: error: operation failed: guest CPU doesn't match specification: extra features: vmx-ins-outs,vmx-true-ctls,vmx-store-lma,vmx-activity-hlt,vmx-vmwrite-vmexit-fields,vmx-apicv-xapic,vmx-ept,vmx-desc-exit,vmx-rdtscp-exit,vmx-apicv-x2apic,vmx-vpid,vmx-wbinvd-exit,vmx-unrestricted-guest,vmx-rdrand-exit,vmx-invpcid-exit,vmx-vmfunc,vmx-shadow-vmcs,vmx-invvpid,vmx-invvpid-single-addr,vmx-invvpid-all-context,vmx-ept-execonly,vmx-page-walk-4,vmx-ept-2mb,vmx-ept-1gb,vmx-invept,vmx-eptad,vmx-invept-single-context,vmx-invept-all-context,vmx-intr-exit,vmx-nmi-exit,vmx-vnmi,vmx-preemption-timer,vmx-vintr-pending,vmx-tsc-offset,vmx-hlt-exit,vmx-invlpg-exit,vmx-mwait-exit,vmx-rdpmc-exit,vmx-rdtsc-exit,vmx-cr3-load-noexit,vmx-cr3-store-noexit,vmx-cr8-load-exit,vmx-cr8-store-exit,vmx-flexpriority,vmx-vnmi-pending,vmx-movdr-exit,vmx-io-exit,vmx-io-bitmap,vmx-mtf,vmx-msr-bitmap,vmx-monitor-exit,vmx-pause-exit,vmx-secondary-ctls,vmx-exit-nosave-debugctl,vmx-exit-load-perf-global-ctrl,vmx-exit-ack-intr,vmx-exit-save-pat,vmx-exit-load-pat,vmx-exit-save-efer,vmx-exit-load-efer,vmx-exit-save-preemption-timer,vmx-entry-noload-debugctl,vmx-entry-ia32e-mode,vmx-entry-load-perf-global-ctrl,vmx-entry-load-pat,vmx-entry-load-efer,vmx-eptp-switching, missing features: vmx-apicv-register,vmx-apicv-vid,vmx-posted-intr
Jul 08 18:38:25 silo systemd[1]: libvirt-guests.service: Main process exited, code=exited, status=1/FAILURE
Jul 08 18:38:25 silo systemd[1]: libvirt-guests.service: Failed with result 'exit-code'.
Jul 08 18:38:25 silo systemd[1]: Failed to start libvirt guests suspend/resume service.

Oh snap, it looks like my virtual machine (VM) for BitTorrent is broken. More specifically, libvirtd failed to resume the VM from its saved state (like folding your laptop shut, but for virtual machines). By the way, most public items on archive.org are available via BitTorrent, and most Linux distros provide .torrent files too! Seeding torrents for these projects is one low-effort way to help out the community.

Here’s the error, reformatted:

error: Failed to start domain 'seedbox'
error: operation failed: guest CPU doesn't match specification: extra features:
       vmx-ins-outs,vmx-true-ctls,vmx-store-lma,vmx-activity-hlt,
       vmx-vmwrite-vmexit-fields,vmx-apicv-xapic,vmx-ept,vmx-desc-exit,
       vmx-rdtscp-exit,vmx-apicv-x2apic,vmx-vpid,vmx-wbinvd-exit,
       vmx-unrestricted-guest,vmx-rdrand-exit,vmx-invpcid-exit,
       vmx-vmfunc,vmx-shadow-vmcs,vmx-invvpid,vmx-invvpid-single-addr,
       vmx-invvpid-all-context,vmx-ept-execonly,vmx-page-walk-4,vmx-ept-2mb,
       vmx-ept-1gb,vmx-invept,vmx-eptad,vmx-invept-single-context,
       vmx-invept-all-context,vmx-intr-exit,vmx-nmi-exit,vmx-vnmi,
       vmx-preemption-timer,vmx-vintr-pending,vmx-tsc-offset,vmx-hlt-exit,
       vmx-invlpg-exit,vmx-mwait-exit,vmx-rdpmc-exit,vmx-rdtsc-exit,
       vmx-cr3-load-noexit,vmx-cr3-store-noexit,vmx-cr8-load-exit,
       vmx-cr8-store-exit,vmx-flexpriority,vmx-vnmi-pending,vmx-movdr-exit,
       vmx-io-exit,vmx-io-bitmap,vmx-mtf,vmx-msr-bitmap,vmx-monitor-exit,
       vmx-pause-exit,vmx-secondary-ctls,vmx-exit-nosave-debugctl,
       vmx-exit-load-perf-global-ctrl,vmx-exit-ack-intr,vmx-exit-save-pat,
       vmx-exit-load-pat,vmx-exit-save-efer,vmx-exit-load-efer,
       vmx-exit-save-preemption-timer,vmx-entry-noload-debugctl,
       vmx-entry-ia32e-mode,vmx-entry-load-perf-global-ctrl,
       vmx-entry-load-pat,vmx-entry-load-efer,vmx-eptp-switching,
       missing features: vmx-apicv-register,vmx-apicv-vid,vmx-posted-intr

Whoops: after a quick search, I found this regression. If I run virsh start seedbox, I get a similar error related to vmx. The solution is to either upgrade libvirt to 10.2.0 or disable the vmx feature in the guest. vmx (Intel VT-x) is what enables nested virtualization, which my guest does not use. Side note: vmx support is detectable via CPUID, which was the topic of my last article.

As I don’t use nested virtualization, I opted to disable it. I invoked virsh edit seedbox to spawn a text editor with the libvirtd domain’s (guest’s) XML therein, then edited <feature policy='require' name='vmx'/> to <feature policy='disable' name='vmx'/>. That <feature> element can be found nested within the <cpu> element.
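The other route, upgrading, could in principle be done without leaving 24.05 by taking only libvirt from the unstable input. A hypothetical sketch, not what I did: it assumes flake.nix passes `unstable` to modules via `specialArgs`, and that unstable carries libvirt 10.2.0 or newer at the locked revision. Mixing a newer libvirt into a stable system can break in its own ways, which is one reason disabling vmx was the simpler fix.

```nix
# Hypothetical: assumes flake.nix passes the `unstable` input to modules, e.g.
#   nixpkgs.lib.nixosSystem { specialArgs = { inherit unstable; }; ... }
{ pkgs, unstable, ... }:
let
  # Package set from the (assumed) nixpkgs-unstable input.
  unstablePkgs = unstable.legacyPackages.${pkgs.stdenv.hostPlatform.system};
in
{
  virtualisation.libvirtd.enable = true;
  # Swap the 24.05 libvirt (10.0.0 on this host) for unstable's build.
  virtualisation.libvirtd.package = unstablePkgs.libvirt;
}
```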

I couldn’t help noticing that had I built this host on Ubuntu, I wouldn’t have experienced this regression, because Ubuntu has already shipped the updated QEMU. In fact, the maintainers marked this bug as Critical.

§OK am I secure now?

winston@silo ~/p/nixos-configs $ echo | nc localhost 22
SSH-2.0-OpenSSH_9.7
Invalid SSH identification string.

Uhhhhhhhhhhhhhhhhhhhhhhhhhh… what?

OK, after reading the NixOS Discourse, it sounds like the fix was applied as a patch without modifying the version string to indicate it. If I run nix run nixpkgs#vulnix -- -R $(readlink -f $(which ssh)), I get “Found no advisories. Excellent!”. This feels wrong and bad. Not a resounding “you’re safe!” with a spiffy new version string to back up the claim. No, all I have instead is a random tool claiming it’s all safe and secure. Trust me bro.

And don’t forget the icing on the cake: the nixpkgs team deviated from standard operating procedures to backport the security fix to 23.11, despite 23.11’s EOL (End of Life) status, and despite their insistence that everyone upgrade off EOL releases. Had I read one particular forum post, I would have known that it wasn’t necessary to upgrade!

I’m not out of the woods yet, that pesky Nextcloud upgrade is pending. I’ll be sure to share if I broke anything.

The fact remains: had I read the Discourse very carefully, I wouldn’t have invested 6 hours today babysitting a NixOS upgrade, since 23.11 received the same OpenSSH vulnerability fix, contrary to the team’s standard operating procedures. I’ll leave it at that. I don’t want to be unduly negative, but that’s an opportunity cost on everything else one could do with every day’s sacred time.

The upgrade workflow was cool the first time, but every six months (each NixOS release is supported for only about seven months), per machine? That sounds like a lot of toilsome drudgery. Plus one must face the inevitable bitrot caused by the behemoth codebase that nixpkgs has ballooned into, dragged down by a contribution system that permits thousands of tickets to go unsatisfied and hundreds of unmerged PRs to stagnate within its churn.

NixOS has its uses, and I believe my uses are too pedestrian for NixOS.