Turns out my NAS is vulnerable to the OpenSSH flaw that, given enough time, lets an unauthenticated attacker run code as root on your host. Dubbed regreSSHion (CVE-2024-6387), it affects several OpenSSH version ranges. If one has OpenSSH 9.8p1 or later, one is totally fine.
Unfortunately, the NAS is still on NixOS 23.11. The NAS remains on NixOS, but all my other devices have been migrated off to Debian Testing. In this brief post, I wanted to describe the toil involved with upgrading this NixOS powered NAS. Some of it is my own PEBKAC fault. Some of it is through well meaning defaults of NixOS. Some of it is just not enough attention to detail. Or as I’ve recently come to discern, too complicated or temperamental for my liking. Maybe this will be entertaining or maybe it will be cringe. My hope is this post encompasses the process of upgrading a NixOS host, warts and all. Sorry for my snide remarks; I’m over the promises of unfettered complexity.
§Step 1: nix flake update
I haven’t updated the flakes containing both stable and unstable nixpkgs
references in some time. So that’s my first step, before considering the
upgrade to 24.05 in earnest. As a result, I need to git pull to ensure my
repository is up to date. Whoops, a bit of PEBKAC broke git pull on my setup.
§ssh agent mess
Been using keychain for some time. Here’s the relevant code from my dotfiles (link).
# shellcheck disable=SC1090
[[ -r ${HOME}/.keychain/${HOSTNAME}-sh ]] && . "${HOME}/.keychain/${HOSTNAME}-sh"
# shellcheck disable=SC1090
[[ -r ${HOME}/.keychain/${HOSTNAME}-sh-gpg ]] && . "${HOME}/.keychain/${HOSTNAME}-sh-gpg"
# shellcheck disable=SC2086
eval "$(keychain --eval -q --inherit any --agents ssh,gpg ${keys//\~/$HOME})"
Unfortunately, it seems to break ssh agent forwarding, so I can’t git pull on
my NAS without disabling this module. That’s the first friction of the day:
getting the dang ole’ thing to git pull correctly. After removing this file
from my .bashrc.d/, ssh agent forwarding worked again. That’s a problem for
another day.
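For that other day, here is one untested idea, sketched under the assumption that the breakage comes from keychain taking over a session that already carries a forwarded agent (the root cause was never confirmed). The function name should_start_keychain is made up for illustration.

```shell
# Hypothetical guard: succeed (start keychain) unless this shell is an SSH
# session that already has a forwarded agent socket.
should_start_keychain() {
  # sshd sets SSH_CONNECTION for remote logins; with `ssh -A`,
  # SSH_AUTH_SOCK points at the forwarded agent's socket.
  if [ -n "${SSH_CONNECTION:-}" ] && [ -S "${SSH_AUTH_SOCK:-}" ]; then
    return 1
  fi
  return 0
}

# Possible use inside the keychain module (assumption, not tested):
# if should_start_keychain; then
#   eval "$(keychain --eval -q --inherit any --agents ssh,gpg)"
# fi
```

Running ssh-add -l on the NAS after logging in with ssh -A is a quick way to see whether the forwarded keys are still visible.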
Okay! Next contention.
§nixos-rebuild boot fails with no space left on device
I think this is a super frustrating issue that I’ve run into on most hosts that
I’ve deployed NixOS on. Because nix-build’s error reporting won’t tell you the
directory it’s building in, it’s hard to know from either the nix log ...
incantation or the terminal output what the root cause of the error is. It
comes down to the particular invocations in the derivation’s (package’s) build
phase. Some will print out an absolute path, some won’t. This loosens the
feedback loop… Now I have to monitor indirect metrics, such as disk usage, or
even strace something to see where it’s writing to.
Today, this cost me an hour of iterating with nixos-rebuild incantations in
order to coax grafana to compile. For some reason, I assumed my ZFS dataset
named rpool (which contains /root, /home, and /var) might have been the
culprit, so I invested an exorbitant amount of time reconfiguring Loki to clean
up old log data and moving a handful of Syncthing folders into another ZFS
dataset. As it turned out, that wasn’t the problem at all. A tighter feedback
loop could have prevented time wastage here.
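One way to tighten that loop a little is to watch free space on every place a build might plausibly write to, instead of guessing at one dataset. A minimal sketch; the function name and the directory list are assumptions, so adjust them for your host.

```shell
# Print the free space (in KiB) of each directory a build might write to.
report_free_space() {
  for dir in "$@"; do
    # -P forces the portable one-line-per-filesystem format;
    # column 4 of that format is the available space.
    avail=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    printf '%s\t%s KiB free\n' "$dir" "${avail:-unknown}"
  done
}

report_free_space /tmp /var/tmp /nix/store
```

Left running in a second terminal (while sleep 5; do report_free_space /tmp /var/tmp; done), it shows which filesystem is shrinking while the build runs.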
One fix for this is to override the TMPDIR environment variable on the failing
build. Or set it in your NixOS configuration to make it permanent. Next, you
might need to reconfigure the size of /tmp if choosing TMPDIR=/tmp. It’s a
tmpfs that resides in memory (and swap), and its maximum size is configured as
a percentage of your physical RAM. One can increase this size or migrate /tmp
to a filesystem partition on disk. On low memory hosts it’s best to eschew
tmpfs for /tmp and instead set up a filesystem partition. Filesystem caching is
pretty good nowadays so it’s not as serious a slowdown as one might have been
tricked into believing.
Back to it: I tried sudo env TMPDIR=/tmp nixos-rebuild boot and lo and behold,
the 4GiB tmpfs was insufficient to build Grafana. Shoot. Retried for maybe the
fifth time with sudo env TMPDIR=/var/tmp nixos-rebuild boot. Another thirty
minutes later and the build finished successfully. /var/tmp typically lives on
either your /var or / filesystem with plenty of space.
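To avoid finding out thirty minutes in, the choice of TMPDIR can be made up front. This is only a sketch: pick_tmpdir is a made-up helper, and the 20 GiB threshold in the usage comment is an arbitrary guess, not a measured requirement for Grafana.

```shell
# Hypothetical helper: print the first candidate directory that is writable
# and has at least MIN_KIB kibibytes available; fail if none qualifies.
pick_tmpdir() {
  min_kib=$1
  shift
  for dir in "$@"; do
    [ -d "$dir" ] && [ -w "$dir" ] || continue
    avail=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "${avail:-0}" -ge "$min_kib" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Example (not executed here): require roughly 20 GiB of free space.
# sudo env TMPDIR="$(pick_tmpdir 20971520 /tmp /var/tmp)" nixos-rebuild boot
```

Candidates are tried in order, so listing /tmp first keeps small builds in memory and only falls back to disk when the tmpfs is too small.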
Now to reboot and ssh in.
§ssh then zfs load-key funkiness
I have it in my notes how to boot my NixOS powered NAS because it’s not exactly
natural or intuitive. First I reboot, then I wait to ssh in, then issue a
zfs load-key command to unlock the rpool which contains NixOS, then kill the
other zfs load-key command kicked off by the boot sequence, thereby resuming
the boot sequence.
winston@silo ~ $ sudo reboot
[sudo] password for winston:
(7s) winston@silo ~ $ # oh yeah, reboot doesn't work :(
winston@silo ~ $ sudo systemctl reboot
Broadcast message from root@silo on pts/8 (Mon 2024-07-08 17:29:16 CDT):
The system will reboot now!
winston@silo ~ $ Connection to silo closed by remote host.
Connection to silo closed.
(5m50s) 255 winston@quasit ~ $ sleep 120;ssh -p2222 root@silo
~ # zfs load-key rpool
Enter passphrase for 'rpool':
~ # pkill zfs # Next type Return followed by ~. to kill the ssh session!
~ # Connection to silo closed.
(56s) 255 winston@quasit ~ $ sleep 60; ssh -A silo
+&-
Welcome to _.-^-._ .--.
silo. .-' _ '-. |__|
/ |_| \| |
/ \ |
/| _____ |\ |
| |==|==| | |
|---|---|---|---|---| |--|--| | |
|---|---|---|---|---| |==|==| | |
^jgs^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Last login: Mon Jul 8 17:23:33 2024 from 10.20.1.44
winston@silo ~ $ systemctl status
After pkill zfs is run, the ssh connection will become unresponsive. Type
Return followed by ~. to tell OpenSSH to exit.
Systemd status is “still starting”. So I’ll come back in another minute or two, see if it’s happy.
Okay, now it says systemd state is “running” and not “degraded”. We’re in business!
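The "come back in a minute or two" step can be turned into a small polling loop. A rough sketch, assuming systemctl is-system-running is available on the host; the function name and the default retry numbers are made up.

```shell
# Poll a state command until it prints "running" or "degraded", or give up.
# The state command is a parameter so the loop can be tried without systemd;
# it defaults to `systemctl is-system-running`.
wait_for_boot() {
  tries=${1:-30}   # number of polls before giving up
  delay=${2:-10}   # seconds between polls
  state_cmd=${3:-systemctl is-system-running}
  while [ "$tries" -gt 0 ]; do
    # is-system-running exits non-zero for anything but "running",
    # so only its output is inspected here.
    state=$($state_cmd 2>/dev/null || true)
    case $state in
      running|degraded) printf '%s\n' "$state"; return 0 ;;
    esac
    tries=$((tries - 1))
    sleep "$delay"
  done
  printf 'timed out (last state: %s)\n' "${state:-unknown}" >&2
  return 1
}
```

From the workstation this could be called as wait_for_boot 30 10 'ssh silo systemctl is-system-running'. Recent systemd versions also offer systemctl is-system-running --wait, which blocks until boot has finished.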
§Commit the changes
Now to commit the flake changes:
winston@silo ~/p/nixos-configs $ git commit -m 'flake: refresh'
.git/hooks/pre-commit: line 13: /nix/store/4zqn5lajki1z3a2avia658l1wacpi8v0-pre-commit-3.3.3/bin/pre-commit: No such file or directory
1 winston@silo ~/p/nixos-configs $ pre
precat preunzip prezip prezip-bin
1 winston@silo ~/p/nixos-configs $ nix run nixpkgs#pre-commit install
pre-commit installed at .git/hooks/pre-commit
winston@silo ~/p/nixos-configs $ git commit -m 'flake: refresh'
Trim Trailing Whitespace.................................................Passed
Fix End of Files.........................................................Passed
Check Yaml...........................................(no files to check)Skipped
Check for added large files..............................................Passed
[master f0be2ea] flake: refresh
2 files changed, 9 insertions(+), 8 deletions(-)
Yes, I know pre-commit should exist in my environment.systemPackages. I forgot
why I didn’t add it. Maybe pre-commit is supposed to be in this folder’s
shell.nix? Complicated. You can see why I’m shrinking away from nix-powered
workflows! I just want the thing to do the thing.
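The hook broke because it hard-codes a /nix/store path that has since been garbage collected or replaced. A small check along these lines would flag that before the next commit; the function name is hypothetical and the pattern is a deliberately simple approximation of a store path.

```shell
# List /nix/store paths referenced by a file that no longer exist on disk.
# Returns 0 when every referenced path exists, 1 when one is missing.
find_dangling_store_paths() {
  file=$1
  missing=0
  # Store paths contain no whitespace or quotes, so a simple pattern works.
  for path in $(grep -o '/nix/store/[^[:space:]"'\'']*' "$file" | sort -u); do
    if [ ! -e "$path" ]; then
      printf 'missing: %s\n' "$path"
      missing=1
    fi
  done
  return "$missing"
}
```

Pointing it at .git/hooks/pre-commit after a flake update or a garbage collection shows whether pre-commit install needs to be run again.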
Now, on to the upgrade to fix the OpenSSH vulnerability. A couple hours later…
§Re-read the release notes
NixOS has this stance about upgrades not dissimilar to classical stable release distros. Things will break. Things will break in fantastic ways. Just don’t upgrade, and things won’t break. It’s a white lie of NixOS: that somehow you enjoy stability between upgrades. You aren’t guaranteed stability between upgrades. You must possess the same due diligence and best judgment that is expected on any other distro. Be sure to read the release notes; there is no time-efficient alternative. A poorly aged adage of NixOS is “roll back if upgrade breaks stuff”. No need to ensure most upgrades go well on first try if that’s the community spirit of releasing upgrades! And don’t get me started on what happens if you need security updates. This blog post happens: you must upgrade everything or suture nixpkgs in order to recoup security updates.
Back to it. As promised, I read the release notes and my biggest takeaway is that there is an upgrade to Loki 3.0.0. I’m okay with Loki breaking, so I won’t bother with it until I experience breakage. It appears that my Nextcloud is still supported, so I won’t struggle through a Nextcloud upgrade (yet).
§Edit the flake inputs
All I did was modify inputs.nixpkgs.url to point to
github:NixOS/nixpkgs/release-24.05. Then I ran nix flake update.
winston@silo ~/p/nixos-configs $ nix flake update
warning: Git tree '/home/winston/p/nixos-configs' is dirty
warning: updating lock file '/home/winston/p/nixos-configs/flake.lock':
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/7144d6241f02d171d25fba3edeaf15e0f2592105' (2024-07-02)
→ 'github:NixOS/nixpkgs/de429c2a20520e0f81a1fd9d2677686a68cae739' (2024-07-08)
• Updated input 'unstable':
'github:NixOS/nixpkgs/9f4128e00b0ae8ec65918efeba59db998750ead6' (2024-07-03)
→ 'github:NixOS/nixpkgs/655a58a72a6601292512670343087c2d75d859c1' (2024-07-08)
warning: Git tree '/home/winston/p/nixos-configs' is dirty
Next it’s time to try to build it, remembering to point TMPDIR to the spacious
/var/tmp.
§Breaking pinentry change
winston@silo ~/p/nixos-configs $ sudo env TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
[sudo] password for winston:
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error:
… while calling the 'head' builtin
at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/attrsets.nix:1575:11:
1574| || pred here (elemAt values 1) (head values) then
1575| head values
| ^
1576| else
… while evaluating the attribute 'value'
at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/modules.nix:809:9:
808| in warnDeprecation opt //
809| { value = builtins.addErrorContext "while evaluating the option `${showOption loc}':" value;
| ^
810| inherit (res.defsFinal') highestPrio;
(stack trace truncated; use '--show-trace' to show the full trace)
error:
Failed assertions:
- The option definition `programs.gnupg.agent.pinentryFlavor' in `/nix/store/g4py9g5mqznfzgn7nrz1glg811k9xpll-source/common/base.nix' no longer has any effect; please remove it.
Use programs.gnupg.agent.pinentryPackage instead
(28s) 1 winston@silo ~/p/nixos-configs $
Balls. Legacy code shims be damned, this is NixOS! I think I need to edit some configuration, because the powers that be decided it must be done. But where? Cool errors bro. No problem, can search my source tree and with any hope, it’s within my own Nix configuration, and not in another input (e.g. the nixpkgs flake):
winston@silo ~/p/nixos-configs $ rg pinentryFlavor
common/base.nix
54: pinentryFlavor = "gtk2";
Oh maybe I could have taken the
/nix/store/not-the-path-that-im-working-out-of-prefix-source/common/base.nix
path suffix and visited the relative path common/base.nix. Icky to manually
fix paths in error messages, but that should suffice next time!
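For next time, that manual path surgery fits in a one-liner. A minimal sketch; the function name is made up, and it assumes the 32-character hash plus -source prefix seen in the errors above.

```shell
# Strip the "/nix/store/<hash>-source/" prefix that flake evaluation puts in
# front of files copied from a flake's source tree, leaving a relative path.
store_path_to_relative() {
  printf '%s\n' "$1" | sed -E 's|^/nix/store/[a-z0-9]{32}-source/||'
}

store_path_to_relative \
  /nix/store/g4py9g5mqznfzgn7nrz1glg811k9xpll-source/common/base.nix
```

Other flake inputs such as nixpkgs get the same kind of prefix, so the result is only useful if the relative path actually exists in your own repository.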
Okay, deleted the offending line. I don’t know why I’m using the gtk2 flavor anymore, so maybe it’s not important… let’s pray that it wasn’t important!
winston@silo ~/p/nixos-configs $ sudo env TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
[sudo] password for winston:
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error:
… while calling the 'head' builtin
at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/attrsets.nix:1575:11:
1574| || pred here (elemAt values 1) (head values) then
1575| head values
| ^
1576| else
… while evaluating the attribute 'value'
at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/modules.nix:809:9:
808| in warnDeprecation opt //
809| { value = builtins.addErrorContext "while evaluating the option `${showOption loc}':" value;
| ^
810| inherit (res.defsFinal') highestPrio;
(stack trace truncated; use '--show-trace' to show the full trace)
error: Package ‘nextcloud-27.1.11’ in /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/pkgs/servers/nextcloud/default.nix:35 is marked as insecure, refusing to evaluate.
Known issues:
- Nextcloud version 27.1.11 is EOL
You can install it anyway by allowing this package, using the
following methods:
a) To temporarily allow all insecure packages, you can use an environment
variable for a single invocation of the nix tools:
$ export NIXPKGS_ALLOW_INSECURE=1
Note: When using `nix shell`, `nix build`, `nix develop`, etc with a flake,
then pass `--impure` in order to allow use of environment variables.
b) for `nixos-rebuild` you can add ‘nextcloud-27.1.11’ to
`nixpkgs.config.permittedInsecurePackages` in the configuration.nix,
like so:
{
nixpkgs.config.permittedInsecurePackages = [
"nextcloud-27.1.11"
];
}
c) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
‘nextcloud-27.1.11’ to `permittedInsecurePackages` in
~/.config/nixpkgs/config.nix, like so:
{
permittedInsecurePackages = [
"nextcloud-27.1.11"
];
}
(28s) 1 winston@silo ~/p/nixos-configs $
Oh crud, apparently I’m running an insecure Nextcloud? Who knew! Looks like
one can set an environment variable (NIXPKGS_ALLOW_INSECURE=1) to tell
nixos-rebuild to calm down if only a smidgen.
§Loki changes
The release notes did mention Loki configuration changed and advised that users read the upstream Loki release notes too. It looks like I can’t push the Loki upgrade back and must handle it now:
winston@silo ~/p/nixos-configs $ sudo env NIXPKGS_ALLOW_INSECURE=1 TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error: builder for '/nix/store/v6g5b5c383a4h6i8bl210h91cp54qpz6-validate-loki-conf.drv' failed with exit code 1;
last 5 log lines:
> failed parsing config: /nix/store/3i7y6y5nwjqz8mhg1kakhmyd1cv7cy3i-loki-config.json: yaml: unmarshal errors:
> line 4: field max_look_back_period not found in type config.ChunkStoreConfig
> line 13: field shared_store not found in type compactor.Config
> line 31: field max_transfer_retries not found in type ingester.Config
> line 59: field shared_store not found in type boltdb.IndexCfg. Use `-config.expand-env=true` flag if you want to expand environment variables in your config file
For full logs, run 'nix log /nix/store/v6g5b5c383a4h6i8bl210h91cp54qpz6-validate-loki-conf.drv'.
error: 1 dependencies of derivation '/nix/store/cxpdszil18wj6hb3az3jwqbyfy43wmj6-unit-loki.service.drv' failed to build
error: 1 dependencies of derivation '/nix/store/gyh8mr8sqk1gk20qldp062zym2mdy06c-system-units.drv' failed to build
error: 1 dependencies of derivation '/nix/store/qr88n0970ji783bkdwhk8ig7wazksa9v-etc.drv' failed to build
error: 1 dependencies of derivation '/nix/store/p36ja5jhi7wr2ayzxgf7ykkxhss599n2-nixos-system-silo-24.05.20240708.de429c2.drv' failed to build
(3m46s) 1 winston@silo ~/p/nixos-configs $
That list of Loki-specific configuration errors is extremely helpful. Well
done. The nixos-rebuild process printed the problematic configuration fields
out to the terminal. After trying to make heads or tails of the upgrade guide,
I instead merely removed every offending line. Let’s see what works (shrug).
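With a longer list of rejected fields, a filter like the following would turn the build log into a checklist of options to hunt down in the Nix configuration. It is a sketch that only understands the "field X not found in type Y" wording shown above; the function name is made up.

```shell
# Read Loki validation output on stdin and print "<field> <type>" for every
# "field X not found in type Y" message.
unknown_loki_fields() {
  sed -n 's/.*field \([A-Za-z0-9_]*\) not found in type \([A-Za-z0-9_]*\.[A-Za-z0-9_]*\).*/\1 \2/p'
}
```

Piping the failing build's log into unknown_loki_fields prints one field per line, for example max_look_back_period config.ChunkStoreConfig.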
error: builder for '/nix/store/rkmm8wa8vz576bhwpz0wwmv8ck31653j-validate-loki-conf.drv' failed with exit code 1;
last 1 log lines:
> level=error ts=2024-07-08T23:14:46.302638345Z caller=main.go:66 msg="validating config" err="MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY\nCONFIG ERROR: invalid compactor config: compactor.delete-request-store should be configured when retention is enabled\nCONFIG ERROR: schema v13 is required to store Structured Metadata and use native OTLP ingestion, your schema version is v11. Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update to schema v13 or newer before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure\nCONFIG ERROR: `tsdb` index type is required to store Structured Metadata and use native OTLP ingestion, your index type is `boltdb-shipper` (defined in the `store` parameter of the schema_config). Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update the schema to use index type `tsdb` before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure"
For full logs, run 'nix log /nix/store/rkmm8wa8vz576bhwpz0wwmv8ck31653j-validate-loki-conf.drv'.
Here’s the error reformatted:
MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY
CONFIG ERROR: invalid compactor config: compactor.delete-request-store should
be configured when retention is enabled
CONFIG ERROR: schema v13 is required to store Structured Metadata and use
native OTLP ingestion, your schema version is v11. Set
`allow_structured_metadata: false` in the `limits_config`
section or set the command line argument
`-validation.allow-structured-metadata=false` and restart Loki.
Then proceed to update to schema v13 or newer before re-enabling
this config, search for 'Storage Schema' in the docs for the
schema update procedure
CONFIG ERROR: `tsdb` index type is required to store Structured Metadata and
use native OTLP ingestion, your index type is `boltdb-shipper`
(defined in the `store` parameter of the schema_config). Set
`allow_structured_metadata: false` in the `limits_config` section
or set the command line argument
`-validation.allow-structured-metadata=false` and restart Loki.
Then proceed to update the schema to use index type `tsdb`
before re-enabling this config, search for 'Storage Schema'
in the docs for the schema update procedure
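That reformatting does not have to be done by hand: the log line carries literal \n sequences that just need to become real line breaks. A small sketch with a made-up function name:

```shell
# Turn the literal two-character sequence \n into real line breaks so a
# one-line Loki error message becomes readable.
unescape_newlines() {
  awk '{ gsub(/\\n/, "\n"); print }'
}
```

Piping the log through unescape_newlines and then fold -s -w 80 gives roughly the layout shown above, minus the hanging indentation.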
Okay, I’ve added allow_structured_metadata = false; now what will break next?
CONFIG ERROR: invalid compactor config: compactor.delete-request-store should
be configured when retention is enabled
OK, I’ve added delete_request_store = "filesystem"; as per this GitHub issue.
§It built!
Hooray! It built!
winston@silo ~/p/nixos-configs $ sudo env NIXPKGS_ALLOW_INSECURE=1 TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
trace: warning: The option `services.nextcloud.extraOptions' defined in `/nix/store/d99kz3ifvz1hqg8wni0bi2j08n3rdisr-source/hosts/silo' has been renamed to `services.nextcloud.settings'.
trace: warning: The option `services.nextcloud.config.defaultPhoneRegion' defined in `/nix/store/d99kz3ifvz1hqg8wni0bi2j08n3rdisr-source/hosts/silo' has been renamed to `services.nextcloud.settings.default_phone_region'.
trace: warning: A legacy Nextcloud install (from before NixOS 24.05) may be installed.
After nextcloud27 is installed successfully, you can safely upgrade
to 28. The latest version available is Nextcloud29.
Please note that Nextcloud doesn't support upgrades across multiple major versions
(i.e. an upgrade from 16 is possible to 17, but not 16 to 18).
The package can be upgraded by explicitly declaring the service-option
`services.nextcloud.package`.
updating GRUB 2 menu...
installing the GRUB 2 boot loader on /dev/disk/by-id/ata-CT120BX500SSD1_1943E3D1AC4B...
Installing for i386-pc platform.
Installation finished. No error reported.
updating GRUB 2 menu...
installing the GRUB 2 boot loader on /dev/disk/by-id/ata-CT120BX500SSD1_1943E3D1AC45...
Installing for i386-pc platform.
Installation finished. No error reported.
(51s) winston@silo ~/p/nixos-configs $
Now to snapshot everything real quick…
winston@silo ~/p/nixos-configs $ sudo zfs snapshot -r naspool@2024-08-07_pre-reboot
winston@silo ~/p/nixos-configs $ sudo zfs snapshot -r rpool@2024-08-07_pre-reboot
Now let’s reboot and hope for the best.
§degraded - libvirt-guests.service failure
systemctl status says the state is “degraded”. Let’s see what broke:
winston@silo ~ $ systemctl list-units --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● libvirt-guests.service loaded failed failed libvirt guests suspend/resume service
Legend: LOAD → Reflects whether the unit definition was properly loaded.
ACTIVE → The high-level unit activation state, i.e. generalization of SUB.
SUB → The low-level unit activation state, values depend on unit type.
1 loaded units listed.
(34s) 3 winston@silo ~ $
OK let’s check the specific service:
winston@silo ~ $ systemctl status libvirt-guests.service
× libvirt-guests.service - libvirt guests suspend/resume service
Loaded: loaded (/etc/systemd/system/libvirt-guests.service; enabled; preset: enabled)
Drop-In: /nix/store/70x3p9hhrm202n3lfl1p79bv0h2c59zi-system-units/libvirt-guests.service.d
└─overrides.conf
Active: failed (Result: exit-code) since Mon 2024-07-08 18:38:25 CDT; 3min 11s ago
Docs: man:libvirt-guests(8)
https://libvirt.org/
Process: 2340 ExecStart=/nix/store/6a5lmp5p08n9qsfd0l9aqc7jhigm82j9-libvirt-10.0.0/libexec/libvirt-guests.sh start (code=exited, status=1/FAILURE)
Main PID: 2340 (code=exited, status=1/FAILURE)
IP: 0B in, 0B out
CPU: 184ms
Jul 08 18:38:17 silo systemd[1]: Starting libvirt guests suspend/resume service...
Jul 08 18:38:24 silo libvirt-guests.sh[3307]: Resuming guests on default URI...
Jul 08 18:38:25 silo libvirt-guests.sh[3313]: Resuming guest seedbox:
Jul 08 18:38:25 silo libvirt-guests.sh[3318]: error: Failed to start domain 'seedbox'
Jul 08 18:38:25 silo libvirt-guests.sh[3318]: error: operation failed: guest CPU doesn't match specification: extra features: vmx-ins-outs,vmx-true-ctls,vmx-store-lma,vmx-activity-hlt,vmx-vmwrite-vmexit-fields,vmx-apicv-xapic,vmx-ept,vmx-desc-exit,vmx-rdtscp-exit,vmx-apicv-x2apic,vmx-vpid,vmx-wbinvd-exit,vmx-unrestricted-guest,vmx-rdrand-exit,vmx-invpcid-exit,vmx-vmfunc,vmx-shadow-vmcs,vmx-invvpid,vmx-invvpid-single-addr,vmx-invvpid-all-context,vmx-ept-execonly,vmx-page-walk-4,vmx-ept-2mb,vmx-ept-1gb,vmx-invept,vmx-eptad,vmx-invept-single-context,vmx-invept-all-context,vmx-intr-exit,vmx-nmi-exit,vmx-vnmi,vmx-preemption-timer,vmx-vintr-pending,vmx-tsc-offset,vmx-hlt-exit,vmx-invlpg-exit,vmx-mwait-exit,vmx-rdpmc-exit,vmx-rdtsc-exit,vmx-cr3-load-noexit,vmx-cr3-store-noexit,vmx-cr8-load-exit,vmx-cr8-store-exit,vmx-flexpriority,vmx-vnmi-pending,vmx-movdr-exit,vmx-io-exit,vmx-io-bitmap,vmx-mtf,vmx-msr-bitmap,vmx-monitor-exit,vmx-pause-exit,vmx-secondary-ctls,vmx-exit-nosave-debugctl,vmx-exit-load-perf-global-ctrl,vmx-exit-ack-intr,vmx-exit-save-pat,vmx-exit-load-pat,vmx-exit-save-efer,vmx-exit-load-efer,vmx-exit-save-preemption-timer,vmx-entry-noload-debugctl,vmx-entry-ia32e-mode,vmx-entry-load-perf-global-ctrl,vmx-entry-load-pat,vmx-entry-load-efer,vmx-eptp-switching, missing features: vmx-apicv-register,vmx-apicv-vid,vmx-posted-intr
Jul 08 18:38:25 silo systemd[1]: libvirt-guests.service: Main process exited, code=exited, status=1/FAILURE
Jul 08 18:38:25 silo systemd[1]: libvirt-guests.service: Failed with result 'exit-code'.
Jul 08 18:38:25 silo systemd[1]: Failed to start libvirt guests suspend/resume service.
Oh snap, looks like my virtual machine (VM) for BitTorrent is broken. More specifically, libvirtd failed to resume the VM from saved state (it’s like folding your laptop shut, but for virtual machines). By the way, everything on archive.org is available via BitTorrent. Most Linux distros provide .torrent files too! Hosting torrents for these projects is one low-effort way to help out the community.
Here’s the error reformatted:
error: Failed to start domain 'seedbox'
error: operation failed: guest CPU doesn't match specification: extra features:
vmx-ins-outs,vmx-true-ctls,vmx-store-lma,vmx-activity-hlt,
vmx-vmwrite-vmexit-fields,vmx-apicv-xapic,vmx-ept,vmx-desc-exit,
vmx-rdtscp-exit,vmx-apicv-x2apic,vmx-vpid,vmx-wbinvd-exit,
vmx-unrestricted-guest,vmx-rdrand-exit,vmx-invpcid-exit,
vmx-vmfunc,vmx-shadow-vmcs,vmx-invvpid,vmx-invvpid-single-addr,
vmx-invvpid-all-context,vmx-ept-execonly,vmx-page-walk-4,vmx-ept-2mb,
vmx-ept-1gb,vmx-invept,vmx-eptad,vmx-invept-single-context,
vmx-invept-all-context,vmx-intr-exit,vmx-nmi-exit,vmx-vnmi,
vmx-preemption-timer,vmx-vintr-pending,vmx-tsc-offset,vmx-hlt-exit,
vmx-invlpg-exit,vmx-mwait-exit,vmx-rdpmc-exit,vmx-rdtsc-exit,
vmx-cr3-load-noexit,vmx-cr3-store-noexit,vmx-cr8-load-exit,
vmx-cr8-store-exit,vmx-flexpriority,vmx-vnmi-pending,vmx-movdr-exit,
vmx-io-exit,vmx-io-bitmap,vmx-mtf,vmx-msr-bitmap,vmx-monitor-exit,
vmx-pause-exit,vmx-secondary-ctls,vmx-exit-nosave-debugctl,
vmx-exit-load-perf-global-ctrl,vmx-exit-ack-intr,vmx-exit-save-pat,
vmx-exit-load-pat,vmx-exit-save-efer,vmx-exit-load-efer,
vmx-exit-save-preemption-timer,vmx-entry-noload-debugctl,
vmx-entry-ia32e-mode,vmx-entry-load-perf-global-ctrl,
vmx-entry-load-pat,vmx-entry-load-efer,vmx-eptp-switching,
missing features: vmx-apicv-register,vmx-apicv-vid,vmx-posted-intr
Whoops, after a quick google, I encountered this regression. If I run
virsh start seedbox, I get a similar error related to vmx. The solution is to
either upgrade to 10.2.0 or disable the vmx feature in the guest. vmx is used
for nested virtualization, which my guest does not utilize. Sidenote, vmx
support is detectable via CPUID which was the topic of my last article.
As I don’t use nested virtualization, I opted to disable it. I invoked
virsh edit seedbox to spawn a text editor with the libvirtd domain’s (guest’s)
XML therein, then edited <feature policy='require' name='vmx'/> to
<feature policy='disable' name='vmx'/>. That <feature> element can be found
nested within the <cpu> element.
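The same edit can be scripted instead of done in an editor. This is a sketch rather than what was run on the NAS: disable_vmx is a made-up name, it matches only the exact single-quoted element shown above, and the virsh dumpxml and virsh define round trip in the comments was not tested here.

```shell
# Rewrite the vmx feature policy from "require" to "disable" in domain XML
# read from stdin. Only the exact element text shown above is matched.
disable_vmx() {
  sed "s|<feature policy='require' name='vmx'/>|<feature policy='disable' name='vmx'/>|"
}

# Non-interactive counterpart of the `virsh edit` session (not executed here):
# virsh dumpxml seedbox | disable_vmx > seedbox.xml
# virsh define seedbox.xml
```

Keeping a copy of the original dump around makes it easy to put the feature back once a fixed libvirt lands.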
I couldn’t help but notice that had I built this host with Ubuntu, I wouldn’t have experienced this regression, because the updated QEMU has been shipped already. In fact, the maintainers marked this bug as Critical.
§OK am I secure now?
winston@silo ~/p/nixos-configs $ echo | nc localhost 22
SSH-2.0-OpenSSH_9.7
Invalid SSH identification string.
Uhhhhhhhhhhhhhhhhhhhhhhhhhh… what?
OK after reading the discourse, it sounds like they didn’t modify the version
string to indicate it has been fixed. If I run
nix run nixpkgs#vulnix -- -R $(readlink -f $(which ssh)), I get “Found no
advisories. Excellent!”. This feels wrong and bad. Not a resounding “you’re
safe!” and a spiffy new version string to back up this claim. No, all I have
instead is a random tool claiming it’s all safe and secure. Trust me bro.
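For what it is worth, the banner check itself is easy to script, with the big caveat this section is all about: it reads the version string only, so a backported fix still shows up as "older". A sketch with a made-up function name, using the "9.8p1 or later is fine" rule from the top of this post:

```shell
# Succeed when an SSH banner advertises OpenSSH 9.8 or newer. This looks at
# the version string only and cannot see backported patches.
banner_at_least_9_8() {
  version=${1#SSH-2.0-OpenSSH_}   # e.g. "9.7" or "9.8p1"
  major=${version%%.*}
  minor=${version#*.}
  minor=${minor%%[!0-9]*}         # drop suffixes such as "p1"
  [ "$major" -gt 9 ] || { [ "$major" -eq 9 ] && [ "${minor:-0}" -ge 8 ]; }
}

if banner_at_least_9_8 "SSH-2.0-OpenSSH_9.7"; then
  echo "banner says 9.8 or newer"
else
  echo "banner is older than 9.8; check for a backported fix"
fi
```

The banner itself can be captured as above, with echo | nc localhost 22 | head -n 1.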
And don’t forget the icing on top of the cake: the nixpkgs team deviated from standard operating procedures to backport the security fix to 23.11, despite 23.11’s EOL (End of Life) status, and despite their insistence that everyone upgrade off EOL releases. Had I read one particular forum post, I would have known that it wasn’t necessary to upgrade!
I’m not out of the woods yet, that pesky Nextcloud upgrade is pending. I’ll be sure to share if I broke anything.
The fact remains, had I read the discourse very carefully, I wouldn’t have invested 6 hours today babysitting a NixOS upgrade, since 23.11 received the same OpenSSH vulnerability fix, contrary to the team’s standard operating procedures. I’ll leave it at that. I don’t want to be unduly negative, but that’s an opportunity cost on everything else one could do with every day’s sacred time.
The upgrade workflow was cool the first time, but at least one time a year, per machine? That sounds like a lot of toilsome drudgery. Plus one must face the inevitable bitrot caused by the behemoth codebase that nixpkgs has ballooned into, dragged down by a contribution system that permits thousands of tickets to go unsatisfied and hundreds of unmerged PRs to stagnate within its churn.
NixOS has its uses, and I believe my uses are too pedestrian for NixOS.