Previously I introduced the reader to ShellCheck. In this post I detail how I use Flycheck in Emacs and offer an Emacs function to automatically suppress ShellCheck errors at the current line.

I’m an avid Emacs user, and it follows that I’ve customized my editor to get the most out of ShellCheck. If you, dear reader, are not an Emacs user, I cannot help you! Please, for the love of shell scripts, ensure ShellCheck works within your preferred text editor, lest you ship buggy scripts riddled with unhandled edge cases!

How to use ShellCheck in Emacs

First, ensure you have shellcheck installed. Check Repology for a list of distros and OSes that package ShellCheck. On Debian/Ubuntu try apt install shellcheck. Verify that shellcheck works by invoking shellcheck --version.

$ shellcheck --version || echo 'No shellcheck found :('
ShellCheck - shell script analysis tool
version: 0.10.0
license: GNU General Public License, version 3
website: https://www.shellcheck.net

Next (this is the only mandatory Emacs step), install flycheck. The simplest way is to type M-x package-install RET flycheck RET followed by M-x global-flycheck-mode RET, though I personally employ use-package to declare my package usage up front. Here is what I have checked into my Emacs configuration git repository:

(use-package flycheck
  :ensure t
  :init
  (global-flycheck-mode 1))

After global-flycheck-mode is enabled, you’ll see errors underlined in prog-mode derived buffers. Try C-c ! l to list all errors in another buffer, or C-c ! n and C-c ! p to go to the next or previous error. See M-x describe-command RET flycheck-mode RET and the complementary Flycheck documentation website to learn more.

If you want to learn more about other keys in the C-c ! prefix from within Emacs, check out which-key for Emacs or read the flycheck.el sources (or try M-x describe-variable RET flycheck-mode-map RET from within Emacs).

But, ShellCheck likes to complain

Like all powerful tools, ShellCheck has its tradeoffs. It can save loads of time by catching bugs before a script lands in production. On the other hand, ShellCheck can slow down development because one has to fix every single trifle to quell it. A savvy shell scripter will encounter ShellCheck errors that flag intentional code. Consider this Bash function that decrypts a pass-managed login credential (source):

decrypt() {
    local file
    file="${PASSWORD_STORE_DIR}/${1}.gpg"
    if [[ ! -f $file ]]; then
        echo "No such entry \"${1}\"" >&2
        return 1
    fi
    # shellcheck disable=SC2086
    gpg $PASSWORD_STORE_GPG_OPTS --quiet --decrypt "$file"
}

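Without the # shellcheck disable=... ignore directive, which itself is a comment (a line that begins with # [POSIX]), ShellCheck unyieldingly complains about the unquoted use of $PASSWORD_STORE_GPG_OPTS. The variable is unquoted on purpose: it may hold several gpg options, and the shell must split them into separate arguments.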
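
To see why the unquoted expansion is deliberate, here is a minimal sketch in POSIX sh. The count_args helper and the option string are made up for illustration; they are not part of pass or of the function above.

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments a command would receive.
count_args() {
    echo "$#"
}

# Hypothetical option string, standing in for PASSWORD_STORE_GPG_OPTS.
opts='--batch --no-tty'

# Unquoted: the shell splits on whitespace, so two options are passed.
# shellcheck disable=SC2086
unquoted=$(count_args $opts)

# Quoted: the whole string is passed as one single (bogus) option.
quoted=$(count_args "$opts")

echo "unquoted=$unquoted quoted=$quoted"
```

Quoting the variable would silence SC2086 but hand gpg one nonsensical option, which is why the directive is the lesser evil here.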

Time and time again, I’ve forgotten the exact syntax of this ShellCheck directive. As a result of my forgetfulness, I’ve devised a solution to automatically generate these comment lines in Emacs! Simply move the point to the line with a ShellCheck error (try C-c ! n or C-c ! p to find the next or previous error), then type C-c ! k to generate the necessary comment to shut up ShellCheck.

I’ve embedded the Elisp here for easy copy-paste and to discuss the anatomy of the code:

(require 'cl-lib)                       ; For cl-loop
(require 'sh-script)                    ; For sh-mode-map

(defun winny--extract-shellcheck-error (err)
  "Return the SC identifier of ERR if ShellCheck reported it, else nil."
  (and-let* (((eq (flycheck-error-checker err) 'sh-shellcheck)))
    (flycheck-error-id err)))

(defun winny/shellcheck-disable-at-line ()
  "Insert \"# shellcheck disable=SC...\" line to silence shellcheck errors."
  (interactive)
  (save-match-data
    (save-excursion
      (and-let* ((errs
                  (cl-loop for err in (flycheck-overlay-errors-in (pos-bol) (pos-eol))
                           if (winny--extract-shellcheck-error err)
                           collect (winny--extract-shellcheck-error err))))
        (beginning-of-line)
        (insert (format "# shellcheck disable=%s"
                        (mapconcat #'identity errs ",")))
        (indent-according-to-mode)
        (newline-and-indent)))))

(add-hook 'sh-mode-hook
          (defun winny--bind-shellcheck-disable ()
            "Bind `winny/shellcheck-disable-at-line' in `sh-mode-map'."
            (define-key sh-mode-map (kbd "C-c ! k") #'winny/shellcheck-disable-at-line)))

As a sort of preamble, this code uses the two require forms to ensure the cl-loop macro and the sh-mode-map variable are available.

Then, this code defines a helper function, winny--extract-shellcheck-error, to determine if a flycheck-error object is a ShellCheck error and return the SC-prefixed identifier string. Next comes the interactive command winny/shellcheck-disable-at-line, which searches for unaddressed ShellCheck errors on the current line, then inserts a comment directive above the current line in order to silence those errors.

Finally, the add-hook function call adds a function named winny--bind-shellcheck-disable to execute whenever sh-mode is used. It has one job: bind winny/shellcheck-disable-at-line to a key.

Notice the usage of a defun instead of a lambda for add-hook’s second argument. I believe it prudent to name your anonymous functions such that the M-x describe-variable (C-h v) output is less busy and the code itself is accessible by name for further monkeying about with M-x eval-expression (M-:) and friends.

Conclusion

Equipped with this post, I hope the reader has ShellCheck set up in Emacs. If Emacs isn’t your preferred editor, worry not, there is surely a blog post or official documentation describing how to set up ShellCheck for your favorite editor!

Stay tuned for a post on how to set up ShellCheck with pre-commit to ensure code quality with team-based projects.

In this post I hope to convince the reader of the merits of ShellCheck. Stay tuned for more posts about using ShellCheck.

On shell scripting

Shell scripting is a passion of mine. Preferably Bash (here’s a guide). (POSIX sh a.k.a. Bourne shell works too, albeit with more effort thanks to diminished versatility when compared to Bash.) The shell scripting language family has many warts, as the languages were designed for both real-time interaction and automation programming. Additionally, inextricable backwards compatibility requirements are continually placed on the shell scripting language family: many Unix heads expect their shell to execute POSIX sh scripts without a hitch. Bourne shell, the sh we know and love, was developed in the late 70s, alongside many firsts in human-computer interfaces.

One of my favorite shell warts is the emphasis on reusing the string datatype for all sorts of operations. This wart is addressed, mildly, with Bash’s addition of arrays and more (check out declare). As a result, much of shell scripting is focused on efficient text processing and the leaky abstractions that (mostly) “typeless” programming provides. For example, in order to perform arithmetic in sh, you can use a command such as a=$(( a * 2 )) to double the value stored in variable a.
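
Here is a small sketch of that string-centric behavior, using only POSIX sh features. The variable names are arbitrary.

```shell
#!/bin/sh
a=21                 # stored as the string "21"
a=$(( a * 2 ))       # arithmetic expansion parses the string as an integer
echo "$a"            # prints 42

b="${a}0"            # plain concatenation: still just a string
echo "$b"            # prints 420
echo "$(( b + 1 ))"  # prints 421
```

The same value is text in one expression and a number in the next, with nothing but context deciding which.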

Equipped with context, it might make a little more sense as we touch on the sheer volume of pitfalls that beset robust sh or Bash usage. Consider the first one most script writers encounter: the infamous word splitting. Refer to the following shell snippet:

path='/my/path with spaces'
find $path -type f -name *.[ch]

Find will chime back with an error like:

find: ‘/my/path’: No such file or directory
find: ‘with’: No such file or directory
find: ‘spaces’: No such file or directory

Enter ShellCheck

ShellCheck is a linter for sh family languages. (A linter is a program that scans code for common mistakes.) I highly recommend it. Let’s see what value it provides for this example:

ShellCheck catches the previous error as Double quote to prevent globbing and word splitting. [SC2086]. A brief search of SC2086 on the internet yields ShellCheck: SC2086 – Double quote to prevent globbing and word splitting. The solution here is to wrap $path in double quotes, so the script is improved to the following:

path='/my/path with spaces'
find "$path" -type f -name *.[ch]

But we’re not finished yet! Let’s say the current directory contains a file named hello.c; then -name *.[ch] expands to -name hello.c. Whoops! ShellCheck knows about this issue too: Quote the parameter to -name so the shell won't interpret it. [SC2061]. Read more here: ShellCheck: SC2061 – Quote the parameter to `-name`. The fix is the same: quote the parameter.

path='/my/path with spaces'
find "$path" -type f -name '*.[ch]'
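
To watch the shell rewrite the unquoted pattern before find ever sees it, here is a sketch that runs in a throwaway directory. The show_args helper is hypothetical, and the directory comes from mktemp rather than from the examples above.

```shell
#!/bin/sh
# Hypothetical helper: print every argument wrapped in angle brackets.
show_args() {
    printf '<%s>' "$@"
    echo
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch hello.c

# Unquoted: the shell expands the glob against the current directory.
unquoted=$(show_args -name *.[ch])

# Quoted: the pattern reaches the command untouched.
quoted=$(show_args -name '*.[ch]')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# Clean up the throwaway directory.
cd / && rm -rf "$workdir"
```

With only hello.c present, the unquoted call receives hello.c in place of the pattern, while the quoted call receives *.[ch] verbatim.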

Additional erroneous shell script examples are curated in ShellCheck’s Gallery of bad code.

Ways to use ShellCheck

In short, apt install it! There are many other ways to run ShellCheck.

  • Check the official Installing section within the ShellCheck README.
  • Install shellcheck via your package manager (check Repology aggregated packages).
  • pip3 install shellcheck-py to install shellcheck via a Python Package (GitHub repo)
  • Run shellcheck in an unofficial Docker container (Debian based) or in the official minimal container.
  • Run in CI/CD, maybe using pre-commit. (I will be detailing this workflow in another blog post. The Installing section within the ShellCheck README details one solution which I’ll compare against other more user friendly solutions.)
  • Copy/paste into shellcheck.net’s form to check in browser. Note: the checker runs on a server so the text you pasted is exfiltrated from your browser tab.

And last but not least, be sure to run ShellCheck in your text editor or IDE. Good ShellCheck integration checks as you edit shell scripts, so you can catch errors before even saving the file.

Suppress ShellCheck

ShellCheck offers a few different mechanisms to disable specific errors: via environment variable, for the entire file, and for the next line of code. I usually reach for ignoring errors on a per-line basis:

  1. Insert a line before the offending line of the format # shellcheck disable=SC2116,SC2086
  2. ShellCheck will ignore SC2116 and SC2086 next time.

# shellcheck disable=SC2116,SC2086
hash=$(echo ${hash})    # trim spaces

(Code sample lifted from the ShellCheck wiki’s Ignore page.)
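
For comparison, here is a sketch that puts the three scopes side by side. The choice of SC2034 and the sample values are assumptions for illustration; the per-line part reuses the wiki sample above.

```shell
#!/bin/sh
# File-wide: a directive after the shebang and before the first command
# applies to the entire script.
# shellcheck disable=SC2034

# Environment-wide (set outside the script, shown here only as a comment):
#   export SHELLCHECK_OPTS='--exclude=SC2034'

unused='normally reported as SC2034 (variable appears unused)'
hash='  abc123  '

# Next command only: the directive sits directly above the command.
# shellcheck disable=SC2116,SC2086
hash=$(echo ${hash})    # trim spaces
echo "[$hash]"          # prints [abc123]
```

The narrower the scope, the less likely a directive is to hide a genuine mistake elsewhere in the file, which is one reason to prefer the per-line form.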

That’s it

There is little reason to not use ShellCheck. Improve the quality of your work today. Try out ShellCheck.

Stay tuned for additional posts related to ShellCheck usage.

P.S. If you really can’t use ShellCheck

If you can’t use ShellCheck, you can still validate that scripts parse correctly prior to execution: check out sh -n and bash -n. (See set -n in the POSIX sh docs and in the Bash info pages.) The -n flag reads the script and reports parse errors, but does not execute the parsed commands.
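
A minimal sketch of that check, assuming a deliberately broken script written to a temporary file (the script content is invented for the demonstration):

```shell
#!/bin/sh
# Write a deliberately broken script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
if true; then
    echo "missing the closing fi"
EOF

# -n parses without executing; a non-zero status signals a syntax error.
if sh -n "$script" 2>/dev/null; then
    result='parses'
else
    result='syntax error'
fi
echo "$result"

rm -f "$script"
```

Keep in mind this only catches syntax errors. Unquoted variables and the other pitfalls described earlier parse just fine.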

Update the NAS to 24.05

Updated Wednesday, Jul 24, 2024

Turns out my NAS is vulnerable to the OpenSSH vulnerability which, given enough time, lets an unauthenticated attacker execute code as root on your host. Dubbed regreSSHion (CVE-2024-6387), it affects several ranges of OpenSSH versions. If one has OpenSSH 9.8p1 or later, one is totally fine.

Unfortunately, the NAS is still on NixOS 23.11. The NAS remains on NixOS, but all my other devices have been migrated off to Debian Testing. In this brief post, I wanted to describe the toil involved with upgrading this NixOS powered NAS. Some of it is my own PEBKAC fault. Some of it is through well meaning defaults of NixOS. Some of it is just not enough attention to detail. Or as I’ve recently come to discern, too complicated or temperamental for my liking. Maybe this will be entertaining or maybe it will be cringe. My hope is this post encompasses the process of upgrading a NixOS host, warts and all. Sorry for my snide remarks; I’m over the promises of unfettered complexity.

Step 1: nix flake update

I haven’t updated the flakes containing both stable and unstable nixpkgs references in some time. So that’s my first step, before considering the upgrade to 24.05 in earnest. To do that, I need to git pull to ensure my repository is up to date. Whoops, a bit of PEBKAC broke git pull on my setup.

ssh agent mess

Been using keychain for some time. Here’s the relevant code from my dotfiles (link).

# shellcheck disable=SC1090
[[ -r ${HOME}/.keychain/${HOSTNAME}-sh ]] && . "${HOME}/.keychain/${HOSTNAME}-sh"
# shellcheck disable=SC1090
[[ -r ${HOME}/.keychain/${HOSTNAME}-sh-gpg ]] && . "${HOME}/.keychain/${HOSTNAME}-sh-gpg"

# shellcheck disable=SC2086
eval "$(keychain --eval -q --inherit any --agents ssh,gpg ${keys//\~/$HOME})"

Unfortunately, it seems to break ssh agent forwarding, so I can’t git pull on my NAS without disabling this module. That’s the first friction of the day: getting the dang ole’ thing to git pull correctly. After removing this file from my .bashrc.d/, ssh agent forwarding worked again. Fixing it properly is a problem for another day.

Okay! Next contention.

nixos-rebuild boot fails with no space left on device

I think this is a super frustrating issue that I’ve run into on most hosts that I’ve deployed NixOS on. Because nix-build’s error reporting won’t tell you the directory it’s building in, it’s hard to know from the nix log ... incantation or the terminal output what the root cause of the error is. It comes down to the particular invocations in the derivation’s (package’s) build phase. Some will print out an absolute path, some won’t. This loosens the feedback loop… Now I have to monitor indirect metrics, such as disk usage, or even strace something to see where it’s writing to.

Today, this cost me an hour of iterating with nixos-rebuild incantations in order to coax grafana to compile. For some reason, I assumed my ZFS dataset named rpool (which contains /root, /home, and /var) might have been the culprit, so I invested an exorbitant amount of time reconfiguring Loki to clean up old log data and moving a handful of Syncthing folders into another ZFS dataset. As it turned out, that wasn’t the problem at all. A tighter feedback loop could have prevented time wastage here.

One fix for this is to override the TMPDIR environment variable on the failing build. Or set it in your system.environment attribute set for permanent configuration. Next, you might need to reconfigure the size of /tmp if choosing TMPDIR=/tmp. It’s a tmpfs that resides in memory (and swap), and its maximum size is configured as a percentage of your physical RAM. One can increase this size or migrate /tmp to a filesystem partition on disk. On low-memory hosts it’s best to eschew tmpfs for /tmp and instead set up a filesystem partition. Filesystem caching is pretty good nowadays, so it’s not as serious a slowdown as one might have been tricked into believing.

Back to it: I tried sudo env TMPDIR=/tmp nixos-rebuild boot and lo and behold, the 4GiB tmpfs was insufficient to build Grafana. Shoot. Retried again for maybe the fifth time with sudo env TMPDIR=/var/tmp nixos-rebuild boot. Another thirty minutes later and the build finished successfully. /var/tmp typically lives either on your /var or / filesystem with plenty of space.
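
Before kicking off another half-hour build, it may be worth checking how much room the candidate directories actually have. Here is a rough sketch using df. The free_kib helper is hypothetical, and the 8 GiB threshold is a guess rather than a measured requirement for building Grafana.

```shell
#!/bin/sh
# Hypothetical helper: print the free space, in KiB, of the filesystem
# that holds the given directory. -P selects the portable output format.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumed threshold: 8 GiB expressed in KiB. Adjust to taste.
needed_kib=$(( 8 * 1024 * 1024 ))

for dir in /tmp /var/tmp; do
    [ -d "$dir" ] || continue
    avail=$(free_kib "$dir")
    if [ "$avail" -ge "$needed_kib" ]; then
        echo "$dir: $avail KiB free, probably enough"
    else
        echo "$dir: $avail KiB free, probably too small"
    fi
done
```

This only reports what is free right now; it cannot predict how much scratch space a given derivation will want.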

Now to reboot and ssh in.

ssh then zfs load-key funkiness

I have it in my notes how to boot my NixOS powered NAS because it’s not exactly natural nor intuitive. First I reboot, then I wait to ssh in, then issue a zfs load-key command to unlock the rpool which contains NixOS, then kill the other zfs load-key command kicked off by the boot sequence, thereby resuming the boot sequence.

winston@silo ~ $ sudo reboot
[sudo] password for winston:
(7s) winston@silo ~ $ # oh yeah, reboot doesn't work :(
winston@silo ~ $ sudo systemctl reboot

Broadcast message from root@silo on pts/8 (Mon 2024-07-08 17:29:16 CDT):

The system will reboot now!

winston@silo ~ $ Connection to silo closed by remote host.
Connection to silo closed.
(5m50s) 255 winston@quasit ~ $ sleep 120;ssh -p2222 root@silo
~ # zfs load-key rpool
Enter passphrase for 'rpool':
~ # pkill zfs  # Next type Return followed by ~. to kill the ssh session!
~ # Connection to silo closed.
(56s) 255 winston@quasit ~ $ sleep 60; ssh -A silo

                           +&-
     Welcome to           _.-^-._    .--.
        silo.          .-'   _   '-. |__|
                      /     |_|     \|  |
                     /               \  |
                    /|     _____     |\ |
                     |    |==|==|    |  |
 |---|---|---|---|---|    |--|--|    |  |
 |---|---|---|---|---|    |==|==|    |  |
^jgs^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Last login: Mon Jul  8 17:23:33 2024 from 10.20.1.44
winston@silo ~ $ systemctl status

After pkill zfs is run, the ssh connection will become unresponsive. Type Return followed by ~. to tell OpenSSH to exit.
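
The sleep 120; ssh guesswork in the transcript could be swapped for polling. Here is a minimal sketch of a retry helper in POSIX sh. The name retry and its argument order are invented for illustration, and the ssh invocation in the trailing comment is an untested suggestion built from the command shown in the transcript.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Note: POSIX sh has no local variables, so these names are global.
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_n=1
    while ! "$@"; do
        if [ "$retry_n" -ge "$retry_max" ]; then
            return 1            # give up after ATTEMPTS failures
        fi
        retry_n=$(( retry_n + 1 ))
        sleep "$retry_delay"
    done
}

# Example (not run here): try the port 2222 login every 10 seconds,
# giving up after 30 attempts.
# retry 30 10 ssh -p2222 -o ConnectTimeout=5 root@silo true
```

Once the probe succeeds, the interactive ssh session and the zfs load-key step proceed as before.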

Systemd status is “still starting”. So I’ll come back in another minute or two to see if it’s happy.

Okay, now it says systemd state is “running” and not “degraded”. We’re in business!

Commit the changes

Now to commit the flake changes:

winston@silo ~/p/nixos-configs $ git commit -m 'flake: refresh'
.git/hooks/pre-commit: line 13: /nix/store/4zqn5lajki1z3a2avia658l1wacpi8v0-pre-commit-3.3.3/bin/pre-commit: No such file or directory
1 winston@silo ~/p/nixos-configs $ pre
precat      preunzip    prezip      prezip-bin
1 winston@silo ~/p/nixos-configs $ nix run nixpkgs#pre-commit install
pre-commit installed at .git/hooks/pre-commit
winston@silo ~/p/nixos-configs $ git commit -m 'flake: refresh'
Trim Trailing Whitespace.................................................Passed
Fix End of Files.........................................................Passed
Check Yaml...........................................(no files to check)Skipped
Check for added large files..............................................Passed
[master f0be2ea] flake: refresh
 2 files changed, 9 insertions(+), 8 deletions(-)

Yes, I know pre-commit should exist in my environment.systemPackages. I forgot why I didn’t add it. Maybe pre-commit is supposed to be in this folder’s shell.nix? Complicated. You can see why I’m shrinking away from nix-powered workflows! I just want the thing to do the thing.

Now, on to the upgrade to fix the OpenSSH vulnerability. A couple hours later…

Re-read the release notes

NixOS has this stance about upgrades not dissimilar to classical stable release distros. Things will break. Things will break in fantastic ways. Just don’t upgrade, and things won’t break. It’s a white lie of NixOS: that somehow you enjoy stability between upgrades. You aren’t guaranteed stability between upgrades. You must possess the same due diligence and best judgment that is expected on any other distro. Be sure to read the release notes; there is no time-efficient alternative. A poorly aged adage of NixOS is “roll back if upgrade breaks stuff”. No need to ensure most upgrades go well on first try if that’s the community spirit of releasing upgrades! And don’t get me started on what happens if you need security updates. This blog post happens: you must upgrade everything or suture nixpkgs in order to recoup security updates.

Back to it. As promised, I read the release notes and my biggest takeaway is that there is an upgrade to Loki 3.0.0. I’m okay with Loki breaking, so I won’t bother with it until I experience breakage. It appears that my Nextcloud is still supported, so I won’t struggle through a Nextcloud upgrade (yet).

Edit the flake inputs

All I did was modify inputs.nixpkgs.url to point to github:NixOS/nixpkgs/release-24.05. Then I ran nix flake update.

winston@silo ~/p/nixos-configs $ nix flake update
warning: Git tree '/home/winston/p/nixos-configs' is dirty
warning: updating lock file '/home/winston/p/nixos-configs/flake.lock':
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/7144d6241f02d171d25fba3edeaf15e0f2592105' (2024-07-02)
  → 'github:NixOS/nixpkgs/de429c2a20520e0f81a1fd9d2677686a68cae739' (2024-07-08)
• Updated input 'unstable':
    'github:NixOS/nixpkgs/9f4128e00b0ae8ec65918efeba59db998750ead6' (2024-07-03)
  → 'github:NixOS/nixpkgs/655a58a72a6601292512670343087c2d75d859c1' (2024-07-08)
warning: Git tree '/home/winston/p/nixos-configs' is dirty

Next it’s time to try to build it, remembering to point TMPDIR to the spacious /var/tmp.

Breaking pinentry change

winston@silo ~/p/nixos-configs $ sudo env TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
[sudo] password for winston:
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error:
       … while calling the 'head' builtin

         at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/attrsets.nix:1575:11:

         1574|         || pred here (elemAt values 1) (head values) then
         1575|           head values
             |           ^
         1576|         else

       … while evaluating the attribute 'value'

         at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/modules.nix:809:9:

          808|     in warnDeprecation opt //
          809|       { value = builtins.addErrorContext "while evaluating the option `${showOption loc}':" value;
             |         ^
          810|         inherit (res.defsFinal') highestPrio;

       (stack trace truncated; use '--show-trace' to show the full trace)

       error:
       Failed assertions:
       - The option definition `programs.gnupg.agent.pinentryFlavor' in `/nix/store/g4py9g5mqznfzgn7nrz1glg811k9xpll-source/common/base.nix' no longer has any effect; please remove it.
       Use programs.gnupg.agent.pinentryPackage instead
       (28s) 1 winston@silo ~/p/nixos-configs $

Balls. Legacy code shims be damned, this is NixOS! I think I need to edit some configuration, because the powers that be decided it must be done. But where? Cool errors bro. No problem, can search my source tree and with any hope, it’s within my own Nix configuration, and not in another input (e.g. the nixpkgs flake):

winston@silo ~/p/nixos-configs $ rg pinentryFlavor
common/base.nix
54:    pinentryFlavor = "gtk2";

Oh maybe I could have taken the /nix/store/not-the-path-that-im-working-out-of-prefix-source/common/base.nix path suffix and visited the relative path common/base.nix. Icky to manually fix paths in error messages, but that should suffice next time!
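
That manual path surgery can be handed to the shell. Here is a small sketch using POSIX parameter expansion. The strip_store_prefix name is made up, and it assumes the store path always contains a "-source/" component, as it does in the error above.

```shell
#!/bin/sh
# Hypothetical helper: drop the "/nix/store/<hash>-source/" prefix so the
# remainder can be opened relative to the flake's working tree.
# "#" removes the shortest prefix that matches the pattern.
strip_store_prefix() {
    printf '%s\n' "${1#/nix/store/*-source/}"
}

# Path copied from the failed assertion above; prints common/base.nix
strip_store_prefix /nix/store/g4py9g5mqznfzgn7nrz1glg811k9xpll-source/common/base.nix
```

A path that does not match the pattern is printed unchanged, so the helper is harmless on ordinary paths.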

Okay, deleted the offending line. I don’t know why I’m using the gtk2 flavor anymore, so maybe it’s not important… let’s pray that it wasn’t important!

winston@silo ~/p/nixos-configs $ sudo env TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
[sudo] password for winston:
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error:
       … while calling the 'head' builtin

         at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/attrsets.nix:1575:11:

         1574|         || pred here (elemAt values 1) (head values) then
         1575|           head values
             |           ^
         1576|         else

       … while evaluating the attribute 'value'

         at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/modules.nix:809:9:

          808|     in warnDeprecation opt //
          809|       { value = builtins.addErrorContext "while evaluating the option `${showOption loc}':" value;
             |         ^
          810|         inherit (res.defsFinal') highestPrio;

       (stack trace truncated; use '--show-trace' to show the full trace)

       error: Package ‘nextcloud-27.1.11’ in /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/pkgs/servers/nextcloud/default.nix:35 is marked as insecure, refusing to evaluate.


       Known issues:
        - Nextcloud version 27.1.11 is EOL

       You can install it anyway by allowing this package, using the
       following methods:

       a) To temporarily allow all insecure packages, you can use an environment
          variable for a single invocation of the nix tools:

            $ export NIXPKGS_ALLOW_INSECURE=1

          Note: When using `nix shell`, `nix build`, `nix develop`, etc with a flake,
                then pass `--impure` in order to allow use of environment variables.

       b) for `nixos-rebuild` you can add ‘nextcloud-27.1.11’ to
          `nixpkgs.config.permittedInsecurePackages` in the configuration.nix,
          like so:

            {
              nixpkgs.config.permittedInsecurePackages = [
                "nextcloud-27.1.11"
              ];
            }

       c) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
          ‘nextcloud-27.1.11’ to `permittedInsecurePackages` in
          ~/.config/nixpkgs/config.nix, like so:

            {
              permittedInsecurePackages = [
                "nextcloud-27.1.11"
              ];
            }
(28s) 1 winston@silo ~/p/nixos-configs $

Oh crud, apparently I’m running an insecure Nextcloud? Who knew! Looks like one can set an environment variable (NIXPKGS_ALLOW_INSECURE=1) to tell nixos-rebuild to calm down if only a smidgen.

Loki changes

The release notes did mention Loki configuration changed and advised that users read the upstream Loki release notes too. It looks like I can’t push the Loki upgrade back and must handle it now:

winston@silo ~/p/nixos-configs $ sudo env NIXPKGS_ALLOW_INSECURE=1 TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error: builder for '/nix/store/v6g5b5c383a4h6i8bl210h91cp54qpz6-validate-loki-conf.drv' failed with exit code 1;
       last 5 log lines:
       > failed parsing config: /nix/store/3i7y6y5nwjqz8mhg1kakhmyd1cv7cy3i-loki-config.json: yaml: unmarshal errors:
       >   line 4: field max_look_back_period not found in type config.ChunkStoreConfig
       >   line 13: field shared_store not found in type compactor.Config
       >   line 31: field max_transfer_retries not found in type ingester.Config
       >   line 59: field shared_store not found in type boltdb.IndexCfg. Use `-config.expand-env=true` flag if you want to expand environment variables in your config file
       For full logs, run 'nix log /nix/store/v6g5b5c383a4h6i8bl210h91cp54qpz6-validate-loki-conf.drv'.
error: 1 dependencies of derivation '/nix/store/cxpdszil18wj6hb3az3jwqbyfy43wmj6-unit-loki.service.drv' failed to build
error: 1 dependencies of derivation '/nix/store/gyh8mr8sqk1gk20qldp062zym2mdy06c-system-units.drv' failed to build
error: 1 dependencies of derivation '/nix/store/qr88n0970ji783bkdwhk8ig7wazksa9v-etc.drv' failed to build
error: 1 dependencies of derivation '/nix/store/p36ja5jhi7wr2ayzxgf7ykkxhss599n2-nixos-system-silo-24.05.20240708.de429c2.drv' failed to build
(3m46s) 1 winston@silo ~/p/nixos-configs $

That list of Loki-specific configuration errors is extremely helpful. Well done. The nixos-rebuild process printed the problematic configuration fields out to the terminal. After trying to make heads or tails of the upgrade guide, I instead merely removed every offending line. Let’s see what works (shrug).

error: builder for '/nix/store/rkmm8wa8vz576bhwpz0wwmv8ck31653j-validate-loki-conf.drv' failed with exit code 1;
       last 1 log lines:
       > level=error ts=2024-07-08T23:14:46.302638345Z caller=main.go:66 msg="validating config" err="MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY\nCONFIG ERROR: invalid compactor config: compactor.delete-request-store should be configured when retention is enabled\nCONFIG ERROR: schema v13 is required to store Structured Metadata and use native OTLP ingestion, your schema version is v11. Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update to schema v13 or newer before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure\nCONFIG ERROR: `tsdb` index type is required to store Structured Metadata and use native OTLP ingestion, your index type is `boltdb-shipper` (defined in the `store` parameter of the schema_config). Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update the schema to use index type `tsdb` before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure"
       For full logs, run 'nix log /nix/store/rkmm8wa8vz576bhwpz0wwmv8ck31653j-validate-loki-conf.drv'.

Here’s the error reformatted:

MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY
CONFIG ERROR: invalid compactor config: compactor.delete-request-store should
              be configured when retention is enabled
CONFIG ERROR: schema v13 is required to store Structured Metadata and use
              native OTLP ingestion, your schema version is v11. Set
              `allow_structured_metadata: false` in the `limits_config`
              section or set the command line argument
              `-validation.allow-structured-metadata=false` and restart Loki.
              Then proceed to update to schema v13 or newer before re-enabling
              this config, search for 'Storage Schema' in the docs for the
              schema update procedure
CONFIG ERROR: `tsdb` index type is required to store Structured Metadata and
              use native OTLP ingestion, your index type is `boltdb-shipper`
              (defined in the `store` parameter of the schema_config). Set
              `allow_structured_metadata: false` in the `limits_config` section
              or set the command line argument
              `-validation.allow-structured-metadata=false` and restart Loki.
              Then proceed to update the schema to use index type `tsdb`
              before re-enabling this config, search for 'Storage Schema'
              in the docs for the schema update procedure
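
A reformatting like this can be approximated in the shell. The sketch below assumes the err="..." payload has been copied into a variable. The sample text is abbreviated from the log line, and unescape_newlines is a hypothetical helper name. It does not reproduce the hanging indent shown above.

```shell
#!/bin/sh
# Hypothetical helper: turn literal "\n" sequences into real newlines,
# then wrap long lines at 80 columns on word boundaries.
unescape_newlines() {
    awk '{ gsub(/\\n/, "\n"); print }' | fold -s -w 80
}

# Abbreviated sample of the payload; single quotes keep "\n" literal.
err='MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY\nCONFIG ERROR: invalid compactor config'

printf '%s\n' "$err" | unescape_newlines
```

Piping the output of nix log through the same helper should work too, though that is untested here.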

Okay, I’ve added allow_structured_metadata = false; now what will break next?

CONFIG ERROR: invalid compactor config: compactor.delete-request-store should
              be configured when retention is enabled

OK, I’ve added delete_request_store = "filesystem";, as per this GitHub issue.

It built!

Hooray! It built!

winston@silo ~/p/nixos-configs $ sudo env NIXPKGS_ALLOW_INSECURE=1 TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
trace: warning: The option `services.nextcloud.extraOptions' defined in `/nix/store/d99kz3ifvz1hqg8wni0bi2j08n3rdisr-source/hosts/silo' has been renamed to `services.nextcloud.settings'.
trace: warning: The option `services.nextcloud.config.defaultPhoneRegion' defined in `/nix/store/d99kz3ifvz1hqg8wni0bi2j08n3rdisr-source/hosts/silo' has been renamed to `services.nextcloud.settings.default_phone_region'.
trace: warning: A legacy Nextcloud install (from before NixOS 24.05) may be installed.

After nextcloud27 is installed successfully, you can safely upgrade
to 28. The latest version available is Nextcloud29.

Please note that Nextcloud doesn't support upgrades across multiple major versions
(i.e. an upgrade from 16 is possible to 17, but not 16 to 18).

The package can be upgraded by explicitly declaring the service-option
`services.nextcloud.package`.

updating GRUB 2 menu...
installing the GRUB 2 boot loader on /dev/disk/by-id/ata-CT120BX500SSD1_1943E3D1AC4B...
Installing for i386-pc platform.
Installation finished. No error reported.
updating GRUB 2 menu...
installing the GRUB 2 boot loader on /dev/disk/by-id/ata-CT120BX500SSD1_1943E3D1AC45...
Installing for i386-pc platform.
Installation finished. No error reported.
(51s) winston@silo ~/p/nixos-configs $

Now to snapshot everything real quick…

winston@silo ~/p/nixos-configs $ sudo zfs snapshot -r naspool@2024-08-07_pre-reboot
winston@silo ~/p/nixos-configs $ sudo zfs snapshot -r rpool@2024-08-07_pre-reboot
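
If you take these pre-reboot snapshots often, the naming scheme above can be generated rather than typed. This is only a sketch: snapshot_commands is a hypothetical helper name, and it prints the zfs commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS): print one recursive snapshot command
# per pool, named <pool>@<date>_<label>. Nothing is executed here.
# The date is passed in as an argument so the output is reproducible.
snapshot_commands() {
    date_stamp=$1
    label=$2
    shift 2
    for pool in "$@"; do
        printf 'zfs snapshot -r %s@%s_%s\n' "$pool" "$date_stamp" "$label"
    done
}
```

For example, snapshot_commands "$(date +%F)" pre-reboot naspool rpool prints the two commands used above with today’s date; run them with sudo once they look right.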

Now let’s reboot and hope for the best.

degraded - libvirt-guests.service failure

systemctl status says the state is “degraded”. Let’s see what broke:

winston@silo ~ $ systemctl list-units --failed
  UNIT                   LOAD   ACTIVE SUB    DESCRIPTION
● libvirt-guests.service loaded failed failed libvirt guests suspend/resume service

Legend: LOAD   → Reflects whether the unit definition was properly loaded.
        ACTIVE → The high-level unit activation state, i.e. generalization of SUB.
        SUB    → The low-level unit activation state, values depend on unit type.

1 loaded units listed.
(34s) 3 winston@silo ~ $
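
To script this check, the failed unit names can be pulled out of the listing. A minimal sketch, assuming systemctl is invoked with --plain and --no-legend so each line starts with the unit name; failed_units is a hypothetical helper name, not something systemd provides.

```shell
#!/bin/sh
# Hypothetical helper: read the output of
#   systemctl list-units --failed --plain --no-legend
# on stdin and print only the first column (the unit name).
# Blank lines are skipped.
failed_units() {
    awk 'NF > 0 { print $1 }'
}

# Intended use on a live system (not run here):
#   systemctl list-units --failed --plain --no-legend | failed_units
```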

OK let’s check the specific service:

winston@silo ~ $ systemctl status libvirt-guests.service
× libvirt-guests.service - libvirt guests suspend/resume service
     Loaded: loaded (/etc/systemd/system/libvirt-guests.service; enabled; preset: enabled)
    Drop-In: /nix/store/70x3p9hhrm202n3lfl1p79bv0h2c59zi-system-units/libvirt-guests.service.d
             └─overrides.conf
     Active: failed (Result: exit-code) since Mon 2024-07-08 18:38:25 CDT; 3min 11s ago
       Docs: man:libvirt-guests(8)
             https://libvirt.org/
    Process: 2340 ExecStart=/nix/store/6a5lmp5p08n9qsfd0l9aqc7jhigm82j9-libvirt-10.0.0/libexec/libvirt-guests.sh start (code=exited, status=1/FAILURE)
   Main PID: 2340 (code=exited, status=1/FAILURE)
         IP: 0B in, 0B out
        CPU: 184ms

Jul 08 18:38:17 silo systemd[1]: Starting libvirt guests suspend/resume service...
Jul 08 18:38:24 silo libvirt-guests.sh[3307]: Resuming guests on default URI...
Jul 08 18:38:25 silo libvirt-guests.sh[3313]: Resuming guest seedbox:
Jul 08 18:38:25 silo libvirt-guests.sh[3318]: error: Failed to start domain 'seedbox'
Jul 08 18:38:25 silo libvirt-guests.sh[3318]: error: operation failed: guest CPU doesn't match specification: extra features: vmx-ins-outs,vmx-true-ctls,vmx-store-lma,vmx-activity-hlt,vmx-vmwrite-vmexit-fields,vmx-apicv-xapic,vmx-ept,vmx-desc-exit,vmx-rdtscp-exit,vmx-apicv-x2apic,vmx-vpid,vmx-wbinvd-exit,vmx-unrestricted-guest,vmx-rdrand-exit,vmx-invpcid-exit,vmx-vmfunc,vmx-shadow-vmcs,vmx-invvpid,vmx-invvpid-single-addr,vmx-invvpid-all-context,vmx-ept-execonly,vmx-page-walk-4,vmx-ept-2mb,vmx-ept-1gb,vmx-invept,vmx-eptad,vmx-invept-single-context,vmx-invept-all-context,vmx-intr-exit,vmx-nmi-exit,vmx-vnmi,vmx-preemption-timer,vmx-vintr-pending,vmx-tsc-offset,vmx-hlt-exit,vmx-invlpg-exit,vmx-mwait-exit,vmx-rdpmc-exit,vmx-rdtsc-exit,vmx-cr3-load-noexit,vmx-cr3-store-noexit,vmx-cr8-load-exit,vmx-cr8-store-exit,vmx-flexpriority,vmx-vnmi-pending,vmx-movdr-exit,vmx-io-exit,vmx-io-bitmap,vmx-mtf,vmx-msr-bitmap,vmx-monitor-exit,vmx-pause-exit,vmx-secondary-ctls,vmx-exit-nosave-debugctl,vmx-exit-load-perf-global-ctrl,vmx-exit-ack-intr,vmx-exit-save-pat,vmx-exit-load-pat,vmx-exit-save-efer,vmx-exit-load-efer,vmx-exit-save-preemption-timer,vmx-entry-noload-debugctl,vmx-entry-ia32e-mode,vmx-entry-load-perf-global-ctrl,vmx-entry-load-pat,vmx-entry-load-efer,vmx-eptp-switching, missing features: vmx-apicv-register,vmx-apicv-vid,vmx-posted-intr
Jul 08 18:38:25 silo systemd[1]: libvirt-guests.service: Main process exited, code=exited, status=1/FAILURE
Jul 08 18:38:25 silo systemd[1]: libvirt-guests.service: Failed with result 'exit-code'.
Jul 08 18:38:25 silo systemd[1]: Failed to start libvirt guests suspend/resume service.

Oh snap, looks like my virtual machine (VM) for BitTorrent is broken. More specifically, libvirtd failed to resume the VM from saved state (it’s like folding your laptop shut, but for virtual machines). By the way, everything on archive.org is available via BitTorrent. Most Linux distros provide .torrent files too! Hosting torrents for these projects is one low-effort way to help out the community.

Here’s the error reformatted:

error: Failed to start domain 'seedbox'
error: operation failed: guest CPU doesn't match specification: extra features:
       vmx-ins-outs,vmx-true-ctls,vmx-store-lma,vmx-activity-hlt,
       vmx-vmwrite-vmexit-fields,vmx-apicv-xapic,vmx-ept,vmx-desc-exit,
       vmx-rdtscp-exit,vmx-apicv-x2apic,vmx-vpid,vmx-wbinvd-exit,
       vmx-unrestricted-guest,vmx-rdrand-exit,vmx-invpcid-exit,
       vmx-vmfunc,vmx-shadow-vmcs,vmx-invvpid,vmx-invvpid-single-addr,
       vmx-invvpid-all-context,vmx-ept-execonly,vmx-page-walk-4,vmx-ept-2mb,
       vmx-ept-1gb,vmx-invept,vmx-eptad,vmx-invept-single-context,
       vmx-invept-all-context,vmx-intr-exit,vmx-nmi-exit,vmx-vnmi,
       vmx-preemption-timer,vmx-vintr-pending,vmx-tsc-offset,vmx-hlt-exit,
       vmx-invlpg-exit,vmx-mwait-exit,vmx-rdpmc-exit,vmx-rdtsc-exit,
       vmx-cr3-load-noexit,vmx-cr3-store-noexit,vmx-cr8-load-exit,
       vmx-cr8-store-exit,vmx-flexpriority,vmx-vnmi-pending,vmx-movdr-exit,
       vmx-io-exit,vmx-io-bitmap,vmx-mtf,vmx-msr-bitmap,vmx-monitor-exit,
       vmx-pause-exit,vmx-secondary-ctls,vmx-exit-nosave-debugctl,
       vmx-exit-load-perf-global-ctrl,vmx-exit-ack-intr,vmx-exit-save-pat,
       vmx-exit-load-pat,vmx-exit-save-efer,vmx-exit-load-efer,
       vmx-exit-save-preemption-timer,vmx-entry-noload-debugctl,
       vmx-entry-ia32e-mode,vmx-entry-load-perf-global-ctrl,
       vmx-entry-load-pat,vmx-entry-load-efer,vmx-eptp-switching,
       missing features: vmx-apicv-register,vmx-apicv-vid,vmx-posted-intr
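
The short list at the end is easy to lose after that wall of extra features. Here is a rough sketch of a filter that prints only the missing features, one per line. It assumes the error keeps the "missing features:" wording shown above; missing_cpu_features is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read libvirt's "guest CPU doesn't match specification"
# error on stdin and print each missing feature on its own line.
# Lines without the "missing features:" marker produce no output.
missing_cpu_features() {
    sed -n 's/.*missing features: *//p' | tr ',' '\n' | sed 's/^ *//; /^$/d'
}

# Intended use on a live system (not run here):
#   journalctl -u libvirt-guests.service -b | missing_cpu_features
```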

Whoops, after a quick google, I encountered this regression. If I run virsh start seedbox, I get a similar error related to vmx. The solution is to either upgrade to 10.2.0 or disable the vmx feature in the guest. vmx is used for nested virtualization, which my guest does not utilize. Sidenote: vmx support is detectable via CPUID, which was the topic of my last article.

As I don’t use nested virtualization, I opted to disable it. I invoked virsh edit seedbox to spawn a text editor with the libvirtd domain’s (guest’s) XML therein, then edited <feature policy='require' name='vmx'/> to <feature policy='disable' name='vmx'/>. That <feature> element can be found nested within the <cpu> element.
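
The same edit can be sketched as a filter, in case several guests need it. This assumes the XML looks like what virsh prints for my guest: single-quoted attributes, with policy and name on the same line. disable_vmx is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read libvirt domain XML on stdin and change the vmx
# <feature> element from policy='require' to policy='disable'.
# Only the element whose name is exactly 'vmx' is touched.
disable_vmx() {
    sed "/<feature [^>]*name='vmx'/ s/policy='require'/policy='disable'/"
}

# Intended use on a live system (not run here):
#   virsh dumpxml --inactive seedbox | disable_vmx > seedbox.xml
#   virsh define seedbox.xml
```

Diff the output against the original XML before running virsh define, since a sed pattern is a blunt tool for XML.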

I couldn’t help but notice that had I built this host with Ubuntu, I wouldn’t have experienced this regression, because the updated QEMU has already shipped there. In fact, the maintainers marked this bug as Critical.

OK am I secure now?

winston@silo ~/p/nixos-configs $ echo | nc localhost 22
SSH-2.0-OpenSSH_9.7
Invalid SSH identification string.

Uhhhhhhhhhhhhhhhhhhhhhhhhhh… what?

OK after reading the discourse, it sounds like they didn’t modify the version string to indicate it has been fixed. If I run nix run nixpkgs#vulnix -- -R $(readlink -f $(which ssh)), I get Found no advisories. Excellent!. This feels wrong and bad. Not a resounding “you’re safe!” and a spiffy new version string to back up this claim. No, all I have instead is a random tool claiming it’s all safe and secure. Trust me bro.

And don’t forget the icing on top of the cake: the nixpkgs team deviated from standard operating procedures to backport the security fix to 23.11, despite 23.11’s EOL (End of Life) status, and despite their insistence that everyone upgrade off EOL releases. Had I read one particular forum post, I would have known that it wasn’t necessary to upgrade!

I’m not out of the woods yet, that pesky Nextcloud upgrade is pending. I’ll be sure to share if I broke anything.

The fact remains, had I read the discourse very carefully, I wouldn’t have invested 6 hours today babysitting a NixOS upgrade, since 23.11 received the same OpenSSH vulnerability fix, contrary to the team’s standard operating procedures. I’ll leave it at that. I don’t want to be unduly negative, but that’s an opportunity cost on everything else one could do with every day’s sacred time.

The upgrade workflow was cool the first time, but at least once a year, per machine? That sounds like a lot of toilsome drudgery. Plus one must face the inevitable bitrot caused by the behemoth codebase that nixpkgs has ballooned into, dragged down by a contribution system that permits thousands of tickets to go unsatisfied and hundreds of unmerged PRs to stagnate within its churn.

NixOS has its uses, and I believe my uses are too pedestrian for NixOS.