Previously I introduced the reader to ShellCheck. In this post I detail
how I use Flycheck in Emacs and offer an Emacs function to automatically
suppress ShellCheck errors at the current line.
I’m an avid Emacs user and it follows that I’ve set up editor customization to
wring the most out of ShellCheck. If you, dear reader, are not an Emacs user, I
cannot help you! Please, for the love of shell scripts, ensure ShellCheck
works within your preferred text editor, lest you ship buggy scripts riddled
with unhandled edge cases!
First, confirm that ShellCheck itself is installed:
$ shellcheck --version || echo 'No shellcheck found :('
ShellCheck - shell script analysis tool
version: 0.10.0
license: GNU General Public License, version 3
website: https://www.shellcheck.net
Next (this is the only mandatory Emacs step), install flycheck. The
simplest way is to type M-x package-install RET flycheck RET followed by M-x global-flycheck-mode RET, though I personally employ use-package to declare
my package usage up front. Here is what I have checked into my Emacs
configuration git repository:
(use-package flycheck
  :ensure t
  :init
  (global-flycheck-mode 1))
After global-flycheck-mode is enabled, you’ll see errors underlined in
prog-mode derived buffers. Try C-c ! l to list all errors in another
buffer. Or use C-c ! n or C-c ! p to go to the next or previous error. See M-x describe-command RET flycheck-mode RET and its complementary Flycheck
documentation website to learn more.
If you want to learn more about other keys in the C-c ! prefix from within
Emacs, check out which-key for Emacs or read the flycheck.el sources (or
try M-x describe-variable RET flycheck-mode-map RET from within Emacs).
But, ShellCheck likes to complain
Like all powerful tools, ShellCheck has its tradeoffs. It can save loads of
time by catching bugs before scripts reach production. On the
other hand, ShellCheck can slow down development because one has to fix every
single trifle to quell ShellCheck. A savvy shell scripter will find themself
encountering ShellCheck errors that are intentional. Consider this Bash
function that decrypts a pass-managed login credential (source):
decrypt() {
    local file
    file="${PASSWORD_STORE_DIR}/${1}.gpg"
    if [[ ! -f $file ]]; then
        echo "No such entry \"${1}\"" >&2
        return 1
    fi
    # shellcheck disable=SC2086
    gpg $PASSWORD_STORE_GPG_OPTS --quiet --decrypt "$file"
}
Without the # shellcheck disable=... ignore directive, which itself is a
comment (a line that begins with # [POSIX]), ShellCheck unyieldingly
complains about the unquoted use of $PASSWORD_STORE_GPG_OPTS.
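To see why leaving that variable unquoted can be deliberate, here is a minimal sketch. The count_args helper and the option string are made up for illustration (they are not taken from pass); the only point is that an unquoted variable gets split into separate arguments, which is what a command needs when one variable carries several options.

```shell
# count_args prints how many arguments it received.
count_args() {
    echo "$#"
}

# Hypothetical option string, standing in for PASSWORD_STORE_GPG_OPTS.
opts='--no-tty --batch'

# Unquoted on purpose: the shell splits the value into two options.
# shellcheck disable=SC2086
count_args $opts

# Quoted: the whole value arrives as a single, bogus option.
count_args "$opts"
```

The first call reports two arguments and the second reports one, so quoting here would hand the command one nonsensical option instead of two real ones.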
Time and time again, I’ve forgotten the exact syntax of this ShellCheck
directive. So, as a result of my forgetfulness, I’ve devised a solution to
automatically generate these comment lines in Emacs! Simply move the point to
the line with a ShellCheck error (try C-c ! n or C-c ! p to find the next
or previous error) then type C-c ! k to generate the necessary comment to
shut up ShellCheck.
I’ve embedded the Elisp here for easy copy-paste and to discuss the anatomy of
the code:
(require 'cl-lib)    ; For cl-loop
(require 'sh-script) ; For sh-mode-map

(defun winny--extract-shellcheck-error (err)
  (and-let* (((eq (flycheck-error-checker err) 'sh-shellcheck)))
    (flycheck-error-id err)))

(defun winny/shellcheck-disable-at-line ()
  "Insert \"# shellcheck disable=SC...\" line to silence shellcheck errors."
  (interactive)
  (save-match-data
    (save-excursion
      (and-let* ((errs
                  (cl-loop for err in (flycheck-overlay-errors-in (pos-bol) (pos-eol))
                           if (winny--extract-shellcheck-error err)
                           collect (winny--extract-shellcheck-error err))))
        (beginning-of-line)
        (insert (format "# shellcheck disable=%s" (mapconcat 'identity errs ",")))
        (indent-according-to-mode)
        (newline-and-indent)))))

(add-hook 'sh-mode-hook
          (defun winny--bind-shellcheck-disable ()
            (define-key sh-mode-map (kbd "C-c ! k") 'winny/shellcheck-disable-at-line)))
As a sort of preamble, this code ensures the cl-loop macro and the sh-mode-map
variable are available using the two require forms.
Then, this code defines a helper function winny--extract-shellcheck-error to
determine if a flycheck-error object is a ShellCheck error and return the
SC-prefixed identifier string. Next comes the interactive command winny/shellcheck-disable-at-line, which searches for unaddressed ShellCheck
errors on the current line then inserts a comment directive above the current
line in order to silence those ShellCheck errors.
Finally, the add-hook function call adds a function named
winny--bind-shellcheck-disable to execute whenever sh-mode is used. It has
one job: bind winny/shellcheck-disable-at-line to a key.
Notice the usage of a defun instead of a lambda for add-hook’s second
argument. I believe it prudent to name your anonymous functions such that the
M-x describe-variable (C-h v) output is less busy and the code itself is
accessible by name for further monkeying about with M-x eval-expression
(M-:) and friends.
Conclusion
Equipped with this post, I hope the reader has ShellCheck set up in Emacs. If
Emacs isn’t your preferred editor, worry not, there is surely a blog post or
official documentation describing how to set up ShellCheck for your favorite
editor!
Stay tuned for a post on how to set up ShellCheck with pre-commit to ensure
code quality with team-based projects.
In this post I hope to convince the reader of the merits of ShellCheck. Stay
tuned for more posts about using ShellCheck.
On Shell Scripting
Shell scripting is a passion of mine. Preferably Bash (here’s a guide).
(POSIX sh a.k.a. Bourne shell works too, albeit with more effort thanks to
diminished versatility when compared to Bash.) The shell scripting language
family has many warts as the languages were designed for both real-time
interaction and automation programming. Additionally, inextricable backwards
compatibility requirements are continually placed on the shell scripting
language family: many Unix heads expect their shell to execute POSIX sh
scripts without a hitch. Bourne shell, the sh we know and love, was
developed in the late 70s, alongside other firsts in human-computer interfaces.
One of my favorite shell warts is the emphasis on reusing the string datatype for
all sorts of operations. This wart is addressed, mildly, with Bash’s addition
of arrays and more (check out declare). As a result, much of shell scripting
is focused on efficient text processing and the leaky abstractions that
(mostly) “typeless” programming provides. For example, in order to perform
arithmetic in sh, you can use a command such as a=$(( a * 2 )) to double
the value stored in variable a.
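Here is a small sketch of that arithmetic expansion in action. The variable names and values are arbitrary; the thing to notice is that the result is still an ordinary string, which can be glued to other text and then fed back into arithmetic.

```shell
a=21
a=$(( a * 2 ))      # arithmetic expansion doubles the value
echo "$a"

# The result is still a string, so plain concatenation works on it.
b="${a}0"           # appends the character 0, no multiplication involved
echo "$b"

# And the concatenated string is accepted as a number again.
c=$(( b + 1 ))
echo "$c"
```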
Equipped with context, it might make a little more sense as we touch on the
sheer volume of pitfalls that beset robust sh or Bash usage. Consider
the first one most script writers encounter: the infamous word split. Refer
to the following shell snippet:
path='/my/path with spaces'
find $path -type f -name *.[ch]
Find will chime back with an error like:
find: ‘/my/path’: No such file or directory
find: ‘with’: No such file or directory
find: ‘spaces’: No such file or directory
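Those three errors line up with the three words the shell hands to find. A rough way to watch the split happen without involving find is a tiny helper that prints each argument it receives; print_args is a made-up name for this sketch.

```shell
path='/my/path with spaces'

# print_args prints every argument on its own line, wrapped in angle brackets.
print_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

# Unquoted: the value is split on whitespace into three arguments.
# shellcheck disable=SC2086
print_args $path

# Quoted: the value stays together as one argument.
print_args "$path"
```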
Enter ShellCheck
ShellCheck is a linter for sh family languages. (A linter is a program that
scans code for common mistakes.) I highly recommend it. Let’s see what
value it provides for this example:
ShellCheck catches the previous error as Double quote to prevent globbing and word splitting. [SC2086]. A brief search of SC2086 on the internet yields
ShellCheck: SC2086 – Double quote to prevent globbing and word splitting. The
solution here is to wrap $path in double quotes, so the script is improved to
the following:
path='/my/path with spaces'
find "$path" -type f -name *.[ch]
But we’re not finished yet! Let’s say the current directory contains a file
named hello.c, then the -name *.[ch] expands to -name hello.c. Whoops!
ShellCheck knows about this issue too: Quote the parameter to -name so the shell won't interpret it. [SC2061]. Read more here: ShellCheck: SC2061 –
Quote the parameter to `-name`. The fix is the same, quote the parameter:
path='/my/path with spaces'
find "$path" -type f -name '*.[ch]'
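The glob problem is easy to reproduce in a scratch directory. In this sketch, show_args stands in for find and simply echoes back what the shell passed to it; the helper names and the temporary directory are assumptions made for the demo.

```shell
# show_args prints its arguments on one line; it stands in for find here.
show_args() {
    printf '%s\n' "$*"
}

# unquoted_demo DIR: run from DIR with the pattern left unquoted.
unquoted_demo() {
    (
        cd "$1" || exit 1
        show_args -name *.[ch]
    )
}

# quoted_demo DIR: the same call with the pattern in single quotes.
quoted_demo() {
    (
        cd "$1" || exit 1
        show_args -name '*.[ch]'
    )
}

demo_dir=$(mktemp -d)       # scratch directory holding a single hello.c
touch "$demo_dir/hello.c"
unquoted_demo "$demo_dir"   # the shell has already replaced the pattern
quoted_demo "$demo_dir"     # the pattern arrives untouched
rm -r "$demo_dir"
```

The subshells keep the cd from leaking into the rest of the script.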
Additional erroneous shell script examples are curated in ShellCheck’s Gallery
of bad code.
Ways to use ShellCheck
In short, apt install it! There are many other ways to run ShellCheck.
Check the official Installing section within the ShellCheck README.
Run in CI/CD, maybe using pre-commit. (I will be detailing this workflow
in another blog post. The Installing section within the ShellCheck README
details one solution which I’ll compare against other more user friendly
solutions.)
Copy/paste into shellcheck.net’s form to check in browser. Note: the checker
runs on a server so the text you pasted is exfiltrated from your browser tab.
And last but not least, be sure to run ShellCheck in your text editor or IDE.
Good ShellCheck integration checks as you edit shell scripts, so you can catch
errors before even saving the file.
Suppress ShellCheck
ShellCheck offers a few different mechanisms to disable specific errors: via
environment variable, for the entire file, and for the next line of code. I
usually reach for ignoring errors on a per-line basis:
Insert a line of the format # shellcheck disable=SC2116,SC2086 before the offending line.
ShellCheck will ignore SC2116 and SC2086 for that line next time.
# shellcheck disable=SC2116,SC2086
hash=$(echo ${hash}) # trim spaces
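To show that the flagged line really does something useful, here is a runnable sketch of the same trick. The padded value is made up; the unquoted expansion drops the surrounding whitespace, which is why both warnings are silenced rather than fixed.

```shell
hash='   d41d8cd9   '        # made-up value padded with spaces

# shellcheck disable=SC2116,SC2086
hash=$(echo ${hash}) # trim spaces

printf '[%s]\n' "$hash"
```

One caveat of this idiom: the unquoted expansion is also subject to globbing, so it is only a safe trim for values that contain no wildcard characters.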
There is little reason to not use ShellCheck. Improve the quality of your work
today. Try out ShellCheck.
Stay tuned for additional posts related to ShellCheck usage.
P.S. If you really can’t use ShellCheck
If you can’t use ShellCheck, you can still validate that scripts parse
correctly prior to execution — check out sh -n and bash -n. (See set -n in the POSIX sh docs and in the Bash infopages.) The -n flag will parse
each line and inform you about parse errors, but not execute the parsed
commands.
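As a rough sketch of how that check could be wired into a script, the helper below wraps sh -n and is tried against a deliberately broken file. The name parses_ok and the temporary file are assumptions for the demo.

```shell
# parses_ok FILE: succeed only if sh can parse FILE. Nothing in FILE is run.
parses_ok() {
    sh -n "$1" 2>/dev/null
}

# A deliberately broken script: the if is never closed with fi.
broken=$(mktemp)
printf 'if true; then\n    echo "never closed"\n' > "$broken"

if parses_ok "$broken"; then
    echo "parses fine"
else
    echo "parse error, not safe to run"
fi
rm -f "$broken"
```

Keep in mind this only catches syntax errors; a script that parses can still misbehave in all the ways ShellCheck warns about.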
Turns out my NAS is vulnerable to the OpenSSH vulnerability which, given enough
time, allows an unauthenticated attacker to run code as root on your host. Dubbed regreSSHion
(CVE-2024-6387), it affects a host of different OpenSSH version ranges.
If one has OpenSSH 9.8p1 or later, one is totally fine.
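A quick way to compare a host against that 9.8p1 threshold is to parse the version banner. This is only a sketch: the function name is my own invention, the sample banners are made up, and it looks at the upstream version number alone, so it cannot tell whether a distro has backported the fix into an older version.

```shell
# is_fixed_upstream BANNER
# Succeeds when BANNER (e.g. the output of ssh -V) names OpenSSH 9.8 or
# newer. Returns 2 when no version can be found in BANNER.
is_fixed_upstream() {
    # Pull out "MAJOR MINOR" and load them into $1 and $2.
    # shellcheck disable=SC2046
    set -- $(printf '%s\n' "$1" |
        sed -n 's/^OpenSSH_\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
    [ "$#" -eq 2 ] || return 2
    [ "$1" -gt 9 ] || { [ "$1" -eq 9 ] && [ "$2" -ge 8 ]; }
}

# Sample banners for the demo. On a real host you could pass
# "$(ssh -V 2>&1)" instead, since ssh prints its version to stderr.
for banner in 'OpenSSH_9.6p1, OpenSSL 3.0.13' 'OpenSSH_9.8p1, OpenSSL 3.3.1'; do
    if is_fixed_upstream "$banner"; then
        echo "$banner: at or past 9.8"
    else
        echo "$banner: older than 9.8"
    fi
done
```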
Unfortunately, the NAS is still on NixOS 23.11. The NAS remains on NixOS, but
all my other devices have been migrated off to Debian Testing. In this brief
post, I wanted to describe the toil involved with upgrading this NixOS powered
NAS. Some of it is my own PEBKAC fault. Some of it is through well meaning
defaults of NixOS. Some of it is just not enough attention to detail. Or as
I’ve recently come to discern, too complicated or temperamental for my liking.
Maybe this will be entertaining or maybe it will be cringe. My hope is this
post encompasses the process of upgrading a NixOS host, warts and all. Sorry
for my snide remarks; I’m over the promises of unfettered complexity.
Step 1: nix flake update
I haven’t updated the flakes containing both stable and unstable nixpkgs
references in some time. So that’s my first step, before considering the
upgrade to 24.05 in earnest. First, I need to git pull to ensure my
repository is up to date. Whoops, a bit of PEBKAC broke git pull on my
setup.
ssh agent mess
I’ve been using keychain for some time. Here’s the relevant code from my dotfiles
(link).
Unfortunately, it seems to break ssh agent forwarding, so I can’t git pull
on my NAS without disabling this module. That’s the first friction of the day;
getting the dang ole’ thing to git pull correctly. After removing this file
from my .bashrc.d/, ssh agent forwarding worked again. That’s a problem for
another day.
Okay! Next contention.
nixos-rebuild boot fails with no space left on device
I think this is a super frustrating issue that I’ve run into on most hosts that
I’ve deployed NixOS on. Because nix-build’s error reporting won’t tell you
the directory it’s building in, it’s hard to know from either the nix log ...
incantation or the terminal output what the root cause of the error is. It comes
down to the particular invocations in the derivation’s (package’s) build phase.
Some will print out an absolute path, some won’t. This loosens the feedback
loop… Now I have to monitor indirect metrics, such as disk usage, or even
strace something to see where it’s writing to.
Today, this cost me an hour of iterating with nixos-rebuild incantations in
order to coax grafana to compile. For some reason, I assumed my ZFS dataset
named rpool (which contains /root, /home, and /var) might have
been the culprit, so I invested an exorbitant amount of time reconfiguring Loki
to clean up old log data and moving a handful of Syncthing folders into another ZFS
dataset. As it turned out, that wasn’t the problem at all. A tighter feedback
loop could have prevented the wasted time here.
Another fix for this is to override the TMPDIR environment variable on the
failing build. Or set it in your system.environment attribute set for
permanent configuration. Next, you might need to reconfigure the size of
/tmp if choosing TMPDIR=/tmp. It’s a tmpfs that resides in memory (and
swap); its maximum size is configured as a percentage of your physical RAM.
One can increase this size or migrate /tmp to a filesystem partition on disk.
On low memory hosts it’s best to eschew tmpfs for /tmp and instead set up
a filesystem partition. Filesystem caching is pretty good nowadays so it’s not
as serious a slowdown as one might have been tricked into believing.
Back to it: I tried sudo env TMPDIR=/tmp nixos-rebuild boot and lo and
behold, the 4GiB tmpfs was insufficient to build Grafana. Shoot. Retried
again for maybe the fifth time with sudo env TMPDIR=/var/tmp nixos-rebuild boot. Another thirty minutes later and the build finished successfully.
/var/tmp typically lives either on your /var or / filesystem with plenty
of space.
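In hindsight, a thirty-second look at the free space behind each candidate directory would have tightened that feedback loop. Here is a sketch of such a check; avail_kib is a made-up helper name, and the two directories are just the ones from this story.

```shell
# avail_kib DIR: print the free space, in KiB, of the filesystem holding DIR.
# df -P keeps each filesystem on one line, so the fourth column of the
# second line is the "Available" figure.
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

for dir in /tmp /var/tmp; do
    if [ -d "$dir" ]; then
        printf '%s: %s KiB available\n' "$dir" "$(avail_kib "$dir")"
    fi
done
```

Whichever directory reports more room is the safer TMPDIR for a large build, though how much a given build needs is still guesswork.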
Now to reboot and ssh in.
ssh then zfs load-key funkiness
I have it in my notes how to boot my NixOS powered NAS because it’s not exactly
natural nor intuitive. First I reboot, then I wait to ssh in, then issue a
zfs load-key command to unlock the rpool which contains NixOS, then kill
the other zfs load-key command kicked off by the boot sequence, thereby
resuming the boot sequence.
winston@silo ~ $ sudo reboot
[sudo] password for winston:
(7s) winston@silo ~ $ # oh yeah, reboot doesn't work :(
winston@silo ~ $ sudo systemctl reboot
Broadcast message from root@silo on pts/8 (Mon 2024-07-08 17:29:16 CDT):
The system will reboot now!
winston@silo ~ $ Connection to silo closed by remote host.
Connection to silo closed.
(5m50s) 255 winston@quasit ~ $ sleep 120;ssh -p2222 root@silo
~ # zfs load-key rpool
Enter passphrase for 'rpool':
~ # pkill zfs # Next type Return followed by ~. to kill the ssh session!
~ # Connection to silo closed.
(56s) 255 winston@quasit ~ $ sleep 60; ssh -A silo
+&-
Welcome to _.-^-._ .--.
silo. .-' _ '-. |__|
/ |_| \| |
/ \ |
/| _____ |\ |
| |==|==| | |
|---|---|---|---|---| |--|--| | |
|---|---|---|---|---| |==|==| | |
^jgs^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Last login: Mon Jul 8 17:23:33 2024 from 10.20.1.44
winston@silo ~ $ systemctl status
After pkill zfs is run, the ssh connection will become unresponsive. Type
return followed by ~. to tell OpenSSH to exit.
Systemd status is “still starting”. So I’ll come back in another minute or two,
see if it’s happy.
Okay, now it says systemd state is “running” and not “degraded”. We’re in
business!
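Rather than checking back by hand, the waiting could be scripted. Below is a sketch that polls a state command until it reports something other than a startup state. The function name and the POLL_SECONDS variable are assumptions of mine; on the NAS the command to poll would be systemctl is-system-running, which prints states such as starting, running, or degraded.

```shell
# wait_until_settled CMD [ARGS...]
# Runs CMD repeatedly until it reports something other than "initializing"
# or "starting", prints the final state, and succeeds only for "running".
wait_until_settled() {
    while :; do
        # The state command exits non-zero for anything but "running",
        # so ignore its status and look at the text instead.
        state=$("$@") || true
        case $state in
            initializing|starting) sleep "${POLL_SECONDS:-5}" ;;
            *) break ;;
        esac
    done
    printf '%s\n' "$state"
    [ "$state" = running ]
}

# On the NAS this would be: wait_until_settled systemctl is-system-running
# Here, a stand-in command keeps the sketch runnable anywhere:
wait_until_settled echo running
```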
Commit the changes
Now to commit the flake changes:
winston@silo ~/p/nixos-configs $ git commit -m 'flake: refresh'
.git/hooks/pre-commit: line 13: /nix/store/4zqn5lajki1z3a2avia658l1wacpi8v0-pre-commit-3.3.3/bin/pre-commit: No such file or directory
1 winston@silo ~/p/nixos-configs $ pre
precat preunzip prezip prezip-bin
1 winston@silo ~/p/nixos-configs $ nix run nixpkgs#pre-commit install
pre-commit installed at .git/hooks/pre-commit
winston@silo ~/p/nixos-configs $ git commit -m 'flake: refresh'
Trim Trailing Whitespace.................................................Passed
Fix End of Files.........................................................Passed
Check Yaml...........................................(no files to check)Skipped
Check for added large files..............................................Passed
[master f0be2ea] flake: refresh
2 files changed, 9 insertions(+), 8 deletions(-)
Yes, I know pre-commit should exist in my environment.systemPackages. I
forgot why I didn’t add it. Maybe pre-commit is supposed to be in this
folder’s shell.nix? Complicated. You can see why I’m shrinking away from
nix-powered workflows! I just want the thing to do the thing.
Now, on to the upgrade to fix the OpenSSH vulnerability. A couple hours later…
Re-read the release notes
NixOS has this stance about upgrades not dissimilar to classical stable release
distros. Things will break. Things will break in fantastic ways. Just don’t
upgrade, and things won’t break. It’s a white lie of NixOS: that somehow you
enjoy stability between upgrades. You aren’t guaranteed stability between
upgrades. You must exercise the same due diligence and best judgment that is
expected on any other distro. Be sure to read the release notes; there is no
time-efficient alternative. A poorly aged adage of NixOS is “roll back if
upgrade breaks stuff”. No need to ensure most upgrades go well on first try if
that’s the community spirit of releasing upgrades! And don’t get me started on
what happens if you need security updates. This blog post happens: you must
upgrade everything or suture nixpkgs in order to recoup security updates.
Back to it. As promised, I read the release notes and my biggest takeaway is
that there is an upgrade to Loki 3.0.0. I’m okay with Loki breaking, so I
won’t bother with it until I experience breakage. It appears that my Nextcloud is
still supported, so I won’t struggle through a Nextcloud upgrade (yet).
Edit the flake inputs
All I did was modify inputs.nixpkgs.url to point to
github:NixOS/nixpkgs/release-24.05. Then I ran nix flake update.
Next it’s time to try to build it, remembering to point TMPDIR to the
spacious /var/tmp.
Breaking pinentry change
winston@silo ~/p/nixos-configs $ sudo env TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
[sudo] password for winston:
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error:
… while calling the 'head' builtin
at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/attrsets.nix:1575:11:
1574| || pred here (elemAt values 1) (head values) then
1575| head values
| ^
1576| else
… while evaluating the attribute 'value'
at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/modules.nix:809:9:
808| in warnDeprecation opt //
809| { value = builtins.addErrorContext "while evaluating the option `${showOption loc}':" value;
| ^
810| inherit (res.defsFinal') highestPrio;
(stack trace truncated; use '--show-trace' to show the full trace)
error:
Failed assertions:
- The option definition `programs.gnupg.agent.pinentryFlavor' in `/nix/store/g4py9g5mqznfzgn7nrz1glg811k9xpll-source/common/base.nix' no longer has any effect; please remove it.
Use programs.gnupg.agent.pinentryPackage instead
(28s) 1 winston@silo ~/p/nixos-configs $
Balls. Legacy code shims be damned, this is NixOS! I think I need to edit
some configuration, because the powers that be decided it must be done. But
where? Cool errors bro. No problem, I can search my source tree and, with any
hope, it’s within my own Nix configuration, and not in another input (e.g. the
nixpkgs flake):
Oh maybe I could have taken the
/nix/store/not-the-path-that-im-working-out-of-prefix-source/common/base.nix
path suffix and visited the relative path common/base.nix. Icky to manually
fix paths in error messages, but that should suffice next time!
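For next time, that path surgery can be left to the shell. This sketch strips the store prefix with a parameter expansion; strip_store_prefix is a made-up name, and the sample path is the one from the error message above.

```shell
# strip_store_prefix PATH: drop a leading /nix/store/<hash>-source/ component
# so that only the path relative to the repository remains.
strip_store_prefix() {
    printf '%s\n' "${1#/nix/store/*-source/}"
}

strip_store_prefix '/nix/store/g4py9g5mqznfzgn7nrz1glg811k9xpll-source/common/base.nix'

# Searching the working tree for the option name is another route, e.g.:
#   grep -rn pinentryFlavor .
```

The single # removes the shortest matching prefix, so paths that do not start with a store directory pass through unchanged.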
Okay, deleted the offending line. I don’t know why I’m using the gtk2 flavor
anymore, so maybe it’s not important… let’s pray that it wasn’t important!
winston@silo ~/p/nixos-configs $ sudo env TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
[sudo] password for winston:
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error:
… while calling the 'head' builtin
at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/attrsets.nix:1575:11:
1574| || pred here (elemAt values 1) (head values) then
1575| head values
| ^
1576| else
… while evaluating the attribute 'value'
at /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/lib/modules.nix:809:9:
808| in warnDeprecation opt //
809| { value = builtins.addErrorContext "while evaluating the option `${showOption loc}':" value;
| ^
810| inherit (res.defsFinal') highestPrio;
(stack trace truncated; use '--show-trace' to show the full trace)
error: Package ‘nextcloud-27.1.11’ in /nix/store/xyqsyg4nw57nbva6r339hf5223d0ar4r-source/pkgs/servers/nextcloud/default.nix:35 is marked as insecure, refusing to evaluate.
Known issues:
- Nextcloud version 27.1.11 is EOL
You can install it anyway by allowing this package, using the
following methods:
a) To temporarily allow all insecure packages, you can use an environment
variable for a single invocation of the nix tools:
$ export NIXPKGS_ALLOW_INSECURE=1
Note: When using `nix shell`, `nix build`, `nix develop`, etc with a flake,
then pass `--impure` in order to allow use of environment variables.
b) for `nixos-rebuild` you can add ‘nextcloud-27.1.11’ to
`nixpkgs.config.permittedInsecurePackages` in the configuration.nix,
like so:
{
nixpkgs.config.permittedInsecurePackages = [
"nextcloud-27.1.11"
];
}
c) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
‘nextcloud-27.1.11’ to `permittedInsecurePackages` in
~/.config/nixpkgs/config.nix, like so:
{
permittedInsecurePackages = [
"nextcloud-27.1.11"
];
}
(28s) 1 winston@silo ~/p/nixos-configs $
Oh crud, apparently I’m running an insecure Nextcloud? Who knew! Looks like
one can set an environment variable (NIXPKGS_ALLOW_INSECURE=1) to tell
nixos-rebuild to calm down if only a smidgen.
Loki changes
The release notes did mention Loki configuration changed and advised that users
read the upstream Loki release notes too. It looks like I can’t push the Loki
upgrade back and must handle it now:
winston@silo ~/p/nixos-configs $ sudo env NIXPKGS_ALLOW_INSECURE=1 TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
error: builder for '/nix/store/v6g5b5c383a4h6i8bl210h91cp54qpz6-validate-loki-conf.drv' failed with exit code 1;
last 5 log lines:
> failed parsing config: /nix/store/3i7y6y5nwjqz8mhg1kakhmyd1cv7cy3i-loki-config.json: yaml: unmarshal errors:
> line 4: field max_look_back_period not found in type config.ChunkStoreConfig
> line 13: field shared_store not found in type compactor.Config
> line 31: field max_transfer_retries not found in type ingester.Config
> line 59: field shared_store not found in type boltdb.IndexCfg. Use `-config.expand-env=true` flag if you want to expand environment variables in your config file
For full logs, run 'nix log /nix/store/v6g5b5c383a4h6i8bl210h91cp54qpz6-validate-loki-conf.drv'.
error: 1 dependencies of derivation '/nix/store/cxpdszil18wj6hb3az3jwqbyfy43wmj6-unit-loki.service.drv' failed to build
error: 1 dependencies of derivation '/nix/store/gyh8mr8sqk1gk20qldp062zym2mdy06c-system-units.drv' failed to build
error: 1 dependencies of derivation '/nix/store/qr88n0970ji783bkdwhk8ig7wazksa9v-etc.drv' failed to build
error: 1 dependencies of derivation '/nix/store/p36ja5jhi7wr2ayzxgf7ykkxhss599n2-nixos-system-silo-24.05.20240708.de429c2.drv' failed to build
(3m46s) 1 winston@silo ~/p/nixos-configs $
That list of Loki-specific configuration errors is extremely helpful. Well
done. The nixos-rebuild process printed the problematic configuration fields
out to the terminal. After trying to make heads or tails of the upgrade guide,
I instead merely removed every offending line. Let’s see what works (shrug).
error: builder for '/nix/store/rkmm8wa8vz576bhwpz0wwmv8ck31653j-validate-loki-conf.drv' failed with exit code 1;
last 1 log lines:
> level=error ts=2024-07-08T23:14:46.302638345Z caller=main.go:66 msg="validating config" err="MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY\nCONFIG ERROR: invalid compactor config: compactor.delete-request-store should be configured when retention is enabled\nCONFIG ERROR: schema v13 is required to store Structured Metadata and use native OTLP ingestion, your schema version is v11. Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update to schema v13 or newer before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure\nCONFIG ERROR: `tsdb` index type is required to store Structured Metadata and use native OTLP ingestion, your index type is `boltdb-shipper` (defined in the `store` parameter of the schema_config). Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update the schema to use index type `tsdb` before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure"
For full logs, run 'nix log /nix/store/rkmm8wa8vz576bhwpz0wwmv8ck31653j-validate-loki-conf.drv'.
Here’s the error reformatted:
MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY
CONFIG ERROR: invalid compactor config: compactor.delete-request-store should
be configured when retention is enabled
CONFIG ERROR: schema v13 is required to store Structured Metadata and use
native OTLP ingestion, your schema version is v11. Set
`allow_structured_metadata: false` in the `limits_config`
section or set the command line argument
`-validation.allow-structured-metadata=false` and restart Loki.
Then proceed to update to schema v13 or newer before re-enabling
this config, search for 'Storage Schema' in the docs for the
schema update procedure
CONFIG ERROR: `tsdb` index type is required to store Structured Metadata and
use native OTLP ingestion, your index type is `boltdb-shipper`
(defined in the `store` parameter of the schema_config). Set
`allow_structured_metadata: false` in the `limits_config` section
or set the command line argument
`-validation.allow-structured-metadata=false` and restart Loki.
Then proceed to update the schema to use index type `tsdb`
before re-enabling this config, search for 'Storage Schema'
in the docs for the schema update procedure
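That kind of reformatting does not have to be done by hand. Here is a sketch of one way to do it: turn the literal \n sequences from the log line into real line breaks, then wrap the result. The helper name and the shortened sample message are assumptions for the demo.

```shell
# expand_newlines TEXT: turn literal backslash-n sequences into line breaks.
expand_newlines() {
    printf '%s\n' "$1" | awk '{ gsub(/\\n/, "\n"); print }'
}

# Shortened stand-in for the err="..." value in the log line.
err='MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY\nCONFIG ERROR: invalid compactor config'

# fold -s wraps long lines at spaces so each error stays readable.
expand_newlines "$err" | fold -s -w 72
```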
Okay, I’ve added allow_structured_metadata = false; now what will break next?
CONFIG ERROR: invalid compactor config: compactor.delete-request-store should
be configured when retention is enabled
OK, I’ve added delete_request_store = "filesystem";, as per this GitHub
issue.
It built!
Hooray! It built!
winston@silo ~/p/nixos-configs $ sudo env NIXPKGS_ALLOW_INSECURE=1 TMPDIR=/var/tmp nixos-rebuild boot --flake ~/p/nixos-configs# --impure
warning: Git tree '/home/winston/p/nixos-configs' is dirty
building the system configuration...
warning: Git tree '/home/winston/p/nixos-configs' is dirty
trace: warning: The option `services.nextcloud.extraOptions' defined in `/nix/store/d99kz3ifvz1hqg8wni0bi2j08n3rdisr-source/hosts/silo' has been renamed to `services.nextcloud.settings'.
trace: warning: The option `services.nextcloud.config.defaultPhoneRegion' defined in `/nix/store/d99kz3ifvz1hqg8wni0bi2j08n3rdisr-source/hosts/silo' has been renamed to `services.nextcloud.settings.default_phone_region'.
trace: warning: A legacy Nextcloud install (from before NixOS 24.05) may be installed.
After nextcloud27 is installed successfully, you can safely upgrade
to 28. The latest version available is Nextcloud29.
Please note that Nextcloud doesn't support upgrades across multiple major versions
(i.e. an upgrade from 16 is possible to 17, but not 16 to 18).
The package can be upgraded by explicitly declaring the service-option
`services.nextcloud.package`.
updating GRUB 2 menu...
installing the GRUB 2 boot loader on /dev/disk/by-id/ata-CT120BX500SSD1_1943E3D1AC4B...
Installing for i386-pc platform.
Installation finished. No error reported.
updating GRUB 2 menu...
installing the GRUB 2 boot loader on /dev/disk/by-id/ata-CT120BX500SSD1_1943E3D1AC45...
Installing for i386-pc platform.
Installation finished. No error reported.
(51s) winston@silo ~/p/nixos-configs $
systemctl status says the state is “degraded”. Let’s see what broke:
winston@silo ~ $ systemctl list-units --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● libvirt-guests.service loaded failed failed libvirt guests suspend/resume service
Legend: LOAD → Reflects whether the unit definition was properly loaded.
ACTIVE → The high-level unit activation state, i.e. generalization of SUB.
SUB → The low-level unit activation state, values depend on unit type.
1 loaded units listed.
(34s) 3 winston@silo ~ $
OK let’s check the specific service:
winston@silo ~ $ systemctl status libvirt-guests.service
× libvirt-guests.service - libvirt guests suspend/resume service
Loaded: loaded (/etc/systemd/system/libvirt-guests.service; enabled; preset: enabled)
Drop-In: /nix/store/70x3p9hhrm202n3lfl1p79bv0h2c59zi-system-units/libvirt-guests.service.d
└─overrides.conf
Active: failed (Result: exit-code) since Mon 2024-07-08 18:38:25 CDT; 3min 11s ago
Docs: man:libvirt-guests(8)
https://libvirt.org/
Process: 2340 ExecStart=/nix/store/6a5lmp5p08n9qsfd0l9aqc7jhigm82j9-libvirt-10.0.0/libexec/libvirt-guests.sh start (code=exited, status=1/FAILURE)
Main PID: 2340 (code=exited, status=1/FAILURE)
IP: 0B in, 0B out
CPU: 184ms
Jul 08 18:38:17 silo systemd[1]: Starting libvirt guests suspend/resume service...
Jul 08 18:38:24 silo libvirt-guests.sh[3307]: Resuming guests on default URI...
Jul 08 18:38:25 silo libvirt-guests.sh[3313]: Resuming guest seedbox:
Jul 08 18:38:25 silo libvirt-guests.sh[3318]: error: Failed to start domain 'seedbox'
Jul 08 18:38:25 silo libvirt-guests.sh[3318]: error: operation failed: guest CPU doesn't match specification: extra features: vmx-ins-outs,vmx-true-ctls,vmx-store-lma,vmx-activity-hlt,vmx-vmwrite-vmexit-fields,vmx-apicv-xapic,vmx-ept,vmx-desc-exit,vmx-rdtscp-exit,vmx-apicv-x2apic,vmx-vpid,vmx-wbinvd-exit,vmx-unrestricted-guest,vmx-rdrand-exit,vmx-invpcid-exit,vmx-vmfunc,vmx-shadow-vmcs,vmx-invvpid,vmx-invvpid-single-addr,vmx-invvpid-all-context,vmx-ept-execonly,vmx-page-walk-4,vmx-ept-2mb,vmx-ept-1gb,vmx-invept,vmx-eptad,vmx-invept-single-context,vmx-invept-all-context,vmx-intr-exit,vmx-nmi-exit,vmx-vnmi,vmx-preemption-timer,vmx-vintr-pending,vmx-tsc-offset,vmx-hlt-exit,vmx-invlpg-exit,vmx-mwait-exit,vmx-rdpmc-exit,vmx-rdtsc-exit,vmx-cr3-load-noexit,vmx-cr3-store-noexit,vmx-cr8-load-exit,vmx-cr8-store-exit,vmx-flexpriority,vmx-vnmi-pending,vmx-movdr-exit,vmx-io-exit,vmx-io-bitmap,vmx-mtf,vmx-msr-bitmap,vmx-monitor-exit,vmx-pause-exit,vmx-secondary-ctls,vmx-exit-nosave-debugctl,vmx-exit-load-perf-global-ctrl,vmx-exit-ack-intr,vmx-exit-save-pat,vmx-exit-load-pat,vmx-exit-save-efer,vmx-exit-load-efer,vmx-exit-save-preemption-timer,vmx-entry-noload-debugctl,vmx-entry-ia32e-mode,vmx-entry-load-perf-global-ctrl,vmx-entry-load-pat,vmx-entry-load-efer,vmx-eptp-switching, missing features: vmx-apicv-register,vmx-apicv-vid,vmx-posted-intr
Jul 08 18:38:25 silo systemd[1]: libvirt-guests.service: Main process exited, code=exited, status=1/FAILURE
Jul 08 18:38:25 silo systemd[1]: libvirt-guests.service: Failed with result 'exit-code'.
Jul 08 18:38:25 silo systemd[1]: Failed to start libvirt guests suspend/resume service.
Oh snap, looks like my virtual machine (VM) for BitTorrent is broken. More
specifically, libvirtd failed to resume the VM from saved state (it’s like
folding your laptop shut, but for virtual machines). By the way, everything on
archive.org is available via BitTorrent. Most Linux distros provide .torrent
files too! Hosting torrents for these projects is one low-effort way to help
out the community.
Here’s the error reformatted:
error: Failed to start domain 'seedbox'
error: operation failed: guest CPU doesn't match specification: extra features:
vmx-ins-outs,vmx-true-ctls,vmx-store-lma,vmx-activity-hlt,
vmx-vmwrite-vmexit-fields,vmx-apicv-xapic,vmx-ept,vmx-desc-exit,
vmx-rdtscp-exit,vmx-apicv-x2apic,vmx-vpid,vmx-wbinvd-exit,
vmx-unrestricted-guest,vmx-rdrand-exit,vmx-invpcid-exit,
vmx-vmfunc,vmx-shadow-vmcs,vmx-invvpid,vmx-invvpid-single-addr,
vmx-invvpid-all-context,vmx-ept-execonly,vmx-page-walk-4,vmx-ept-2mb,
vmx-ept-1gb,vmx-invept,vmx-eptad,vmx-invept-single-context,
vmx-invept-all-context,vmx-intr-exit,vmx-nmi-exit,vmx-vnmi,
vmx-preemption-timer,vmx-vintr-pending,vmx-tsc-offset,vmx-hlt-exit,
vmx-invlpg-exit,vmx-mwait-exit,vmx-rdpmc-exit,vmx-rdtsc-exit,
vmx-cr3-load-noexit,vmx-cr3-store-noexit,vmx-cr8-load-exit,
vmx-cr8-store-exit,vmx-flexpriority,vmx-vnmi-pending,vmx-movdr-exit,
vmx-io-exit,vmx-io-bitmap,vmx-mtf,vmx-msr-bitmap,vmx-monitor-exit,
vmx-pause-exit,vmx-secondary-ctls,vmx-exit-nosave-debugctl,
vmx-exit-load-perf-global-ctrl,vmx-exit-ack-intr,vmx-exit-save-pat,
vmx-exit-load-pat,vmx-exit-save-efer,vmx-exit-load-efer,
vmx-exit-save-preemption-timer,vmx-entry-noload-debugctl,
vmx-entry-ia32e-mode,vmx-entry-load-perf-global-ctrl,
vmx-entry-load-pat,vmx-entry-load-efer,vmx-eptp-switching,
missing features: vmx-apicv-register,vmx-apicv-vid,vmx-posted-intr
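The two lists inside that message are the part that matters, and they are easy to lose in the noise. As a rough sketch, a small shell helper can print either list one feature per line. The function name features is hypothetical, and it assumes the message arrives on a single line (as in the journal output) in the exact "extra features: ..., missing features: ..." shape shown above.

```shell
#!/usr/bin/env bash
# features KIND: read a libvirt "guest CPU doesn't match specification"
# message on stdin and print the KIND ("extra" or "missing") feature list,
# one feature per line. Hypothetical helper, not part of libvirt.
features() {
  # 1. keep only what follows "KIND features:"
  # 2. drop any later ", OTHER features: ..." tail
  # 3. split the comma-separated list into lines
  sed -n "s/.*$1 features: *//p" |
    sed 's/, *[a-z]* features:.*//' |
    tr ',' '\n'
}

# Shortened sample message, for demonstration only.
msg="error: operation failed: guest CPU doesn't match specification: extra features: vmx-ept,vmx-vpid, missing features: vmx-apicv-vid,vmx-posted-intr"

printf '%s\n' "$msg" | features missing
```

Swapping missing for extra prints the other list.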
Whoops, after a quick google, I encountered this regression. If I run virsh start seedbox, I get a similar error related to vmx. The solution is to
either upgrade to 10.2.0 or disable the vmx feature in the guest. vmx is used
for nested virtualization, which my guest does not utilize. Sidenote: vmx
support is detectable via CPUID, which was the topic of my last article.
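On Linux there is a lazier check than issuing CPUID by hand: the kernel lists the CPU feature flags it detected in /proc/cpuinfo. A minimal sketch, assuming an x86 Linux host; the function name has_vmx is hypothetical.

```shell
#!/usr/bin/env bash
# has_vmx [FILE]: succeed if the "flags" line of a cpuinfo-style file lists
# vmx. FILE defaults to /proc/cpuinfo. Hypothetical helper name.
# The pattern is anchored on "flags" so that the separate "vmx flags" line
# printed by newer kernels does not produce a false match.
has_vmx() {
  grep -Eq '^flags[[:space:]]*:.*[[:space:]]vmx([[:space:]]|$)' \
    "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_vmx; then
  echo "host CPU advertises vmx"
else
  echo "no vmx flag found"
fi
```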
As I don’t use nested virtualization, I opted to disable it. I invoked virsh edit seedbox to spawn a text editor with the libvirtd domain’s (guest’s) XML therein,
then changed <feature policy='require' name='vmx'/> to <feature policy='disable' name='vmx'/>. That <feature> element can be found
nested within the <cpu> element.
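For anyone who would rather script that change than make it in an editor, here is a minimal sketch of the same substitution with sed. The function name disable_vmx is hypothetical, the sample XML is a trimmed-down illustration rather than my real domain definition, and the pattern only matches when the element is written exactly as quoted above.

```shell
#!/usr/bin/env bash
# disable_vmx: read domain XML on stdin and flip the vmx feature policy
# from 'require' to 'disable'. Hypothetical helper; every other line
# passes through untouched.
disable_vmx() {
  sed "s|<feature policy='require' name='vmx'/>|<feature policy='disable' name='vmx'/>|"
}

# Preview on an illustrative <cpu> fragment (the mode attribute is made up).
disable_vmx <<'EOF'
<cpu mode='custom'>
  <feature policy='require' name='vmx'/>
</cpu>
EOF
```

On a real host, something like virsh dumpxml --inactive seedbox | disable_vmx > seedbox.xml followed by virsh define seedbox.xml should amount to the same edit, though that pipeline is an untested assumption and virsh edit remains the route described here.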
I couldn’t help but notice that had I built this host with Ubuntu, I wouldn’t have
experienced this regression, because the updated QEMU has already shipped there.
In fact, the maintainers marked this bug as Critical.
OK, after reading the discourse, it sounds like they didn’t modify the version
string to indicate it has been fixed. If I run nix run nixpkgs#vulnix -- -R $(readlink -f $(which ssh)), I get “Found no advisories. Excellent!” This
feels wrong and bad. Not a resounding “you’re safe!” and a spiffy new version
string to back up this claim. No, all I have instead is a random tool claiming
it’s all safe and secure. Trust me bro.
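The $(readlink -f $(which ssh)) part of that command is doing the quiet work: it follows the symlink found on PATH to the real file that would actually run. A small sketch of just that step, using a hypothetical helper named real_path_of and command -v in place of which:

```shell
#!/usr/bin/env bash
# real_path_of CMD: print the fully resolved path of the executable CMD as
# found on PATH, or fail if CMD is not found. Hypothetical helper.
# Note: for shell builtins, aliases, and functions, command -v prints a
# name rather than a path, so this is only meaningful for real binaries.
real_path_of() {
  local found
  found=$(command -v "$1") || return 1
  readlink -f "$found"
}

real_path_of sh

# On NixOS the result for ssh should be a path under /nix/store, which is
# what gets handed to vulnix:
#   nix run nixpkgs#vulnix -- -R "$(real_path_of ssh)"
```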
And don’t forget the icing on top of the cake: the nixpkgs team deviated from
standard operating procedures to backport the security fix to 23.11, despite
23.11’s EOL (End of Life) status, and despite their insistence that everyone
upgrade off EOL releases. Had I read one particular forum post, I would have
known that it wasn’t necessary to upgrade!
I’m not out of the woods yet: that pesky Nextcloud upgrade is pending. I’ll be
sure to share if I break anything.
The fact remains, had I read the discourse very carefully, I wouldn’t have
invested 6 hours today babysitting a NixOS upgrade, since 23.11 received the
same OpenSSH vulnerability fix, contrary to the team’s standard operating
procedures. I’ll leave it at that. I don’t want to be unduly negative, but
that’s an opportunity cost on everything else one could do with every day’s
sacred time.
The upgrade workflow was cool the first time, but at least once a year, per
machine? That sounds like a lot of toilsome drudgery. Plus one must face the
inevitable bitrot caused by the behemoth codebase that nixpkgs has ballooned
into, dragged down by a contribution system that permits thousands of tickets
to go unresolved and hundreds of unmerged PRs to stagnate within its churn.
NixOS has its uses, and I believe my uses are too pedestrian for NixOS.