For a couple of months now, I have noticed that running dmesg -w on my workstation does not appear to print new kernel messages. In other words, dmesg --follow “hangs”. Additionally, when running tail -f /var/log/kern.log to monitor new dmesg messages picked up by sysklogd (part of syslog-ng), the latest messages do not come through until sysklogd periodically “reopens” the /dev/kmsg kernel message buffer.
§Why is this a problem?
This is a problem because I use the dmesg log to monitor important hardware-related messages, such as the kernel recognizing a USB device, and to diagnose Bluetooth/Wi-Fi issues. When I plug in a USB drive, the first thing I do is check dmesg for messages like the following:
[10701.359834] usb 2-4.4: new high-speed USB device number 8 using ehci-pci
[10701.394801] usb 2-4.4: New USB device found, idVendor=12f7, idProduct=0313, bcdDevice= 1.10
[10701.394807] usb 2-4.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[10701.394810] usb 2-4.4: Product: MerryGoRound
[10701.394813] usb 2-4.4: Manufacturer: Memorex
[10701.394816] usb 2-4.4: SerialNumber: AAAAAAAAAAAA
[10701.395182] usb-storage 2-4.4:1.0: USB Mass Storage device detected
[10701.398885] scsi host7: usb-storage 2-4.4:1.0
[10702.401161] scsi 7:0:0:0: Direct-Access Memorex MerryGoRound PMAP PQ: 0 ANSI: 0 CCS
[10702.401710] sd 7:0:0:0: Attached scsi generic sg6 type 0
[10702.651720] sd 7:0:0:0: [sde] 15654912 512-byte logical blocks: (8.02 GB/7.46 GiB)
[10702.652341] sd 7:0:0:0: [sde] Write Protect is off
[10702.652346] sd 7:0:0:0: [sde] Mode Sense: 23 00 00 00
[10702.652961] sd 7:0:0:0: [sde] No Caching mode page found
[10702.652965] sd 7:0:0:0: [sde] Assuming drive cache: write through
[10702.681473] sde: sde1 sde2
[10702.684869] sd 7:0:0:0: [sde] Attached SCSI removable disk
This output reports that a USB device was detected, where it is plugged in, its vendor/product information, what USB speed it is using, the size of the storage, the device name (/dev/sde), and its partitions (/dev/sde1, /dev/sde2). There are plenty of other messages written to dmesg as well, such as the kernel detecting a bad USB cable, segmentation faults, and so on.
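When only the USB events matter, it can help to narrow the view. A rough filter along these lines (the pattern is just an example) keeps the noise down:
# Follow the kernel log, but only show USB and SCSI-disk related lines.
dmesg --follow | grep -iE 'usb|sd[a-z]'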
Given the importance of the above log output, I have developed a habit of running dmesg -w to monitor such kernel events. The -w flag tells dmesg to monitor for new messages; the long option is --follow.
In addition to dmesg -w not working as intended, syslog-ng log entries written to /var/log/kern.log are not written as they occur; instead the log is written in “bursts”, which suggests sysklogd occasionally reopens /dev/kmsg, thereby reading in new log messages. Notably, the timestamps within each “burst” are all identical.
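Both dmesg -w and the syslog daemon are, in the end, just readers of /dev/kmsg, so the stall can be observed without either tool. As a rough illustration (root may be required depending on kernel.dmesg_restrict), reading the device directly dumps the buffered records in their raw form and then blocks waiting for new ones, much like dmesg --follow:
# Dump the kernel ring buffer in raw record format, then wait for new records.
# On an affected kernel this waits forever, even as new messages are logged.
sudo cat /dev/kmsg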
§Which systems were affected?
I have two systems with a virtually identical OS installation; one is a
workstation named snowcrash
with an AMD FX-8350 on an ASRock M5A97
R2.0 motherboard; the other is an HP EliteBook 820 G4 named
cyberdemon
with an Intel Core i5-7300U. Curiously enough, the
strange dmesg -w
hang does not occur on cyberdemon
, but does occur
on snowcrash. Both hosts run mainline Linux, currently at 5.6.4. Looking through my /var/log/kern.log files, this behavior was already apparent on a 5.4.25 kernel. As we will see later, this coincides with the affected versions that others have reported.
Additionally, I asked my friend tyil, who happens to also use an AMD FX-8350 with Gentoo, to check for the bug; he had the same problem on 5.6.0.
§Pinpointing the bug
The first thing I did was find a way to reproduce the issue and capture it in an asciinema recording. I then shared the recording on IRC, hoping somebody would know of a solution. I got some helpful and encouraging feedback, but nobody knew of this particular bug.
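Aside from waiting on real hardware events, a minimal reproduction along these lines should trigger the symptom (the marker text is arbitrary; writing to /dev/kmsg requires root and injects a user-level record into the kernel log):
# Terminal 1: follow the kernel log.
dmesg -w

# Terminal 2: inject a marker record into the kernel ring buffer.
# On a healthy kernel the marker appears in terminal 1 immediately;
# on an affected kernel nothing shows up until dmesg is restarted.
echo "kmsg wakeup test" | sudo tee /dev/kmsg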
The next step was to figure out whether there was something wrong with /bin/dmesg itself. Running strace -o dmesg-strace.log dmesg -w shows the following pertinent lines:
openat(AT_FDCWD, "/dev/kmsg", O_RDONLY) = 3
lseek(3, 0, SEEK_DATA) = 0
read(3, "6,242,717857,-;futex hash table "..., 8191) = 79
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x16), ...}) = 0
openat(AT_FDCWD, "/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=26388, ...}) = 0
mmap(NULL, 26388, PROT_READ, MAP_SHARED, 4, 0) = 0x7fed92688000
close(4) = 0
futex(0x7fed925f9a14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(1, "\33[32m[ 0.717857] \33[0m\33[33mfut"..., 97) = 97
… SNIP …
read(3, "6,1853,137347289701,-;input: Mic"..., 8191) = 128
write(1, "\33[32m[137347.289701] \33[0m\33[33min"..., 140) = 140
read(3,
The last line indicates a pending read() that never completes. Note that file descriptor 3 refers to the /dev/kmsg device. Nothing out of the ordinary occurs, except that the read() simply hangs.
I was at a loss to explain the hanging read(); honestly, I was really lost. So I went on and inspected the changes to /bin/dmesg shipped by util-linux, and did not find any sign of significant changes. I did run dmesg built from master just to be sure. See the commit log of dmesg.c here. I also searched the util-linux bug tracker and found nothing relevant.
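For the record, trying dmesg from util-linux master can be done roughly as follows; this is a sketch using the upstream repository and the standard autotools build, so adjust to taste:
# Build dmesg from the util-linux development tree and run it directly.
git clone https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git
cd util-linux
./autogen.sh && ./configure && make
./dmesg -w    # still hangs, so the problem is not in util-linux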
Given that I had no solution yet, I decided to resort to Googling, hoping somebody had discussed this bug before. Keywords I tried were:
- dmesg follow no longer working
- dmesg kmsg no more messages
- linux kmsg read hang
- “/dev/kmsg” hang
- “dmesg -w” hangs
None of these came up with anything useful. I was using DuckDuckGo mainly, with some Google queries sprinkled on top.
I then visited the torvalds/linux GitHub repository, searched for “kmsg”, and did not find a commit that looked like a fix. From reading commits I picked up that /dev/kmsg is written to via the printk functions, so on a whim I decided to look at changes made to kernel/printk/printk.c. Reading through the commit log of printk.c, I realized the most recent commit was likely the fix:
commit ab6f762f0f53162d41497708b33c9a3236d3609e
Author: Sergey Senozhatsky <protected@email>
Date: Tue Mar 3 20:30:02 2020 +0900
printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
printk_deferred(), similarly to printk_safe/printk_nmi, does not
immediately attempt to print a new message on the consoles, avoiding
calls into non-reentrant kernel paths, e.g. scheduler or timekeeping,
which potentially can deadlock the system.
Those printk() flavors, instead, rely on per-CPU flush irq_work to print
messages from safer contexts. For same reasons (recursive scheduler or
timekeeping calls) printk() uses per-CPU irq_work in order to wake up
user space syslog/kmsg readers.
However, only printk_safe/printk_nmi do make sure that per-CPU areas
have been initialised and that it's safe to modify per-CPU irq_work.
This means that, for instance, should printk_deferred() be invoked "too
early", that is before per-CPU areas are initialised, printk_deferred()
will perform illegal per-CPU access.
Lech Perczak [0] reports that after commit 1b710b1b10ef ("char/random:
silence a lockdep splat with printk()") user-space syslog/kmsg readers
are not able to read new kernel messages.
The reason is printk_deferred() being called too early (as was pointed
out by Petr and John).
Fix printk_deferred() and do not queue per-CPU irq_work before per-CPU
areas are initialized.
Link: https://lore.kernel.org/lkml/aa0732c6-5c4e-8a8b-a1c1-75ebe3dca05b@camlintechnologies.com/
Reported-by: Lech Perczak <protected@email>
Signed-off-by: Sergey Senozhatsky <protected@email>
Tested-by: Jann Horn <protected@email>
Reviewed-by: Petr Mladek <protected@email>
Cc: Greg Kroah-Hartman <protected@email>
Cc: Theodore Ts'o <protected@email>
Cc: John Ogness <protected@email>
Signed-off-by: Linus Torvalds <protected@email>
Unfortunately, my understanding of the Linux kernel architecture is not comprehensive, let alone competent. Still, the commit message describes:
- syslog/kmsg readers, which include dmesg and syslog-ng;
- certain printk() flavors that do not immediately attempt to print a new message to the console;
- and the possibility that syslog/kmsg readers might not be woken up.
Indeed, it is a bit hard to wrap my minimal kernel understanding around this; however, reading the linked mailing list thread clears things up significantly:
After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting. I also found out, that output of ‘dmesg --follow’ also doesn’t react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed. I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b (“char/random: silence a lockdep splat with printk()”), and reverting it on top of v4.19.106 restored correct behaviour.
— Lech Perczak
Now that sounds like the issue I’m having! The thread also notes that the bug is apparent on 4.19.106 (fixed in 4.19.107; see this commit), and affects users of 5.5.9, 5.5.15, and 5.6.3 (see the PATCHv2 thread).
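As a quick sanity check, assuming you have a clone of the kernel sources handy, you can test whether the tree you are about to build already contains the fix. Checking by hash only works for trees that contain the mainline commit; stable backports keep the subject line, so grepping the log covers those too:
# Does the checked-out kernel already include the mainline fix?
git merge-base --is-ancestor ab6f762f0f53162d41497708b33c9a3236d3609e HEAD \
    && echo "fix present" || echo "fix missing"
# Stable trees carry the backport under the same subject line:
git log --oneline --grep='queue wake_up_klogd irq_work only if per-CPU areas are ready'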
§Further reading related to the above commit
- The commit that broke things
- The commit that fixed things
- Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg
- [PATCH] printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
- [PATCHv2] printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
- linux-4.19.y: Revert “char/random: silence a lockdep splat with printk()”
§Applying the patch
The next step is to apply the patch in order to test and verify that it fixes the issue.
Since last fall, I have used sys-kernel/vanilla-kernel to compile, install, and create an initramfs for my two machines. This is a great ebuild because it uses a kernel .config based on Arch Linux’s, so it is compatible with most machines. It is also streamlined in that it does all the work for you: no more manually configuring and remembering which make invocations are necessary to update the kernel. That is not hard to get right, but it is not particularly interesting for my use case. Additionally, with sys-kernel/vanilla-kernel the kernel and its modules are packaged and can be distributed to my other machine as a binpkg. This streamlines deployment significantly.
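For the curious, a minimal sketch of that binpkg workflow follows. The host name and binpkg path are assumptions about my setup (older Portage defaults to /usr/portage/packages for PKGDIR):
# On the build host: build the kernel and also produce a binary package.
emerge -1av --buildpkg sys-kernel/vanilla-kernel
# Ship the binpkg to the other machine (default PKGDIR shown).
rsync -a /var/cache/binpkgs/sys-kernel/ cyberdemon:/var/cache/binpkgs/sys-kernel/
# On the other machine: refresh the package index if needed, then install
# from the binary package only.
emaint binhost --fix
emerge -1av --usepkgonly sys-kernel/vanilla-kernel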
To add the patch to this ebuild, I simply have to drop the patch file into /etc/portage/patches/sys-kernel/vanilla-kernel. In my case I chose to drop it into /etc/portage/patches/sys-kernel/vanilla-kernel:5.6.4, because I would rather the patch only be applied to the kernel version I currently have installed than to all versions of sys-kernel/vanilla-kernel. This ensures that when I upgrade to the upcoming 5.7 release (which has the fix included), the patch won’t be applied and emerge won’t fail due to the patch not applying cleanly.
The commands (commit to my /etc/portage):
mkdir -p /etc/portage/patches/sys-kernel/vanilla-kernel:5.6.4
curl -o /etc/portage/patches/sys-kernel/vanilla-kernel:5.6.4/fix-dmesg--follow.patch \
    https://github.com/torvalds/linux/commit/ab6f762f0f53162d41497708b33c9a3236d3609e.patch
emerge -1av sys-kernel/vanilla-kernel:5.6.4
An hour later the kernel was installed. After the reboot, dmesg -w indeed works once again! And the log messages in /var/log/kern.log have timestamps that correctly reflect the kernel time!
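For a check that goes beyond eyeballing new log lines, the earlier marker-injection test can be reused; on the patched kernel the marker now shows up instantly:
# With `dmesg -w` running in another terminal, this should appear immediately.
echo "kmsg wakeup test, post-patch" | sudo tee /dev/kmsg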
§Conclusion
Even kernels have regressions. As discussed on IRC, I was reminded that the kernel project is not responsible for userland, so such test cases might not be on the radar of most kernel developers. Perhaps it is the distros’ responsibility to perform integrated system testing to catch bugs like this. In any case, it is still a surprise to see such a regression occur. We like to think of the kernel as an infallible magical machine that doesn’t break unless you do something patently wrong, but this isn’t really the case. We’re all human.
I want to thank Tyil, Sergey (the patch author), Lech (the bug reporter), and some folks from the #linux IRC channel for helping me pinpoint this issue. The reader may think this is a lot of effort to go through to fix such a simple bug, but it is really important for the kernel to work; if the kernel misbehaves, anything is up for grabs. Unlike a bug-laden browser, where we accept the occasional crash, a kernel that crashes or misbehaves has ramifications almost as bad as failing hardware: you’ll lose your application data, your productivity, and your trust in the operating system itself.
It is important to mention that LTS (long-term support) kernels exist. Given the amount of trouble I went through to address this issue, and the fact that I would rather not have things breaking, I don’t think I should be running a mainline kernel at the moment. Perhaps I can install both side by side, then pick and choose the kernel du jour.
I am very interested to hear your suggestions, dear reader, for kernel maintenance and version-selection strategies. You can find my contact details at https://winny.tech/ . Thank you for reading.