Posts tagged "linux":

30 Apr 2020

Linux dmesg --follow (-w) not working?

For a couple months now, I have noticed that running dmesg -w on my workstation does not appear to print out new kernel messages. In other words dmesg --follow "hangs". Additionally when running tail -f /var/log/kern.log to monitor new dmesg messages picked up by sysklogd (part of syslogng), the latest messages do not come through until sysklogd periodically "reopens" the /dev/kmsg kernel message buffer.

Why is this a problem?

This is a problem because I use the dmesg log to monitor important hardware related messages such as the kernel recognizing a USB device or diagnosing bluetooth/wifi issues. When I plug in a USB drive, the first thing I do is check dmesg for the following messages:

[10701.359834] usb 2-4.4: new high-speed USB device number 8 using ehci-pci
[10701.394801] usb 2-4.4: New USB device found, idVendor=12f7, idProduct=0313, bcdDevice= 1.10
[10701.394807] usb 2-4.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[10701.394810] usb 2-4.4: Product: MerryGoRound
[10701.394813] usb 2-4.4: Manufacturer: Memorex
[10701.394816] usb 2-4.4: SerialNumber: AAAAAAAAAAAA
[10701.395182] usb-storage 2-4.4:1.0: USB Mass Storage device detected
[10701.398885] scsi host7: usb-storage 2-4.4:1.0
[10702.401161] scsi 7:0:0:0: Direct-Access     Memorex  MerryGoRound     PMAP PQ: 0 ANSI: 0 CCS
[10702.401710] sd 7:0:0:0: Attached scsi generic sg6 type 0
[10702.651720] sd 7:0:0:0: [sde] 15654912 512-byte logical blocks: (8.02 GB/7.46 GiB)
[10702.652341] sd 7:0:0:0: [sde] Write Protect is off
[10702.652346] sd 7:0:0:0: [sde] Mode Sense: 23 00 00 00
[10702.652961] sd 7:0:0:0: [sde] No Caching mode page found
[10702.652965] sd 7:0:0:0: [sde] Assuming drive cache: write through
[10702.681473]  sde: sde1 sde2
[10702.684869] sd 7:0:0:0: [sde] Attached SCSI removable disk

This output reports that a USB device was detected, where it is plugged in, what is the vendor/product information, what USB speed it is using, the size of the storage, the device name (/dev/sde), and its partitions (/dev/sde1, /dev/sde2). There are a lot of other messages written out to dmesg, such as the kernel detecting a bad USB cable, segementation faults, and so on.

Given the importance of the above log output, I have developed a habit of running dmesg -w to monitor such kernel events. The -w tells dmesg to monitor for new messages. The long option is --follow.

In addition to dmesg -w not working as intended, syslogng log entries written to /var/log/kern.log are not written as they occur; instead the log is written in "bursts", which suggests sysklogd occasionally reopens /dev/kmsg, thereby reading in new log messages, but all the timestamps are the same time for each "burst" read.

Which of systems were affected?

I have two systems with a virtually identical OS installation; one is a workstation named snowcrash with an AMD FX-8350 on an ASRock M5A97 R2.0 motherboard; the other is a HP Elitebook 820 G4 named cyberdemon with an Intel Core i5-7300U. Curiously enough, the strange dmesg -w hang does not occur on cyberdemon, but does occur on snowcrash. Both hosts run Linux mainline, with both machines on 5.6.4. Looking through my /var/log/kern.log files, this behavior was apparent on a 5.4.25 kernel. As we will see later, this coincides with the affected versions that others have reported

Additionally, I asked my friend tyil who happens to also use an AMD FX-8350 with Gentoo to check for the bug; he also had the problem on 5.6.0.

Pinpointing the bug

First thing I did was find a way to reproduce the issue. I recorded an asciinema recording. You can watch it here. I then shared the recording on IRC, hoping somebody would know of a solution. I got some helpful and encouraging feedback, but nobody knew of this particular bug. See the recording below, or click here:

The next step was to figure out if there was something wrong with /bin/dmesg. Running strace -o dmesg-strace.log dmesg -w shows the follow pertinent lines:

openat(AT_FDCWD, "/dev/kmsg", O_RDONLY) = 3
lseek(3, 0, SEEK_DATA)                  = 0
read(3, "6,242,717857,-;futex hash table "..., 8191) = 79
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x16), ...}) = 0
openat(AT_FDCWD, "/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=26388, ...}) = 0
mmap(NULL, 26388, PROT_READ, MAP_SHARED, 4, 0) = 0x7fed92688000
close(4)                                = 0
futex(0x7fed925f9a14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(1, "\33[32m[    0.717857] \33[0m\33[33mfut"..., 97) = 97
… SNIP …
read(3, "6,1853,137347289701,-;input: Mic"..., 8191) = 128
write(1, "\33[32m[137347.289701] \33[0m\33[33min"..., 140) = 140

The last line indicates a pending read() that never completes. Note the 3 file description refers to the /dev/kmsg device. It appears nothing out of the ordinary occurs, except for the fact that the read() simply hangs.

I was a bit at a loss to explain the hanged read(). I was really lost honestly. So I went on, and inspected the changes to /bin/dmesg shipped by util-linux, and did not find any sign of significant changes. I did run dmesg from master just to be sure. See the commit log of dmesg.c here. Additionally I also searched the util-linux bug tracker and found nothing relevant.

Given I had no solution yet, I decided to resort to Googling things, hoping somebody had discussed this bug before. Keywords I tried are:

  • dmesg follow no longer working
  • dmesg kmsg no more messages
  • linux kmsg read hang
  • "/dev/kmsg" hang
  • "dmesg -w" hangs

None of these came up with anything useful. I was using DuckDuckGo mainly, with some Google queries sprinkled on top.

I then visited the torvalds/linux GitHub repository, searched for "kmsg", and did not find a commit that looked like a fix. I picked on through reading commits that /dev/kmsg is written to via the printk functions, so on a whim I decided to look changes made to kernel/printk/printk.c. Reading through the commit logs of printk.c, I realized the last commit is likely the fix:

commit ab6f762f0f53162d41497708b33c9a3236d3609e
Author: Sergey Senozhatsky <protected@email>
Date:   Tue Mar 3 20:30:02 2020 +0900

    printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
    printk_deferred(), similarly to printk_safe/printk_nmi, does not
    immediately attempt to print a new message on the consoles, avoiding
    calls into non-reentrant kernel paths, e.g. scheduler or timekeeping,
    which potentially can deadlock the system.
    Those printk() flavors, instead, rely on per-CPU flush irq_work to print
    messages from safer contexts.  For same reasons (recursive scheduler or
    timekeeping calls) printk() uses per-CPU irq_work in order to wake up
    user space syslog/kmsg readers.
    However, only printk_safe/printk_nmi do make sure that per-CPU areas
    have been initialised and that it's safe to modify per-CPU irq_work.
    This means that, for instance, should printk_deferred() be invoked "too
    early", that is before per-CPU areas are initialised, printk_deferred()
    will perform illegal per-CPU access.
    Lech Perczak [0] reports that after commit 1b710b1b10ef ("char/random:
    silence a lockdep splat with printk()") user-space syslog/kmsg readers
    are not able to read new kernel messages.
    The reason is printk_deferred() being called too early (as was pointed
    out by Petr and John).
    Fix printk_deferred() and do not queue per-CPU irq_work before per-CPU
    areas are initialized.
    Reported-by: Lech Perczak <protected@email>
    Signed-off-by: Sergey Senozhatsky <protected@email>
    Tested-by: Jann Horn <protected@email>
    Reviewed-by: Petr Mladek <protected@email>
    Cc: Greg Kroah-Hartman <protected@email>
    Cc: Theodore Ts'o <protected@email>
    Cc: John Ogness <protected@email>
    Signed-off-by: Linus Torvalds <protected@email>

Unfortunately my understanding of the linux kernel architecture is not comprehensive, let alone competent, the commit message describes

  1. syslog/kmsg readers — which includes dmesg and syslogng,
  2. certain functions don't immediate attempt to print a new message to console,
  3. and syslog/kmsg readers might not wake up.

Indeed it's a bit hard for me to wrap my minimal kernel understanding around, however, reading the linked email list thread clears things up significantly:

After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting. I also found out, that output of 'dmesg –follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed. I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.

— Lech Perczak

Now that, sounds like the issue I'm having! The thread also discusses the bug is apparent on 4.19.106 (fixed in 4.19.107 — see this commit), and affects users of 5.5.9, 5.5.15, 5.6.3 (see the PATCHv2 thread).

Applying the patch

Next step is to apply the patch in order to test and verify this fixes the issue.

Since Fall last year, I have used sys-kernel/vanilla-kernel to compile, install, and create an initramfs for my two machines. This is a great ebuild because it uses a kernel .config based off of Archlinux's, so it is compatible with most machines. It is also streamlined in that it does all the work for you — no more manually configuring and remembering which make invocations are necessary to update the kernel. It's not hard to get right, but it's not particularly interesting in my use-case. Additionally, using sys-kernel/vanilla-kernel, the kernel & its modules are now packaged, and can be distributed to my other machine as a binpkg. This streamlines deployment significantly.

In order to add the patch to this ebuild, I simply have to drop the patch file into /etc/portage/patches/sys-kernel/vanilla-kernel. In my case I chose to drop it in /etc/portage/patches/sys-kernel/vanilla-kernel:5.6.4 because I rather the patch only be applied no the current kernel I have installed, than all versions of sys-kernel/vanilla-kernel. This ensures when I upgrade to to the upcoming 5.7 release (which has the fix included), the patch won't be applied and emerge won't fail due to the patch not being applied cleanly.

The commands (commit to my /etc/portage):

mkdir -p /etc/portage/patches/sys-kernel/vanilla-kernel:5.2.6
curl -o /etc/portage/patches/sys-kernel/vanilla-kernel:5.2.6/fix-dmesg--follow.patch \
emerge -1av sys-kernel/vanilla-kernel:5.2.6

An hour later and the kernel is installed. After the reboot, indeed dmesg -w works once again! And the log messages in /var/log/kern.log have timestamps that correctly reflect the kernel time!


Even kernels have regressions. As discussed on IRC, I was reminded that the kernel project is not responsible for the userland, so it's possible such testcases might not be on the radar of most kernel developers. Perhaps it's the distros' responsibilities to execute integrated system testing to catch bugs like this. In any case it is still a surprise to see such a regression occur. We like to think of the kernel as this infallible magical machine that doesn't break except when you do something patently wrong, but this isn't really the case. We're all human.

I want to thank Tyil, Sergey (the patch author), Lech (the bug reporter) and some folks from the #linux IRC channel for helping me pinpoint this issue. The reader may think this is a lot of effort to go through to fix such a simple bug — but it's really important for the kernel to work — if the kernel misbehaves, anything is up for grab. It's not like your bug-laden browser product we accept will have crashing bugs in it — if the kernel crashes or misbehaves, the ramifications are almost as bad as if the hardware is failing — you'll lose your application data, productivity, and trust in the operating system itself.

It is important to mention LTS (Long term support) kernels exist. Given the amount of trouble I went to address this issue, and the fact I rather not have things breaking, I don't think I should be running a mainline kernel at the moment. Perhaps I can install both side by side. then pick 'n choose which kernel to use de jure.

I am very interested to hear of you, the reader's, suggestions for kernel maintenance and version selection strategies. You can find my contact details at . Thank you for reading.

Tags: linux computing
25 Dec 2019

How to fix early framebuffer problems, or "Can I type my disk password yet??"

Most of my workstations & laptops require a passphrase typed in to open the encrypted root filesystem. So my steps to booting are as follows:

  1. Power on machine
  2. Wait for FDE passphrase prompt
  3. Type in FDE passphrase
  4. Wait for boot to complete and automatic XFCE session to start

Since I need to know when the computer is ready to accept the passphrase, it is important the framebuffer is usable during the early part of the boot. In the case of of HP Elitebook 820 G4, the EFI framebuffer does not appear to work, and I rather not boot in BIOS mode to get a functional VESA framebuffer. Making things more awkward, a firmware is needed when the i915 driver is loaded, or the framebuffer will not work either. (It’s not always clear if a firmware is needed, so one should run dmesg | grep -F firmware and check if firmware is being loaded.)

With this information, the problem is summarized to: “How do I ensure i915 is available at boot with the appropriate firmware?”. This question can be easily generalized to any framebuffer driver, as the steps are more-or-less the same.

Zeroth step: Do you need only a driver, or a driver with firmware?

IT is a good idea to verify if your kernel is missing a driver at boot, or is missing firmware or both. Boot up a Live USB with good hardware compatibility, such as GRML1 or Ubuntu’s, and let’s see what framebuffer driver our host is trying to use2:

$ dmesg | grep -i 'frame.*buffer'
[    4.790570] efifb: framebuffer at 0xe0000000, using 8128k, total 8128k
[    4.790611] fb0: EFI VGA frame buffer device
[    4.820637] Console: switching to colour frame buffer device 240x67
[    6.643895] i915 0000:00:02.0: fb1: i915drmfb frame buffer device

Se we can see the efifb is initially used for a couple seconds, then i915 is used for the rest of the computer’s uptime. Now let’s look at if firmware is necessary, first checking if modinfo(8) knows of any firmware:

$ modinfo i915 -F firmware
... SNIP ...

This indicates this driver will load firmware when available, and if necessary for the particular mode of operation or hardware.

Now let’s look at dmesg to see if any firmware is loaded:

[    0.222906] Spectre V2 : Enabling Restricted Speculation for firmware calls
[    5.511731] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[   25.579703] iwlwifi 0000:02:00.0: loaded firmware version 36.77d01142.0 op_mode iwlmvm
[   25.612759] Bluetooth: hci0: Minimum firmware build 1 week 10 2014
[   25.620251] Bluetooth: hci0: Found device firmware: intel/ibt-12-16.sfi
[   25.712793] iwlwifi 0000:02:00.0: Allocated 0x00400000 bytes for firmware monitor.
[   27.042080] Bluetooth: hci0: Waiting for firmware download to complete

Aha! So it appears we need i915/kbl_dmc_ver1_04.bin for i915. In the case case one doesn’t need firmware, it won’t show anything related to drm or a line with your driver name in it.

By the way, it is a good idea to check dmesg for hints about missing firmware, or alternative drivers, for example my trackpad is supported by both i2c and synaptics based trackpad drivers, and the kernel was kind enough to tell me.

First step: Obtain the firmware

On Gentoo install sys-kernel/linux-firmware. You will have to agree to some non-free licenses; nothing too inane, but worth mentioning. Now just run emerge -av sys-kernel/linux-firmware. (On other distros it might be this easy, or more difficult; for example—in my experience Debian does not ship every single firmware like Gentoo does, so YMMV.)

Second step, Option A: Compile firmware into your kernel

Since most of my systems run Gentoo, it is business as usual to deploy a kernel with most excess drivers disabled except for common hot-swappable components such as USB network interfaces, audio devices, and so on. For example, this laptop’s config was originally derived from genkernel’ stock amd64 config with most extra drivers disabled, then augmented with support for an Acer ES1-111M-C7DE, and finally with support for this Elitebook.

I had compiled the kernel with i915 support built into the image, as opposed to an additional kernel module. Unfortunately this meant the kernel is unable to load firmware from filesystem, because it appears only kernel modules can load firmware from filesystem. To work around this without resorting to making i915 a kernel module, we can include the drivers within the kernel image (vmlinuz). Including firmware and drivers both in the vmlinuz has a couple benefits. First it will always be available. There is no need to figure out how to load the driver and firmware from initrd, let alone getting the initrd generator one is using, to cooperate. A downside is it makes the kernel very specific to the machine, because perhaps a different Intel machine needs a different firmware file compiled in.

To achieve including the firmware in kernel, I set the following values in my kernel config (.config in your kernel source tree).


Note, if you’re using menuconfig, you can type /EXTRA_FIRMWARE (slash for search, then the text) followed by keyboard return to find where these settings exist in the menu system.

Then I verified i915 is indeed not a kernel module, but built into the kernel image (it would be m if it’s a module):


After compiling & installing the kernel (and generating a dracut initrd for cryptsetup/lvm), I was able to reboot and get an early pre-mounted-root framebuffer on this device.

Second step, Option B: A portable kernel approach (using sys-kernel/vanilla-kernel)

I discovered the Gentoo devs have begun shipping an ebuild that builds and installs a kernel with a portable, livecd friendly config. In addition this package will optionally generates an initrd with dracut as a pkgpostinst step, making it very suitable as a replacement for users who just want a working kernel, and don’t mind a excessive compatibility (at a cost to size and build time).

This presents a different challenge, because while this package does allow the user to drop in their own .config, it is not very multiple-machine-deployment friendly to hard-code each individual firmware into the kernel. Instead we tell dracut to include our framebuffer driver. As mentioned above I found this computer uses the i915 kernel driver for framebuffer. Let’s tell dracut to include the driver:

cat > /etc/dracut.conf.d/i915.conf <<EOF
add_drivers+=" i915 "

Dracut is smart enough to pick up the firmware the kernel module needs, provided they are installed. To get an idea what firmware dracut will include, run modinfo i915 -F firmware which will print out a bunch of firmware relative paths.

After applying this fix, just regenerate your initrd using dracut; in my case I let portage do the work: emerge -1av sys-kernel/vanilla-kernel. Finally reboot.


Check dmesg. Always check dmesg. We found two ways to deploy firmware, in-kernel and in-initrd. The in-kernel technique is best for a device-specific kernel, the in-initrd is best for a portable kernel. I am a big fan of the second technique because it scales well to many machines.

I did not touch on the political side of using binary blobs. It would be nice to not use any non-free software, but I rather have a working system with a couple small non-free components, than a non-working system. Which is more valuable, your freedom, or reduced capacity of your tools?



GRML is my favorite live media. It is simple, to the point, has lots of little scripts to streamline tasks such as setting up a wireless AP, a iPXE netboot environment, a router, installing debian, and so on. And Remastering is relatively straight forward. It also has a sane gui sutable for any machine (fluxbox).


Thanks to this post on Ask Ubuntu

Tags: gentoo linux computing
Other posts

© Winston Weinert (winny) — CC-BY-SA-4.0