Posts tagged "linux":

31 Aug 2020

When NAT Bites — Use a Reverse VPN

Sometimes I find myself setting up servers on networks with less than ideal configuration. Most home internet connections use dynamic IP addresses, which requires extra work to ensure I know which address to use when logging in from the internet. Another concern is how unreliable home networking gear can be, especially with users tweaking settings without fully appreciating what they're doing. As a result, I've devised an alternative solution to ensure I can always log into boxes hosted on home internet connections.

How to ssh into a home server box — the standard way

Let's say I am setting up a box on a home network. I want to ensure I can log in via SSH from the outside world — that is, from anywhere on the internet. In both IPv6 and IPv4 land this is painful and possibly unreliable.

Allowing ssh logins over IPv6

After setting up the box on the network, I will need to configure the router's firewall to allow SSH traffic through. In the case of IPv6 this is as simple as telling the router to allow traffic inbound on port 22 to my box's IPv6 address, which is globally routable and unique to my box. One more thing: the ISP gives each IPv6-enabled subscriber an IPv6 prefix, which is a bunch of IPv6 addresses that the customer's equipment utilizes via SLAAC. Let's work through an example:

  1. The ISP gives your router IPv6 prefix 2001:db8:a:b::/64 [1] via DHCPv6
  2. Every device on the network assigns itself an IPv6 address via SLAAC
  3. Each device on the network is globally routed via a unique IPv6 address. For example the server box is 2001:db8:a:b::1, the router is 2001:db8:a:b::5:1, and my phone is 2001:db8:a:b::c:d.
  4. I want to log into the server box. I ssh directly to the server box's address.
  5. The router permits SSH traffic to my server box through its firewall
  6. Success! I am logged in.

It seems pretty straightforward, but there is a catch: when your DHCPv6 lease expires (e.g. after 1 week), there is a possibility your network is assigned a new IPv6 prefix. Hence the IPv6 addresses can be thought of as dynamic.

In order to get around this limitation, one can use Dynamic DNS and update an AAAA DNS record for every host on the LAN that needs to be accessed via the internet. This means you'll either have to run a Dynamic DNS client on each host, or use some sort of host discovery to set Dynamic DNS AAAA records on another host's behalf. Seems a bit confusing; in short, when using IPv6 on a home network, special care is needed to ensure SSH is accessible.
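
As an illustration, here is a minimal sketch of one way to keep an AAAA record current from the host itself using nsupdate (from BIND's tools). The interface name, zone, record name, and key file are hypothetical placeholders; a hosted Dynamic DNS provider's HTTP API would work just as well.

#!/bin/sh
# Sketch: point box.dyn.example.com at this host's current global IPv6 address.
# eth0, dyn.example.com, and /etc/ddns.key are placeholders.
ADDR=$(ip -6 addr show scope global dev eth0 | awk '/inet6/ {sub("/.*", "", $2); print $2; exit}')
[ -n "$ADDR" ] || exit 1
nsupdate -k /etc/ddns.key <<EOF
zone dyn.example.com
update delete box.dyn.example.com. AAAA
update add box.dyn.example.com. 300 AAAA $ADDR
send
EOF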

Allowing ssh logins over IPv4

Looking at IPv4, some things differ. As in IPv6's case, one needs to allow SSH traffic to the target box on the home network. What differs is one does not directly address the internal host; instead one has to address the router (which is usually plugged directly into the modem/internet). The router then passes this SSH traffic to the box on the home network. In effect the router is translating between different network segments. The feature of your router passing traffic on a port to another host on the home network is known as port forwarding.

Now that we understand the main difference, we should also discuss how IPv4 addresses are given to home ISP users. Typically your home router gets an IPv4 address the same way your personal devices get addresses from the router: in both cases the device requesting an address uses DHCP to solicit an IPv4 address and related configuration (such as DNS servers and default gateway). The same process applies to each device on the home network. They simply get IPv4 addresses from the router, but in this case the router does not give out globally routable addresses; instead it gives out private IPv4 addresses. These private IPv4 addresses are not globally routable, and instead depend on the router to figure out where to send internet traffic. Also, as with DHCPv6, there is a finite lifespan for DHCP leases. As such, after a while your router's IPv4 address may change.

Let's work through an IPv4 example:

  1. Your router solicits an IPv4 address from the ISP, such as 203.0.113.1. [2] This is the only internet-routable address used in the setup.
  2. Each device on the network solicits an IPv4 address from the router. [3] Each device has its own private IPv4 address. [4] In this example the server box has 192.168.1.20, the router has 192.168.1.1, and my phone has 192.168.1.101.
  3. Internet traffic is directed towards the router (acting as the default gateway), which in turn translates the network traffic. This is done by replacing the private IPv4 address and source port with the public IPv4 address and a "session port", which is used to uniquely map between traffic belonging on the internet side to traffic belonging on the home network side. This is network address translation.
  4. When I want to log into the server box, I ssh into the router's public IPv4 address.
  5. The router translates the ssh traffic to the private IPv4 address space and sends it off to the server box. The server box then handles the ssh session, and the router relays traffic back and forth between the server box on the private LAN and my computer on the internet.

Seems a bit convoluted too. But it does work. When combined with Dynamic DNS to update an A DNS record for the router's public IPv4 address, it can work pretty flawlessly.
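
To make that concrete, here is a hedged sketch of the client side once the router forwards its public SSH port to the server box and a Dynamic DNS A record tracks the router. The host alias, domain, and user are made up for the example.

# ~/.ssh/config on my internet-side computer (sketch; names are hypothetical)
Host homebox
    HostName home.example.com   # Dynamic DNS A record following the router's public IPv4
    Port 22                     # the router forwards public port 22 to 192.168.1.20:22
    User winston

With that in place, ssh homebox works from anywhere, as long as the Dynamic DNS record and the port forward keep working.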

What can go wrong?

When using any of the above approaches to set up SSH login, a few things can go wrong.

  1. In both IPv4 and IPv6, the router depends on the ISP to assign internet addresses. As such in both cases, Dynamic DNS or a similar mechanism is needed to ensure I can log in via a DNS record, or at least know what address works when using SSH. If I don't know the latest internet address, I cannot log in.
  2. The router has to be specially configured to allow traffic in both IPv4 and IPv6 situations. Using IPv6, the router must permit traffic to the server box. Using IPv4, the router has to modify network traffic it receives on the SSH port and pass it on to the server box. And then it does the same to send traffic back to my internet device. If the router configuration breaks for whatever reason, I cannot log in.
  3. Some internet connections cannot receive SSH traffic. Most mobile internet service providers (such as AT&T) use Carrier Grade NAT to allow multiple internet subscribers to connect to the internet using the same public IPv4 address. Port forwarding won't work with Carrier Grade NAT. Additionally some ISPs do nefarious things like blocking ports for you. After all, it's pretty normal for ISPs to block SMTP (even if I think it's bonkers).

In short the network operator must cooperate with you to make this happen. In some cases the hardware isn't capable either.

A few options

Before continuing I'd like to point out—like most technology—the exact same tools every single user uses daily (in one form or another) can be used to attack the same systems. As such the next example is for demonstration purposes. Data is sent in plain text and no authentication/verification of host or user is performed. Computer networks are only as secure as their users ;).

Punt the socket using socat

One can use socat to (1) run a socat client on the home server box that connects to an internet server and relays all data to the box's SSH port, and (2) run a socat server on the internet host that relays traffic it receives on another port to that connected socat client.

Let's try this out, given the internet host public.example.com and your LAN host not-reachable.lan:

# On public.example.com
socat TCP-LISTEN:9991,fork,forever TCP-LISTEN:9992

# On not-reachable.lan
socat TCP-CONNECT:public.example.com:9991,fork,forever TCP-CONNECT:localhost:22

# From anywhere on the internet
ssh -p 9992 public.example.com

You should see the familiar messages of OpenSSH.

Sounds a bit confusing, because it is. Additionally there is no easy mechanism to allow multiple ssh sessions simultaneously, nor to ensure the authenticity of the socat client that connects to the internet server. So I wouldn't use this in production ;). Note: socat can use OpenSSL, which addresses the authenticity problem, but I still don't think this is a very intuitive way to solve it.
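
For completeness, a rough sketch of the OpenSSL variant mentioned above (the certificate file names are placeholders, and server.pem/client.pem each contain a certificate plus its key; see socat(1) for the full set of OPENSSL options):

# On public.example.com: only accept a forwarder presenting a certificate signed by ca.crt
socat OPENSSL-LISTEN:9991,cert=server.pem,cafile=ca.crt,fork,forever TCP-LISTEN:9992

# On not-reachable.lan: verify the server against ca.crt and present our client certificate
socat OPENSSL:public.example.com:9991,cert=client.pem,cafile=ca.crt,forever TCP:localhost:22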

20200901_socat_diagram.png
Figure 1: socat topology. Arrows point away from the connecting socket to the destination process.

Better: Using OpenSSH

During a recent episode of Linux Unplugged, there was discussion of using OpenSSH to dial in to an internet-reachable host. The secret sauce is to use the RemoteForward option (ssh -R). This can be achieved with ssh -R 2222:localhost:22 my-server.example, run from the firewalled host. Then from the server, one can run ssh -p 2222 localhost to log into the firewalled host's SSH server. Alternatively, OpenSSH also supports forwarding a SOCKS5 proxy, which can be used in conjunction with a web browser to browse web configuration UIs with little effort, or with other applications via the tool proxychains.
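
A minimal sketch of making the reverse tunnel persistent from the firewalled host's side (the host alias and keepalive value are just examples):

# ~/.ssh/config on the firewalled host (sketch)
Host reverse-tunnel
    HostName my-server.example
    RemoteForward 2222 localhost:22   # equivalent to ssh -R 2222:localhost:22
    ServerAliveInterval 30            # notice dead connections so a wrapper can reconnect
    ExitOnForwardFailure yes

Then something as simple as a loop (while true; do ssh -N reverse-tunnel; sleep 10; done) or a supervisor keeps the tunnel up.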

20200901_ssh_diagram.png
Figure 2: ssh reverse port forwarding topology.

My Solution: Use OpenVPN

I wanted the firewalled host to be accessible at a unique network address, without the need for SOCKS5 or other extra steps. The solution I came up with is to stand up an OpenVPN server process on the internet host, then run an OpenVPN client on the remote host. The configuration is deceptively simple, despite OpenVPN's featureful footprint.

Some OpenVPN operators allow users to log in with a username and password associated with their account, but this appears a bit complicated to set up. Instead, the OpenVPN folks recommend setting up a self-signed certificate authority to dole out TLS certificates. This is achieved using the very handy script EasyRSA, which streamlines the process of creating a Certificate Authority and issuing keypairs into a handful of very short commands.

20200901_openvpn_diagram.png
Figure 3: OpenVPN topology.

After setting up OpenVPN as I will outline below, one can simply run ssh 10.100.0.10, if the home server's VPN "virtual" IP is 10.100.0.10. Other services hosted on the home server are also accessible by that IP.

Steps to set up (based off my playbooks)

The following steps are based on the official OpenVPN tutorial for setting up multiple clients, each with their own certificate. This means a compromise of one client's private key will not compromise the other clients' private keys.

Set up the certificate authority & certificates

# Download EasyRSA
VERSION=3.0.7
curl -Lo easy-rsa-${VERSION}.tar.gz \
    https://github.com/OpenVPN/easy-rsa/archive/v${VERSION}.tar.gz
tar -xzvf easy-rsa-${VERSION}.tar.gz
cd easy-rsa-${VERSION}/easyrsa3

# Get script usage (it does not understand --help)
./easyrsa help

# One-time CA and DH params initialization
./easyrsa init-pki
echo 'My Cool CA Name' | ./easyrsa build-ca nopass
./easyrsa gen-dh

# Do this for each OpenVPN Server.  Each server name
# (e.g. "my_server") must be unique.
./easyrsa build-server-full my_server nopass

# Do this for each OpenVPN Client.  Each client name
# (e.g. "my_client") must be unique.
./easyrsa build-client-full my_client nopass

The generated files used later in the install are:

pki/ca.crt
The certificate authority public certificate. Install alongside any other *.crt file. (Or maybe chain them?)
pki/private/ca.key
The certificate authority private key.
pki/dh.pem
Diffie Hellman parameters, TODO what is this for? Install on the server.
pki/issued/my_server.crt
The server public certificate. Install on the server.
pki/private/my_server.key
The server private key. Install on the server.
pki/issued/my_client.crt
The client public certificate. Install on the client.
pki/private/my_client.key
The client private key. Install on the client.
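
For reference, a sketch of dropping the server-side files into place with restrictive permissions; the /etc/openvpn/secrets layout matches the openvpn.conf shown below, and my_server is the example name from above.

# On the OpenVPN server (sketch) -- private keys should not be world readable
install -d -m 0700 /etc/openvpn/secrets
install -m 0644 pki/ca.crt                 /etc/openvpn/secrets/ca.crt
install -m 0644 pki/dh.pem                 /etc/openvpn/secrets/dh.pem
install -m 0644 pki/issued/my_server.crt   /etc/openvpn/secrets/my_server.crt
install -m 0600 pki/private/my_server.key  /etc/openvpn/secrets/my_server.key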

I currently do not have the CA management scripted by Ansible. I am a little uncomfortable with the idea of Ansible entirely managing the CA creation followed by certificate creation. More experience with Ansible should help put my concerns at ease. Chiefly, I don't want Ansible to write out certificates from the "master CA tree". Sometimes a little manual operation is good.

Set up the OpenVPN Server (Internet-side)

See the corresponding Ansible role's tasks.

The server's openvpn.conf should look like this:

tls-server

port 12345

proto udp

dev tun0

ca /etc/openvpn/secrets/ca.crt
cert /etc/openvpn/secrets/{{inventory_hostname}}.crt
key /etc/openvpn/secrets/{{inventory_hostname}}.key
dh /etc/openvpn/secrets/dh.pem

server 10.100.0.0 255.255.255.0

persist-key
persist-tun

ifconfig-pool-persist ipp.txt
push "route 10.100.0.0 255.255.255.0"

keepalive 10 120
comp-lzo

user openvpn
group openvpn

status openvpn-status.log
log /var/log/openvpn/openvpn.log
verb 4

This is probably the easiest part.

  1. Install OpenVPN. apk add openvpn
  2. Install the CA keypair. [5]
  3. Install the VPN server's keypair.
  4. Install the Diffie Hellman parameters file.
  5. Install the server's openvpn.conf
  6. Run the server openvpn --config /etc/openvpn/openvpn.conf
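
Remember the server's firewall must also allow inbound UDP on port 12345. Assuming everything is in place, the tun0 interface and the log should confirm the server is up (paths match the config above; this is just a sanity-check sketch):

ip addr show dev tun0                  # should show the server's VPN address, 10.100.0.1
tail -f /var/log/openvpn/openvpn.log   # watch for "Initialization Sequence Completed"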

Set up the OpenVPN Client (Your Firewalled Host)

See the corresponding Ansible role's tasks.

The client's openvpn.conf should look like this:

client

proto udp

dev tun0

remote public.example.com 12345

nobind
resolv-retry 30

script-security 2

ca /etc/openvpn/secrets/ca.crt
cert /etc/openvpn/secrets/{{inventory_hostname}}.crt
key /etc/openvpn/secrets/{{inventory_hostname}}.key

persist-key
persist-tun

keepalive 10 120
comp-lzo

log /var/log/openvpn/openvpn.log
verb 4

  1. Install OpenVPN. apk add openvpn
  2. Install the CA public key.
  3. Install the VPN Client's keypair.
  4. Install the client's openvpn.conf
  5. Run the client using openvpn --config /etc/openvpn/openvpn.conf
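
Once the client connects, a quick end-to-end check looks something like this (10.100.0.1 is the server's VPN address, as seen in the host discovery section below):

ip addr show dev tun0    # the client's assigned 10.100.0.x address
ping -c 3 10.100.0.1     # reach the OpenVPN server over the tunnel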

Set up another OpenVPN client on your PC

Follow the same instructions as setting up your firewalled host, but be sure to generate a unique SSL keypair to identify your PC. If the OpenVPN server, the firewalled host OpenVPN client, and your PC's OpenVPN client are all set up correctly, you should be able to directly connect to any of the hosts participating in the VPN via the VPN's private IPv4 network.

Wishlist

This setup works swimmingly, but there are a few nits, chiefly the amount of effort involved in discovering hosts connected to the VPN. I also realized that instead of only routing directly to other VPN clients, one could also join the various LANs together, so I gave that some thought as well.

VPN Host Discovery

At present I scan for remote connected hosts via:

winston@snowcrash ~ $ sudo nmap -sP -PE 10.100.0.0/24
Password: 
Starting Nmap 7.80 ( https://nmap.org ) at 2020-08-15 01:31 CDT
Nmap scan report for 10.100.0.1
Host is up (0.026s latency).
Nmap scan report for 10.100.0.6
Host is up (0.053s latency).
Nmap scan report for 10.100.0.14
Host is up (0.064s latency).
Nmap scan report for 10.100.0.10
Host is up.
Nmap done: 256 IP addresses (4 hosts up) scanned in 2.61 seconds

In the above output, 10.100.0.1 is the OpenVPN server address, and according to ip addr show dev tun0, my current OpenVPN client is 10.100.0.10.

Pretty icky. Alternatively, one can take a look at the server's openvpn.log:

Fri Aug 14 02:02:48 2020 us=357198 cyberdemon/127.0.0.1:42995 MULTI: primary virtual IP for cyberdemon/127.0.0.1:42995: 10.100.0.18

In both cases it's a bit tedious to figure out which IP belongs to which host participating on the VPN.

DNS-SD (via Avahi) might be suitable for this. Instead of scanning for available hosts, one can simply query for a DNS-SD (DNS Service Discovery) type, quite possibly either _http._tcp or _ssh._tcp. Here is one such guide.
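
A rough sketch of what that could look like: advertise _ssh._tcp on each VPN host with Avahi, then browse for it from anywhere on the VPN. This assumes avahi-daemon is installed and running on each host, and note that multicast DNS across a routed tun interface may need extra Avahi configuration (such as the reflector), which I have not verified.

# On each VPN host: advertise SSH over mDNS/DNS-SD (sketch)
cat > /etc/avahi/services/ssh.service <<EOF
<?xml version="1.0" standalone='no'?>
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
  <name replace-wildcards="yes">%h</name>
  <service>
    <type>_ssh._tcp</type>
    <port>22</port>
  </service>
</service-group>
EOF

# From any other host: discover advertised SSH services
avahi-browse --resolve --terminate _ssh._tcp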

Domain Names for VPN Hosts

A related wish (which overlaps in some ways) is to configure OpenVPN to register VPN clients' information, including their assigned IP address, with a DNS server such as dnsmasq, tinydns, or even busybox's DNS server. I did some skimming on this, and came to the conclusion that it's possible, but not well understood or documented, so I decided to go with host discovery via network scanning for the time being.
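
One hedged idea from that skimming, which I have not battle-tested: OpenVPN's learn-address directive runs a script whenever a client's virtual address is learned, and that script could maintain a hosts file served by dnsmasq via its addn-hosts option. The script name, file paths, and .vpn suffix below are all made up; the server would also need script-security 2.

#!/bin/sh
# /etc/openvpn/learn-address.sh (sketch, untested)
# Wire up in the server's openvpn.conf with:  learn-address /etc/openvpn/learn-address.sh
# OpenVPN invokes this as: learn-address.sh <add|update|delete> <virtual-ip> [common-name]
op="$1" addr="$2" cn="$3"
hosts=/etc/openvpn/vpn-hosts   # dnsmasq serves this via: addn-hosts=/etc/openvpn/vpn-hosts
touch "$hosts"
sed -i "/^$addr /d" "$hosts"                       # drop any stale entry for this address
[ "$op" != delete ] && echo "$addr $cn.vpn" >> "$hosts"
kill -HUP "$(pidof dnsmasq)" 2>/dev/null || true   # dnsmasq rereads addn-hosts on SIGHUP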

Configure the VPN Clients' LAN segments to be routable from other Clients

This is a killer feature I have yet to figure out. It appears possible, but requires a bit of further testing, and is relatively tricky to set up correctly: if a VPN client ends up using the same IP network in two different contexts, things will silently stop working without explanation, because that is working as intended; IP is funny like that.
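
For the record, the mechanism OpenVPN documents for this appears to be client-config-dir plus iroute on the server, roughly as below. This is an untested sketch; 192.168.1.0/24 stands in for one client's LAN and must not collide with any other participating network, which is exactly the failure mode described above.

# Additions to the server's openvpn.conf (sketch, untested)
client-config-dir /etc/openvpn/ccd
route 192.168.1.0 255.255.255.0           # kernel route for that LAN via the tun device
push "route 192.168.1.0 255.255.255.0"    # tell the other clients how to reach it

# /etc/openvpn/ccd/my_client (file named after the client certificate's common name)
iroute 192.168.1.0 255.255.255.0          # tell OpenVPN which client owns this LAN

# On the client that owns 192.168.1.0/24, enable IPv4 forwarding:
sysctl -w net.ipv4.ip_forward=1

The other hosts on that LAN also need a route back to 10.100.0.0/24 (or the VPN client has to NAT), which is part of what makes this tricky.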

Footnotes:

1

2001:db8:a:b::/64 is part of 2001:db8::/32 which is reserved for documentation in RFC 3849.

2

203.0.113.1 is part of 203.0.113.0/24 (TEST-NET-3) which is reserved for documentation in RFC 5737.

3

Devices could be configured on the IPv4 LAN with so-called static IPv4 addresses, but usually you don't want this. You can achieve the same by having the router assign hosts the same IPv4 address every time (hosts are identified by the network interface's MAC address or even the optional DHCP hostname field).

4

In this example I use 192.168.1.0/24, which is part of 192.168.0.0/16, reserved for private network use in RFC 1918.

5

The CA private key is likely not necessary and probably a serious smell; if you know anything about this, please write me and explain.

Tags: vpn networking linux devops
30 Apr 2020

Linux dmesg --follow (-w) not working?

For a couple of months now, I have noticed that running dmesg -w on my workstation does not appear to print out new kernel messages. In other words, dmesg --follow "hangs". Additionally, when running tail -f /var/log/kern.log to monitor new dmesg messages picked up by sysklogd (part of syslogng), the latest messages do not come through until sysklogd periodically "reopens" the /dev/kmsg kernel message buffer.

Why is this a problem?

This is a problem because I use the dmesg log to monitor important hardware-related messages such as the kernel recognizing a USB device, or to diagnose bluetooth/wifi issues. When I plug in a USB drive, the first thing I do is check dmesg for the following messages:

[10701.359834] usb 2-4.4: new high-speed USB device number 8 using ehci-pci
[10701.394801] usb 2-4.4: New USB device found, idVendor=12f7, idProduct=0313, bcdDevice= 1.10
[10701.394807] usb 2-4.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[10701.394810] usb 2-4.4: Product: MerryGoRound
[10701.394813] usb 2-4.4: Manufacturer: Memorex
[10701.394816] usb 2-4.4: SerialNumber: AAAAAAAAAAAA
[10701.395182] usb-storage 2-4.4:1.0: USB Mass Storage device detected
[10701.398885] scsi host7: usb-storage 2-4.4:1.0
[10702.401161] scsi 7:0:0:0: Direct-Access     Memorex  MerryGoRound     PMAP PQ: 0 ANSI: 0 CCS
[10702.401710] sd 7:0:0:0: Attached scsi generic sg6 type 0
[10702.651720] sd 7:0:0:0: [sde] 15654912 512-byte logical blocks: (8.02 GB/7.46 GiB)
[10702.652341] sd 7:0:0:0: [sde] Write Protect is off
[10702.652346] sd 7:0:0:0: [sde] Mode Sense: 23 00 00 00
[10702.652961] sd 7:0:0:0: [sde] No Caching mode page found
[10702.652965] sd 7:0:0:0: [sde] Assuming drive cache: write through
[10702.681473]  sde: sde1 sde2
[10702.684869] sd 7:0:0:0: [sde] Attached SCSI removable disk

This output reports that a USB device was detected, where it is plugged in, the vendor/product information, what USB speed it is using, the size of the storage, the device name (/dev/sde), and its partitions (/dev/sde1, /dev/sde2). There are a lot of other messages written out to dmesg, such as the kernel detecting a bad USB cable, segmentation faults, and so on.

Given the importance of the above log output, I have developed a habit of running dmesg -w to monitor such kernel events. The -w tells dmesg to monitor for new messages. The long option is --follow.

In addition to dmesg -w not working as intended, syslogng log entries written to /var/log/kern.log are not written as they occur; instead the log is written in "bursts", which suggests sysklogd occasionally reopens /dev/kmsg, thereby reading in new log messages, but all the timestamps within each "burst" read are the same.

Which of my systems were affected?

I have two systems with a virtually identical OS installation; one is a workstation named snowcrash with an AMD FX-8350 on an ASRock M5A97 R2.0 motherboard; the other is an HP Elitebook 820 G4 named cyberdemon with an Intel Core i5-7300U. Curiously enough, the strange dmesg -w hang does not occur on cyberdemon, but does occur on snowcrash. Both hosts run Linux mainline, with both machines on 5.6.4. Looking through my /var/log/kern.log files, this behavior was apparent on a 5.4.25 kernel. As we will see later, this coincides with the affected versions that others have reported.

Additionally, I asked my friend tyil who happens to also use an AMD FX-8350 with Gentoo to check for the bug; he also had the problem on 5.6.0.

Pinpointing the bug

The first thing I did was find a way to reproduce the issue, and I recorded an asciinema recording of it. I then shared the recording on IRC, hoping somebody would know of a solution. I got some helpful and encouraging feedback, but nobody knew of this particular bug.
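
For anyone who wants to poke at the symptom themselves, a minimal check looks something like this (writing to /dev/kmsg injects a message into the kernel ring buffer; harmless, but requires root):

# Terminal 1: follow the kernel log
dmesg -w
# Terminal 2: inject a test message
echo 'hello from userspace' | sudo tee /dev/kmsg
# On an affected kernel, terminal 1 does not show the message until dmesg is restarted.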

The next step was to figure out if there was something wrong with /bin/dmesg. Running strace -o dmesg-strace.log dmesg -w shows the following pertinent lines:

openat(AT_FDCWD, "/dev/kmsg", O_RDONLY) = 3
lseek(3, 0, SEEK_DATA)                  = 0
read(3, "6,242,717857,-;futex hash table "..., 8191) = 79
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x16), ...}) = 0
openat(AT_FDCWD, "/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=26388, ...}) = 0
mmap(NULL, 26388, PROT_READ, MAP_SHARED, 4, 0) = 0x7fed92688000
close(4)                                = 0
futex(0x7fed925f9a14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(1, "\33[32m[    0.717857] \33[0m\33[33mfut"..., 97) = 97
… SNIP …
read(3, "6,1853,137347289701,-;input: Mic"..., 8191) = 128
write(1, "\33[32m[137347.289701] \33[0m\33[33min"..., 140) = 140
read(3,

The last line indicates a pending read() that never completes. Note that file descriptor 3 refers to the /dev/kmsg device. It appears nothing out of the ordinary occurs, except for the fact that the read() simply hangs.

I was a bit at a loss to explain the hung read(). I was really lost, honestly. So I went on and inspected the changes to /bin/dmesg shipped by util-linux, and did not find any sign of significant changes. I did run dmesg built from master just to be sure. See the commit log of dmesg.c here. Additionally, I searched the util-linux bug tracker and found nothing relevant.

Given I had no solution yet, I decided to resort to Googling things, hoping somebody had discussed this bug before. Keywords I tried are:

  • dmesg follow no longer working
  • dmesg kmsg no more messages
  • linux kmsg read hang
  • "/dev/kmsg" hang
  • "dmesg -w" hangs

None of these came up with anything useful. I was using DuckDuckGo mainly, with some Google queries sprinkled on top.

I then visited the torvalds/linux GitHub repository, searched for "kmsg", and did not find a commit that looked like a fix. I did pick up from reading commits that /dev/kmsg is written to via the printk functions, so on a whim I decided to look at changes made to kernel/printk/printk.c. Reading through the commit log of printk.c, I realized the latest commit was likely the fix:

commit ab6f762f0f53162d41497708b33c9a3236d3609e
Author: Sergey Senozhatsky <protected@email>
Date:   Tue Mar 3 20:30:02 2020 +0900

    printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
    
    printk_deferred(), similarly to printk_safe/printk_nmi, does not
    immediately attempt to print a new message on the consoles, avoiding
    calls into non-reentrant kernel paths, e.g. scheduler or timekeeping,
    which potentially can deadlock the system.
    
    Those printk() flavors, instead, rely on per-CPU flush irq_work to print
    messages from safer contexts.  For same reasons (recursive scheduler or
    timekeeping calls) printk() uses per-CPU irq_work in order to wake up
    user space syslog/kmsg readers.
    
    However, only printk_safe/printk_nmi do make sure that per-CPU areas
    have been initialised and that it's safe to modify per-CPU irq_work.
    This means that, for instance, should printk_deferred() be invoked "too
    early", that is before per-CPU areas are initialised, printk_deferred()
    will perform illegal per-CPU access.
    
    Lech Perczak [0] reports that after commit 1b710b1b10ef ("char/random:
    silence a lockdep splat with printk()") user-space syslog/kmsg readers
    are not able to read new kernel messages.
    
    The reason is printk_deferred() being called too early (as was pointed
    out by Petr and John).
    
    Fix printk_deferred() and do not queue per-CPU irq_work before per-CPU
    areas are initialized.
    
    Link: https://lore.kernel.org/lkml/aa0732c6-5c4e-8a8b-a1c1-75ebe3dca05b@camlintechnologies.com/
    Reported-by: Lech Perczak <protected@email>
    Signed-off-by: Sergey Senozhatsky <protected@email>
    Tested-by: Jann Horn <protected@email>
    Reviewed-by: Petr Mladek <protected@email>
    Cc: Greg Kroah-Hartman <protected@email>
    Cc: Theodore Ts'o <protected@email>
    Cc: John Ogness <protected@email>
    Signed-off-by: Linus Torvalds <protected@email>

Unfortunately my understanding of the Linux kernel architecture is neither comprehensive nor competent, but the commit message describes:

  1. syslog/kmsg readers — which includes dmesg and syslogng,
  2. certain functions don't immediately attempt to print a new message to the console,
  3. and syslog/kmsg readers might not wake up.

Indeed it's a bit hard for me to wrap my minimal kernel understanding around, however, reading the linked email list thread clears things up significantly:

After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting. I also found out, that output of 'dmesg –follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed. I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.

— Lech Perczak

Now that sounds like the issue I'm having! The thread also discusses that the bug is apparent in 4.19.106 (fixed in 4.19.107 — see this commit), and affects users of 5.5.9, 5.5.15, and 5.6.3 (see the PATCHv2 thread).

Applying the patch

The next step is to apply the patch in order to test and verify it fixes the issue.

Since Fall last year, I have used sys-kernel/vanilla-kernel to compile, install, and create an initramfs for my two machines. This is a great ebuild because it uses a kernel .config based on Arch Linux's, so it is compatible with most machines. It is also streamlined in that it does all the work for you — no more manually configuring and remembering which make invocations are necessary to update the kernel. It's not hard to get right, but it's not particularly interesting in my use-case. Additionally, using sys-kernel/vanilla-kernel, the kernel & its modules are now packaged, and can be distributed to my other machine as a binpkg. This streamlines deployment significantly.
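
Roughly, distributing the built kernel to the other machine looks like this (a sketch; the package path depends on your PKGDIR and on how the hosts share packages, e.g. a binhost over HTTP or plain rsync):

# On the build host: build and also produce a binary package
emerge -av --buildpkg sys-kernel/vanilla-kernel

# Copy the resulting package from PKGDIR (e.g. /var/cache/binpkgs) to the other
# machine, then install it there without recompiling:
emerge -av --usepkgonly sys-kernel/vanilla-kernel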

In order to add the patch to this ebuild, I simply have to drop the patch file into /etc/portage/patches/sys-kernel/vanilla-kernel. In my case I chose to drop it in /etc/portage/patches/sys-kernel/vanilla-kernel:5.6.4 because I'd rather the patch only be applied to the current kernel I have installed, rather than to all versions of sys-kernel/vanilla-kernel. This ensures that when I upgrade to the upcoming 5.7 release (which has the fix included), the patch won't be applied and emerge won't fail due to the patch not applying cleanly.

The commands (commit to my /etc/portage):

mkdir -p /etc/portage/patches/sys-kernel/vanilla-kernel:5.6.4
curl -o /etc/portage/patches/sys-kernel/vanilla-kernel:5.6.4/fix-dmesg--follow.patch \
        https://github.com/torvalds/linux/commit/ab6f762f0f53162d41497708b33c9a3236d3609e.patch
emerge -1av sys-kernel/vanilla-kernel:5.6.4

An hour later and the kernel is installed. After the reboot, indeed dmesg -w works once again! And the log messages in /var/log/kern.log have timestamps that correctly reflect the kernel time!

Conclusion

Even kernels have regressions. As discussed on IRC, I was reminded that the kernel project is not responsible for the userland, so it's possible such testcases might not be on the radar of most kernel developers. Perhaps it's the distros' responsibility to execute integrated system testing to catch bugs like this. In any case it is still a surprise to see such a regression occur. We like to think of the kernel as this infallible magical machine that doesn't break except when you do something patently wrong, but this isn't really the case. We're all human.

I want to thank Tyil, Sergey (the patch author), Lech (the bug reporter), and some folks from the #linux IRC channel for helping me pinpoint this issue. The reader may think this is a lot of effort to go through to fix such a simple bug — but it's really important for the kernel to work; if the kernel misbehaves, anything is up for grabs. It's not like a bug-laden browser, a product we accept will have crashing bugs in it — if the kernel crashes or misbehaves, the ramifications are almost as bad as failing hardware: you'll lose your application data, productivity, and trust in the operating system itself.

It is important to mention that LTS (long term support) kernels exist. Given the amount of trouble I went through to address this issue, and the fact I'd rather not have things breaking, I don't think I should be running a mainline kernel at the moment. Perhaps I can install both side by side, then pick 'n choose which kernel to use du jour.

I am very interested in hearing your suggestions, dear reader, for kernel maintenance and version selection strategies. You can find my contact details at https://winny.tech/. Thank you for reading.

Tags: linux computing
25 Dec 2019

How to fix early framebuffer problems, or "Can I type my disk password yet??"

Most of my workstations & laptops require a passphrase typed in to open the encrypted root filesystem. So my steps to booting are as follows:

  1. Power on machine
  2. Wait for FDE passphrase prompt
  3. Type in FDE passphrase
  4. Wait for boot to complete and automatic XFCE session to start

Since I need to know when the computer is ready to accept the passphrase, it is important the framebuffer is usable during the early part of the boot. In the case of the HP Elitebook 820 G4, the EFI framebuffer does not appear to work, and I’d rather not boot in BIOS mode to get a functional VESA framebuffer. Making things more awkward, firmware is needed when the i915 driver is loaded, or the framebuffer will not work either. (It’s not always clear if firmware is needed, so one should run dmesg | grep -F firmware and check if firmware is being loaded.)

With this information, the problem can be summarized as: “How do I ensure i915 is available at boot with the appropriate firmware?”. This question can be easily generalized to any framebuffer driver, as the steps are more-or-less the same.

Zeroth step: Do you need only a driver, or a driver with firmware?

It is a good idea to verify whether your kernel is missing a driver at boot, missing firmware, or both. Boot up a Live USB with good hardware compatibility, such as GRML [1] or Ubuntu’s, and let’s see what framebuffer driver our host is trying to use [2]:

$ dmesg | grep -i 'frame.*buffer'
[    4.790570] efifb: framebuffer at 0xe0000000, using 8128k, total 8128k
[    4.790611] fb0: EFI VGA frame buffer device
[    4.820637] Console: switching to colour frame buffer device 240x67
[    6.643895] i915 0000:00:02.0: fb1: i915drmfb frame buffer device

So we can see the efifb is initially used for a couple of seconds, then i915 is used for the rest of the computer’s uptime. Now let’s look at whether firmware is necessary, first checking if modinfo(8) knows of any firmware:

$ modinfo i915 -F firmware
i915/bxt_dmc_ver1_07.bin
i915/skl_dmc_ver1_27.bin
i915/kbl_dmc_ver1_04.bin
... SNIP ...
i915/kbl_guc_33.0.0.bin
i915/icl_huc_ver8_4_3238.bin
i915/icl_guc_33.0.0.bin

This indicates this driver will load firmware when available, and if necessary for the particular mode of operation or hardware.

Now let’s look at dmesg to see if any firmware is loaded:

[    0.222906] Spectre V2 : Enabling Restricted Speculation for firmware calls
[    5.511731] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[   25.579703] iwlwifi 0000:02:00.0: loaded firmware version 36.77d01142.0 op_mode iwlmvm
[   25.612759] Bluetooth: hci0: Minimum firmware build 1 week 10 2014
[   25.620251] Bluetooth: hci0: Found device firmware: intel/ibt-12-16.sfi
[   25.712793] iwlwifi 0000:02:00.0: Allocated 0x00400000 bytes for firmware monitor.
[   27.042080] Bluetooth: hci0: Waiting for firmware download to complete

Aha! So it appears we need i915/kbl_dmc_ver1_04.bin for i915. In the case one doesn’t need firmware, dmesg won’t show anything related to drm or a line with your driver name in it.

By the way, it is a good idea to check dmesg for hints about missing firmware or alternative drivers; for example, my trackpad is supported by both i2c and synaptics based trackpad drivers, and the kernel was kind enough to tell me.

First step: Obtain the firmware

On Gentoo install sys-kernel/linux-firmware. You will have to agree to some non-free licenses; nothing too inane, but worth mentioning. Now just run emerge -av sys-kernel/linux-firmware. (On other distros it might be this easy, or more difficult; for example—in my experience Debian does not ship every single firmware like Gentoo does, so YMMV.)

Second step, Option A: Compile firmware into your kernel

Since most of my systems run Gentoo, it is business as usual to deploy a kernel with most excess drivers disabled except for common hot-swappable components such as USB network interfaces, audio devices, and so on. For example, this laptop’s config was originally derived from genkernel’s stock amd64 config with most extra drivers disabled, then augmented with support for an Acer ES1-111M-C7DE, and finally with support for this Elitebook.

I had compiled the kernel with i915 support built into the image, as opposed to as an additional kernel module. Unfortunately this meant the kernel was unable to load the firmware from the filesystem, because it appears only kernel modules can load firmware from the filesystem. To work around this without resorting to making i915 a kernel module, we can include the firmware within the kernel image (vmlinuz). Including both firmware and drivers in the vmlinuz has a couple of benefits. First, they will always be available. There is no need to figure out how to load the driver and firmware from the initrd, let alone getting whatever initrd generator one is using to cooperate. A downside is it makes the kernel very specific to the machine, because a different Intel machine may need a different firmware file compiled in.

To achieve including the firmware in kernel, I set the following values in my kernel config (.config in your kernel source tree).

CONFIG_EXTRA_FIRMWARE="i915/kbl_dmc_ver1_04.bin"
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"

Note, if you’re using menuconfig, you can type /EXTRA_FIRMWARE (slash for search, then the text) followed by keyboard return to find where these settings exist in the menu system.

Then I verified i915 is indeed not a kernel module, but built into the kernel image (it would be m if it’s a module):

CONFIG_DRM_I915=y

After compiling & installing the kernel (and generating a dracut initrd for cryptsetup/lvm), I was able to reboot and get an early pre-mounted-root framebuffer on this device.
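
For completeness, the build and install steps look roughly like this on a manually managed kernel tree (a sketch; dracut’s hostonly mode should pick up the cryptsetup/lvm pieces, but your initrd needs may differ):

# In the kernel source tree, after adjusting .config (sketch)
make -j$(nproc)
make modules_install
make install      # copies the kernel image, System.map, and config into /boot

# Regenerate the initrd so the encrypted root can still be opened at boot
dracut --force --hostonly --kver "$(make -s kernelrelease)"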

Second step, Option B: A portable kernel approach (using sys-kernel/vanilla-kernel)

I discovered the Gentoo devs have begun shipping an ebuild that builds and installs a kernel with a portable, livecd-friendly config. In addition this package will optionally generate an initrd with dracut as a pkg_postinst step, making it very suitable for users who just want a working kernel and don’t mind excessive compatibility (at a cost to size and build time).

This presents a different challenge, because while this package does allow the user to drop in their own .config, it is not very multiple-machine-deployment friendly to hard-code each individual firmware into the kernel. Instead we tell dracut to include our framebuffer driver. As mentioned above I found this computer uses the i915 kernel driver for framebuffer. Let’s tell dracut to include the driver:

cat > /etc/dracut.conf.d/i915.conf <<EOF
add_drivers+=" i915 "
EOF

Dracut is smart enough to pick up the firmware the kernel module needs, provided they are installed. To get an idea what firmware dracut will include, run modinfo i915 -F firmware which will print out a bunch of firmware relative paths.

After applying this fix, just regenerate your initrd using dracut; in my case I let portage do the work: emerge -1av sys-kernel/vanilla-kernel. Finally reboot.

Conclusion

Check dmesg. Always check dmesg. We found two ways to deploy firmware, in-kernel and in-initrd. The in-kernel technique is best for a device-specific kernel; the in-initrd technique is best for a portable kernel. I am a big fan of the second technique because it scales well to many machines.

I did not touch on the political side of using binary blobs. It would be nice to not use any non-free software, but I’d rather have a working system with a couple of small non-free components than a non-working system. Which is more valuable, your freedom, or the reduced capacity of your tools?

Footnotes:

1

GRML is my favorite live media. It is simple, to the point, and has lots of little scripts to streamline tasks such as setting up a wireless AP, an iPXE netboot environment, a router, installing Debian, and so on. Remastering is also relatively straightforward. It also has a sane GUI suitable for any machine (fluxbox).

2

Thanks to this post on Ask Ubuntu

Tags: gentoo linux computing

© Winston Weinert (winny) — CC-BY-SA-4.0