This gist documents my attempts to get to the bottom of spurious wakes after installing Ubuntu 20.04 LTS on my system.
Initially, I thought it might be another system on my network sending Wake-on-LAN (WoL) packets. Then I thought it might be a known XHCI spurious wake kernel issue. In the end, I resolved things by actively disabling the ability of USB devices, e.g. the mouse, to wake the system.
Update: I later came up with a better way of disabling wake-on-mouse that's covered here.
Note: as one of these steps, I upgraded the system BIOS - while this didn't resolve this particular issue, it did resolve an annoying issue with the graphic state not being properly restored for certain applications after wake-up.
Look at logs:
$ journalctl | less
Go to the end with G and search backwards with ? for sleep:
Dec 08 02:54:41 ghawkins-OptiPlex-3020 kernel: rfkill: input handler disabled
Dec 08 02:54:41 ghawkins-OptiPlex-3020 kernel: Generic FE-GE Realtek PHY r8169-300:00: attached PHY driver [Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
Dec 08 02:54:41 ghawkins-OptiPlex-3020 NetworkManager[637]: <info> [1607392481.5050] manager: sleep: wake requested (sleeping: yes enabled: yes)
Dec 08 02:54:41 ghawkins-OptiPlex-3020 NetworkManager[637]: <info> [1607392481.5051] device (enp3s0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'managed')
I'm guessing that something on the network has requested a wake via enp3s0 (the primary network interface).
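The journal search above can also be scripted rather than done by hand in less. This is only a minimal sketch, assuming that the interesting lines mention suspend, sleep or wake; the function name sleep_wake_lines is my own illustrative choice and not part of any tool:

```shell
# Keep only the log lines that mention suspend, sleep or wake (case-insensitive).
# Reads log text on stdin, so it works with journalctl or with a saved log file.
sleep_wake_lines() {
    grep -i -E 'suspend|sleep|wake'
}

# Example: show the last 20 matching lines from the current boot.
# journalctl -b --no-pager | sleep_wake_lines | tail -n 20
```

Reading from stdin keeps the filter independent of where the log text comes from.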
ifconfig seems to be out of favor and is no longer installed, so instead:
$ ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
...
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether f8:bc:12:64:e6:2b brd ff:ff:ff:ff:ff:ff
inet 192.168.0.125/24 brd 192.168.0.255 scope global dynamic noprefixroute enp3s0
valid_lft 56382sec preferred_lft 56382sec
inet6 2a02:aa16:577d:df80::149/128 scope global dynamic noprefixroute
valid_lft 56385sec preferred_lft 56385sec
inet6 2a02:aa16:577d:df80:7c5d:23f7:19fd:c2e0/64 scope global temporary dynamic
valid_lft 574785sec preferred_lft 56229sec
inet6 2a02:aa16:577d:df80:1a3e:567a:d75a:57c5/64 scope global dynamic mngtmpaddr noprefixroute
valid_lft 1194771sec preferred_lft 589971sec
inet6 fe80::d716:5396:c3b8:fc9f/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:d7:2f:32:9b brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
Then, from the Arch WoL wiki page:
$ sudo apt install ethtool
$ ethtool enp3s0
...
Cannot get wake-on-lan settings: Operation not permitted
...
Which might lead one to believe WoL isn't available, but...
$ sudo ethtool enp3s0
...
Supports Wake-on: pumbg
Wake-on: d
...
d apparently means it's disabled, so this idea looks like a dead end.
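To make the Supports Wake-on and Wake-on values above easier to read, here is a small sketch that spells out each letter. The meanings are the ones I understand the ethtool man page to give for its wol option; the function name describe_wol_flags is hypothetical:

```shell
# Print a short description for each letter in an ethtool Wake-on string,
# e.g. the "pumbg" and "d" values shown above.
describe_wol_flags() {
    flags=$1
    while [ -n "$flags" ]; do
        rest=${flags#?}            # everything after the first character
        letter=${flags%"$rest"}    # the first character on its own
        flags=$rest
        case $letter in
            p) echo "p: wake on PHY activity" ;;
            u) echo "u: wake on unicast messages" ;;
            m) echo "m: wake on multicast messages" ;;
            b) echo "b: wake on broadcast messages" ;;
            a) echo "a: wake on ARP" ;;
            g) echo "g: wake on MagicPacket" ;;
            s) echo "s: SecureOn password for MagicPacket" ;;
            d) echo "d: disabled (wake on nothing)" ;;
            *) echo "$letter: unknown flag" ;;
        esac
    done
}

describe_wol_flags pumbg
describe_wol_flags d
```

So the NIC supports several wake triggers, but none of them is currently switched on.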
But let's keep trying...
Using Network Manager to find its name for the primary network interface:
$ nmcli con show
It turns out that it's Wired connection 1, so let's look at its settings:
$ nmcli c show 'Wired connection 1'
It pipes its output through less, search for wake and it shows:
802-3-ethernet.wake-on-lan: default
802-3-ethernet.wake-on-lan-password: --
Apparently, this value would have to be magic, or one of the other wake triggers, for NetworkManager to turn WoL on. But it can also be set to ignore so that NetworkManager definitely leaves the WoL settings alone.
TODO: try the /etc/NetworkManager/conf.d/wake-on-lan.conf described on the Arch page to set the 802-3-ethernet.wake-on-lan to ignore.
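As an alternative to the conf.d file, the same property can be changed for a single connection with nmcli. This is a sketch of that different approach, not the Arch page's method, and I haven't tried it on this system; the function name set_wol_ignore is hypothetical:

```shell
# Tell NetworkManager to leave the WoL settings of one connection alone.
# $1 is the connection name as listed by "nmcli con show".
set_wol_ignore() {
    nmcli connection modify "$1" 802-3-ethernet.wake-on-lan ignore
}

# Example (not run here):
# set_wol_ignore 'Wired connection 1'
# nmcli -f 802-3-ethernet.wake-on-lan connection show 'Wired connection 1'
```

The change is stored in the connection profile, so it should survive reboots.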
To trigger WoL from another machine, work out the broadcast address of the subnet that your machine is on:
$ ip address show enp3s0 | sed -n '/inet .* brd / s/.* brd \([0-9\.]*\) .*/\1/p'
192.168.0.255
And get the MAC address:
$ ip link
...
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether f8:bc:12:64:e6:2b brd ff:ff:ff:ff:ff:ff
^^^^^^^^^^^^^^^^^
Then, in my case, using a Mac on the same subnet:
$ brew install wakeonlan
$ wakeonlan -i 192.168.0.255 f8:bc:12:64:e6:2b
This did not trigger my machine to wake. But was my machine receiving the packets? To check...
On the receiving machine (while woken):
$ sudo apt install ngrep
$ sudo ngrep '\xff{6}(.{6})\1{15}' -x port 9
Then back on the Mac, redo the above:
$ wakeonlan -i 192.168.0.255 f8:bc:12:64:e6:2b
And I can see that the packet is seen. In fact I can also see that the default broadcast address - 255.255.255.255 - works fine:
$ wakeonlan f8:bc:12:64:e6:2b
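The ngrep pattern used above spells out what a magic packet looks like: six 0xff bytes followed by the target MAC address repeated 16 times. As a minimal sketch, this hypothetical helper (magic_packet_hex is my own name) builds that payload as a hex string, which is handy for checking what ngrep should show:

```shell
# Build the payload of a WoL magic packet as a lower-case hex string:
# six 0xff bytes followed by 16 copies of the MAC address.
magic_packet_hex() {
    # Strip the separators and normalise the case of the MAC address.
    mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
    payload=ffffffffffff
    i=0
    while [ "$i" -lt 16 ]; do
        payload=$payload$mac
        i=$((i + 1))
    done
    printf '%s\n' "$payload"
}

magic_packet_hex f8:bc:12:64:e6:2b
```

The result is 204 hex digits, i.e. 102 bytes (6 + 16 × 6).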
Check WoL in BIOS:
- Press F2 repeatedly (do not hold down) during restart
- Go to Power Management / Wake on LAN
This showed WoL is disabled at the BIOS level.
Is there an issue with the BIOS? Let's update it...
First determine the serial number and BIOS version:
$ sudo inxi -F | fgrep -A1 Machine
Machine: Type: Desktop System: Dell product: OptiPlex 3020 v: 01 serial: 7abcdef2
Mobo: Dell model: 0VHWTR v: A02 serial: /75NTP02/CN7016343407OO/ BIOS: Dell v: A03 date: 04/14/2014
So the serial number is 7abcdef2 and the BIOS version is A03. Using the serial number, find the latest BIOS at DELL support. It turns out that it's A20 - so there have been quite a lot of updates since A03.
The update is a .exe so this necessitates booting to Windows or DOS.
Note: DELL do support updating the BIOS etc. in Linux but only for RHEL - see here.
After various dead ends, this didn't prove too difficult - the main dead end was this AskUbuntu answer. If SystemRescueCD ever supported FreeDOS, it does not anymore.
So the obvious answer would seem to be to use FreeDOS directly. Unfortunately, many web pages and, amazingly, their own wiki have out of date instructions on how to create a bootable FreeDOS USB stick.
In the end it's easy:
- Download their Lite USB zip.
- Unpack it and use Etcher to create a bootable USB key from the .img file that was in the zip.
Oddly, this process creates a disk that's partitioned to be exactly big enough for its contents and no more, so I couldn't copy the BIOS update .exe to the drive, and I had no luck booting to FreeDOS and getting it to see another USB stick with the .exe on it.
After a number of dangerous experiments where I almost repartitioned my primary disk, I came to the conclusion that it is not currently possible to resize the main partition on the USB stick (parted says they're working on support for DOS disks but currently only support EXT etc.).
It turns out that the resulting disk isn't really a live CD, it's primarily an installation disk for FreeDOS. So in the end I just freed up space by deleting some of the packages that it would install but doesn't need immediately:
$ cd /media/$USER/FD-SETUP/FDSETUP/PACKAGES/
$ rm -rf UTIL
$ cd ../..
$ cp ~/Downloads/O3020A20.exe .
O3020A20.exe is the BIOS update downloaded from DELL.
Then I booted the system from the FreeDOS USB stick - for whatever reason, I had to shut down and start the system, simply rebooting just rebooted to Linux.
Rather frighteningly, FreeDOS defaults to suggesting it install FreeDOS on your primary drive - instead, just exit to DOS.
Now you can run O3020A20.exe - the prompt supports tab completion:
C:\> O3020A20.exe
The process went without a hitch and after removing the USB stick and rebooting I could confirm that the BIOS had been updated to A20:
$ sudo inxi -F | fgrep -A1 Machine
Machine: Type: Desktop System: Dell product: OptiPlex 3020 v: 00 serial: 75NTP02
Mobo: Dell model: 0VHWTR v: A02 serial: /75NTP02/CN7016343407OO/ BIOS: Dell v: A20 date: 05/27/2019
The BIOS update fixed something unrelated - an annoying issue where various panels in UI applications showed up filled with noise (rather than whatever flat color or image should have been there) after wake-up. But it didn't fix the spurious wake-up issue itself.
The Kernel quirks section of the Arch page describes enabling the XHCI_SPURIOUS_REBOOT and XHCI_SPURIOUS_WAKEUP quirks to potentially solve this problem:
$ sudo vim /etc/default/grub
Then I changed the line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
To:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash xhci_hcd.quirks=270336"
I.e. I added xhci_hcd.quirks=270336 as covered in the Arch page. Then:
$ sudo update-grub
And reboot.
Before making the above change, you can check the existing XHCI module parameters like this:
$ systool -v -m xhci_hcd
Module = "xhci_hcd"
Attributes:
uevent = <store method only>
Parameters:
link_quirk = "0"
quirks = "0"
Or simply like this:
$ cd /sys/module/xhci_hcd/parameters
$ cat quirks
0
$ cat link_quirk
0
After setting the xhci_hcd.quirks parameter, as shown above, and then rebooting, you can see that the value has been set:
$ cat /sys/module/xhci_hcd/parameters/quirks
270336
There's no generic way of decoding such values; you have to look at the module code itself. In this case, the defines for XHCI_SPURIOUS_REBOOT and XHCI_SPURIOUS_WAKEUP are in drivers/usb/host/xhci.h. There, you see:
#define XHCI_SPURIOUS_REBOOT BIT_ULL(13)
...
#define XHCI_SPURIOUS_WAKEUP BIT_ULL(18)
And if you convert 270336 to a binary number:
$ bc
obase=2
270336
1000010000000000000
Then you see that the 18th and 13th bits are set (if you consider the right-most bit as the 0th bit rather than the 1st bit).
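The same check can be done with plain shell arithmetic instead of reading the binary digits by eye. A minimal sketch (the function name set_bits is my own):

```shell
# Print the position of every set bit in a number, counting the
# right-most bit as bit 0.
set_bits() {
    value=$1
    bit=0
    while [ "$value" -gt 0 ]; do
        if [ $((value & 1)) -eq 1 ]; then
            echo "$bit"
        fi
        value=$((value >> 1))
        bit=$((bit + 1))
    done
}

set_bits 270336                      # prints 13 and 18, one per line
echo $(( (1 << 13) | (1 << 18) ))    # prints 270336
```

Going the other way, OR-ing the two bit values together reproduces the number passed on the kernel command line.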
While looking for how to query the xhci_hcd module parameters, I found lots of ways of querying related information. Note that all of the following information is unaffected by the xhci_hcd.quirks change made above. I.e. it was the same before and after this change.
1. Using lspci:
$ sudo lspci -v | fgrep -i -A4 xhci
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 04) (prog-if 30 [XHCI])
Subsystem: Dell 8 Series/C220 Series Chipset Family USB xHCI
Flags: bus master, medium devsel, latency 0, IRQ 27
Memory at f7200000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [70] Power Management version 2
Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
Kernel driver in use: xhci_hcd
Note: sudo is needed here to retrieve the information for the Capabilities lines.
2. Using the /boot/config-* files:
$ grep -i xhci /boot/config-$(uname -r)
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_XHCI_DBGCAP=y
CONFIG_USB_XHCI_PCI=y
CONFIG_USB_XHCI_PLATFORM=m
CONFIG_USB_ROLES_INTEL_XHCI=m
3. Using lshw:
$ sudo lshw
...
*-usb:0
description: USB controller
product: 8 Series/C220 Series Chipset Family USB xHCI
vendor: Intel Corporation
physical id: 14
bus info: pci@0000:00:14.0
version: 04
width: 64 bits
clock: 33MHz
capabilities: pm msi xhci bus_master cap_list
configuration: driver=xhci_hcd latency=0
resources: irq:27 memory:f7200000-f720ffff
*-usbhost:0
product: xHCI Host Controller
vendor: Linux 5.4.0-56-generic xhci-hcd
physical id: 0
bus info: usb@3
logical name: usb3
version: 5.04
capabilities: usb-2.00
configuration: driver=hub slots=10 speed=480Mbit/s
...
*-usbhost:1
product: xHCI Host Controller
vendor: Linux 5.4.0-56-generic xhci-hcd
physical id: 1
bus info: usb@4
logical name: usb4
version: 5.04
capabilities: usb-3.00
configuration: driver=hub slots=2 speed=5000Mbit/s
...
4. Using modinfo:
$ modinfo xhci_hcd
name: xhci_hcd
filename: (builtin)
license: GPL
author: Sarah Sharp
description: 'eXtensible' Host Controller (xHC) Driver
parm: link_quirk:Don't clear the chain bit on a link TRB (int)
parm: quirks:Bit flags for quirks to be enabled as default (ullong)
If this still doesn't fix things, there's a section on the Arch WoL page describing issues with the Realtek 8168 NIC. And that's exactly the NIC I have:
$ sudo inxi -F | fgrep Network
Network: Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169
Note that my driver, though, is r8169 (and the Arch page refers to driver r8168).
It doesn't seem to be possible to query this module for its parameters (I tried the first two answers here). And if I go into /sys/module/r8169 there are no obvious parameters.
It may be possible to disable WoL with s5wol (they describe using it to enable WoL).
But given that WoL is disabled in BIOS, I think the spurious wakeup quirks settings are a more likely fix.
A further step might be getting NetworkManager to log at debug level as described here.
The WoL steps documented above and the quirks changes didn't resolve the issue. So next I tried disabling the ability of various devices to wake up the system (as described here on Hacker News).
You can list all the devices that can wake up the computer:
$ fgrep -w -e enabled -e Device /proc/acpi/wakeup
Device S-state Status Sysfs node
RP04 S4 *enabled pci:0000:00:1c.3
EHC1 S3 *enabled pci:0000:00:1d.0
EHC2 S3 *enabled pci:0000:00:1a.0
XHC S4 *enabled pci:0000:00:14.0
PEG0 S4 *enabled pci:0000:00:01.0
PWRB S3 *enabled platform:PNP0C0C:00
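If you only want the names of the enabled devices, e.g. to feed into a script, the listing can be reduced with awk. A minimal sketch, assuming the column layout shown above (device name first, status third); the function name enabled_wakeup_devices is hypothetical:

```shell
# List the names of the devices that are currently allowed to wake the system.
# $1 is the file to read; it defaults to /proc/acpi/wakeup.
enabled_wakeup_devices() {
    awk '$3 == "*enabled" { print $1 }' "${1:-/proc/acpi/wakeup}"
}

# Example: enabled_wakeup_devices
```

The optional file argument is only there so that the function can be tried out against a saved copy of the listing.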
Oddly, there's no way to look up what these device names mean, as they're determined by individual vendors. However, some names are used fairly consistently across vendors and luckily all of the above are well-known names. For a list of these well-known names see this SO answer.
In the above case, all the devices, except PWRB, are PCI devices - so we can work out what they are:
$ lspci -tv
-[0000:00]-+-00.0 Intel Corporation 4th Gen Core Processor DRAM Controller
+-01.0-[01]--+-00.0 NVIDIA Corporation GK208B [GeForce GT 730]
| \-00.1 NVIDIA Corporation GK208 HDMI/DP Audio Controller
+-14.0 Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI
+-16.0 Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1
+-1a.0 Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2
+-1b.0 Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller
+-1c.0-[02]--
+-1c.3-[03]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
+-1d.0 Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1
+-1f.0 Intel Corporation H81 Express LPC Controller
+-1f.2 Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
\-1f.3 Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller
So from the SO answer and the data above we can determine:
- RP0x means a PCI slot - in this case RP04 is the ethernet controller.
- EHCx means USB 2.0 - here we can't determine anything more than that EHC1 and EHC2 are USB related.
- XHC means USB 3.0 - similarly we can't work out much more.
- PEGx means a PCIe for Graphics slot - in this case PEG0 is an Nvidia graphics card.
PWRB isn't a PCI device, but from the SO answer we can see that it's the power button.
You can find out some more information about USB devices with lsusb, e.g. above we can see that the node for EHC2 is 0000:00:1a.0. We can grep for that like so:
$ sudo lsusb -v 2> /dev/null | fgrep --before-context=9 0000:00:1a.0
bDeviceClass 9 Hub
bDeviceSubClass 0
bDeviceProtocol 0 Full speed (or root) hub
bMaxPacketSize0 64
idVendor 0x1d6b Linux Foundation
idProduct 0x0002 2.0 root hub
bcdDevice 5.08
iManufacturer 3 Linux 5.8.0-43-generic ehci_hcd
iProduct 2 EHCI Host Controller
iSerial 1 0000:00:1a.0
We can see that EHC2 is a USB hub.
Note: you don't really need to use sudo but in some situations it gives you slightly more information.
So the devices listed above for /proc/acpi/wakeup aren't individual devices like mice or keyboards. We can only disable e.g. USB devices at the hub level.
So let's try just disabling the USB devices, leaving the ethernet and graphics card alone - we can then just use the power button to wake things up.
$ sudo bash
# cd /usr/lib/systemd/system-sleep
# cat > wakefix << "EOF"
#!/bin/bash -e
for device in EHC1 EHC2 XHC
do
echo $device > /proc/acpi/wakeup
done
EOF
# chmod a+x wakefix
# ./wakefix
# fgrep -w -e enabled -e Device /proc/acpi/wakeup
Device S-state Status Sysfs node
RP04 S4 *enabled pci:0000:00:1c.3
PEG0 S4 *enabled pci:0000:00:01.0
PWRB S3 *enabled platform:PNP0C0C:00
So we sudo to root, go to /usr/lib/systemd/system-sleep, create a simple bash script called wakefix, run it and finally check that EHC1, EHC2 and XHC are no longer enabled.
Important: the double-quotes around "EOF" are actually important - they stop variable substitution, e.g. of $device, occurring when we create the script. See this SO answer.
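This script is run by the systemd suspend service just before entering the suspend state. I had to actually reboot my system for this to start happening - perhaps it would have been enough to just restart the suspend service. It wasn't enough to have simply created the script or to have run it manually. When the system wakes from sleep the relevant devices are reenabled, i.e. this script only disables them for the period that the system is suspended. The likely explanation is that writing a device name to /proc/acpi/wakeup toggles its state rather than disabling it, and systemd runs the scripts in system-sleep twice: once before suspending and again after waking. So the second run turns the devices back on, and having already run the script manually would have left the toggling out of step until the reboot reset everything to enabled. See the suspend service man page for more information.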
Now the system can only be woken by a short press to the power button.
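Because writing a name to /proc/acpi/wakeup flips the device's current state, the wakefix script above depends on being run an odd number of times. Here is a sketch of a more defensive variant that only writes when the device is currently enabled and only acts before suspend. It is not what I actually ran; the WAKEUP_FILE override and the disable_wakeup function name are hypothetical additions for illustration:

```shell
#!/bin/bash
# Hypothetical variant of wakefix: only writes a device name when that
# device is currently enabled, so running it twice does not re-enable it.
wakeup_file=${WAKEUP_FILE:-/proc/acpi/wakeup}

disable_wakeup() {
    # $1 is an ACPI device name such as EHC1.
    if grep -q -E "^$1[[:space:]].*\*enabled" "$wakeup_file"; then
        echo "$1" > "$wakeup_file"
    fi
}

# systemd passes "pre" as the first argument before suspending
# and "post" after waking up.
if [ "${1:-}" = "pre" ]; then
    for device in EHC1 EHC2 XHC; do
        disable_wakeup "$device"
    done
fi
```

With this variant the devices stay disabled after wake-up, rather than being re-enabled, until the next reboot.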