An HP Z4 G4 with a Xeon(R) W-2123 CPU was updated from AlmaLinux 8.9 to AlmaLinux 8.10, which also updated the Kernel version. Following the update the boot hung. After the GRUB menu timed out and went to boot the default Kernel, there was just a black screen with a non-flashing text cursor at the top left of the screen. The PC didn't respond to the keyboard, so had to hold the power button to force a power off. It hung in the same way on multiple boot attempts.
Secure boot is enabled.
As an initial work-around, selected the previous Kernel version in the GRUB menu and, once it had booted, used grubby to change the default Kernel to the previous version.
This is the previous version which still boots when changed back to the default:
$ sudo grubby --info=DEFAULT
index=1
kernel="/boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64"
args="ro resume=/dev/mapper/almalinux_skylake--alma-swap rd.lvm.lv=almalinux_skylake-alma/root rd.lvm.lv=almalinux_skylake-alma/swap rhgb quiet $tuned_params intel_iommu=on"
root="/dev/mapper/almalinux_skylake--alma-root"
initrd="/boot/initramfs-4.18.0-513.24.1.el8_9.x86_64.img $tuned_initrd"
title="AlmaLinux (4.18.0-513.24.1.el8_9.x86_64) 8.9 (Midnight Oncilla)"
id="5de40fe84eb94d6da4f1b208114750bb-4.18.0-513.24.1.el8_9.x86_64"
And this is the updated version which hangs during boot on the HP Z4 G4:
$ sudo grubby --info=0
index=0
kernel="/boot/vmlinuz-4.18.0-553.el8_10.x86_64"
args="ro resume=/dev/mapper/almalinux_skylake--alma-swap rd.lvm.lv=almalinux_skylake-alma/root rd.lvm.lv=almalinux_skylake-alma/swap rhgb quiet $tuned_params"
root="/dev/mapper/almalinux_skylake--alma-root"
initrd="/boot/initramfs-4.18.0-553.el8_10.x86_64.img $tuned_initrd"
title="AlmaLinux (4.18.0-553.el8_10.x86_64) 8.10 (Cerulean Leopard)"
id="5de40fe84eb94d6da4f1b208114750bb-4.18.0-553.el8_10.x86_64"
The thread How to completely remove a recent kernel installaton? on the Rocky Linux forum reports the same issue, with the same Kernel version hanging during boot and the same previous version still booting. No solution was offered in that forum thread.
A different PC, with an i5-2310 CPU, was also updated from AlmaLinux 8.9 to 8.10. On this other PC the same Kernel boots OK:
$ sudo grubby --info=DEFAULT
[sudo] password for mr_halfword:
index=3
kernel="/boot/vmlinuz-4.18.0-553.el8_10.x86_64"
args="ro crashkernel=auto resume=/dev/mapper/almalinux-swap rd.lvm.lv=almalinux/root rd.lvm.lv=almalinux/swap rhgb quiet $tuned_params"
root="/dev/mapper/almalinux-root"
initrd="/boot/initramfs-4.18.0-553.el8_10.x86_64.img $tuned_initrd"
title="AlmaLinux (4.18.0-553.el8_10.x86_64) 8.10 (Cerulean Leopard)"
id="9d2f49d60f7e49e0b06aaed533ddfcd8-4.18.0-553.el8_10.x86_64"
This PC doesn't have secure boot enabled, as secure boot isn't supported by its BIOS.
Used GRUB to select the problematic Kernel version, and edited the Kernel command line to try and get more debug output. Removed the rhgb and quiet options, after which the following is output before the boot hangs:
EFI stub: UEFI Secure Boot is enabled.
Adding the following didn't produce more output:
debug
ignore_loglevel
initcall_debug
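The edit made in the GRUB editor amounts to a string transformation of the args line: drop rhgb and quiet, then append the extra debug options. A minimal POSIX sh sketch of that transformation, with a hypothetical function name, in case the same change needs preparing for another entry:

```shell
# Hypothetical helper: print a kernel argument string with some words removed
# and others appended.
#   $1 = current args, $2 = space-separated words to drop, $3 = words to append
edit_kernel_args() {
    local result word
    result=""
    set -f   # disable globbing while the strings are split into words
    for word in $1; do
        case " $2 " in
            *" $word "*) ;;                  # listed in $2, so drop it
            *) result="$result $word" ;;     # otherwise keep it in order
        esac
    done
    for word in $3; do
        result="$result $word"
    done
    set +f
    printf '%s\n' "${result# }"
}
```

For example, edit_kernel_args 'ro rhgb quiet $tuned_params' 'rhgb quiet' 'debug ignore_loglevel initcall_debug' prints ro $tuned_params debug ignore_loglevel initcall_debug. A persistent equivalent would be sudo grubby --update-kernel=/boot/vmlinuz-4.18.0-553.el8_10.x86_64 --remove-args="rhgb quiet" --args="debug ignore_loglevel initcall_debug", although a one-off edit in the GRUB menu, as done here, leaves the saved entry untouched.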
The BIOS version was 02.91, and a new version 02.92 was available. Updated the BIOS using the BIOS option which downloads and flashes the updated BIOS from the HP website.
The BIOS update didn't change the failure mode.
Disabled secure boot, but the problematic Kernel still hung.
With secure boot disabled, the difference in behaviour was that, with the rhgb and quiet options removed, there was just a blank screen when the boot hung. Therefore, re-enabled secure boot.
The text "UEFI Secure Boot is enabled." appears in the xen_efi_get_secureboot function in arch/x86/xen/efi.c.
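From a Linux system which has booted, the firmware's Secure Boot state can also be read back to cross-check the EFI stub message. A minimal sketch, assuming efivarfs is mounted at /sys/firmware/efi/efivars and the variable file is readable: efivarfs files start with a 4-byte attributes field followed by the variable data, and the SecureBoot variable's data is a single byte which is 1 when enabled. If mokutil is installed, mokutil --sb-state reports the same state. The function name is hypothetical:

```shell
# Path of the SecureBoot variable (EFI global variable GUID) under efivarfs.
SECUREBOOT_VAR=/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c

# Hypothetical helper: print enabled/disabled/unknown for a SecureBoot efivar file.
# $1 = file to read (defaults to the efivarfs path above)
secureboot_state() {
    local var value
    var=${1:-$SECUREBOOT_VAR}
    [ -r "$var" ] || { echo "unknown"; return 1; }
    # Skip the 4-byte attributes header and read the 1-byte value as decimal.
    value=$(od -An -t u1 -j 4 -N 1 "$var" | tr -d ' \n')
    case "$value" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```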
Tried changing the System Security options in the BIOS:
- Disabled Intel Trusted Execution Technology (TXT). This didn't change the behaviour.
- Also disabled Intel Virtualization Technology (VTx). This didn't change the behaviour.
- Also disabled Intel Virtualization Technology for Directed I/O (VTd). For the Kernel which boots, this disabled use of the IOMMU. This didn't change the behaviour of the problematic Kernel.
As a result, restored the original System Security options.
Following an update to Kernel 4.18.0-553.5.1.el8_10.x86_64, that later Kernel version also:
- Causes the HP Z4 G4 to hang during boot in the same way.
- Boots successfully on the different PC with an i5-2310 CPU.
Set the default Kernel back to the working version:
[mr_halfword@skylake-alma ~]$ sudo grubby --set-default-index=2
The default is /boot/loader/entries/5de40fe84eb94d6da4f1b208114750bb-4.18.0-513.24.1.el8_9.x86_64.conf with index 2 and kernel /boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64
[mr_halfword@skylake-alma ~]$ sudo grubby --info=DEFAULT
index=2
kernel="/boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64"
args="ro resume=/dev/mapper/almalinux_skylake--alma-swap rd.lvm.lv=almalinux_skylake-alma/root rd.lvm.lv=almalinux_skylake-alma/swap rhgb quiet $tuned_params intel_iommu=on"
root="/dev/mapper/almalinux_skylake--alma-root"
initrd="/boot/initramfs-4.18.0-513.24.1.el8_9.x86_64.img $tuned_initrd"
title="AlmaLinux (4.18.0-513.24.1.el8_9.x86_64) 8.9 (Midnight Oncilla)"
id="5de40fe84eb94d6da4f1b208114750bb-4.18.0-513.24.1.el8_9.x86_64"
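The index of an entry isn't stable: 4.18.0-513.24.1.el8_9.x86_64 was index 1 at first, index 2 once 4.18.0-553.5.1.el8_10.x86_64 was also installed, and index 1 again after 4.18.0-553.el8_10.x86_64 was removed. A minimal sketch of looking the index up by Kernel version instead, assuming the index= and kernel= line format that grubby --info=ALL prints; the function name is hypothetical:

```shell
# Hypothetical helper: read `grubby --info=ALL` output on stdin and print the
# index of the entry whose kernel is /boot/vmlinuz-<version>.
# Matching includes the closing quote, so 4.18.0-553.el8_10 does not match
# 4.18.0-553.5.1.el8_10. Returns non-zero if no entry matches.
#   $1 = Kernel version, e.g. 4.18.0-513.24.1.el8_9.x86_64
grubby_index_for_version() {
    awk -v want="vmlinuz-$1\"" '
        /^index=/ { idx = substr($0, 7) }
        /^kernel=/ && index($0, want) > 0 { print idx; found = 1; exit }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: idx=$(sudo grubby --info=ALL | grubby_index_for_version 4.18.0-513.24.1.el8_9.x86_64) && sudo grubby --set-default-index="$idx". Alternatively, sudo grubby --set-default=/boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64 selects the default by Kernel path, which avoids depending on the index at all.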
Entered the BIOS and disabled all the PCIe and M.2 slots. The theory was that if the newer Kernel then booted, could enable the slots one at a time to try and identify a PCIe device causing the hang during boot.
However, the PC no longer boots. The power LED flashes 3 long blinks in red and 3 short blinks in white. According to Table 7-3 Interpreting POST diagnostic front panel lights and audible codes in the Maintenance and Service Guide that means:
Category | Major/minor code | Description |
---|---|---|
Hardware | 3.3 | The embedded controller has timed out waiting for BIOS to return from graphics initialization |
Pressed the CMOS button on the motherboard to reset the CMOS settings. To get access to the button, had to remove the Intel X722 network card in PCIe slot 5. It took a few attempts for the CMOS settings to be cleared; not sure whether that was due to not pressing the button for long enough, or not leaving the power removed for long enough.
Got a physical presence prompt on the monitor to confirm that secure boot has been disabled.
The PC power cycled itself several times before reporting AMT Global Status. 974 AMT global status does not match Bios-setup. This is due to the issue in On a HP Z4 G4 where bricked the Intel AMT on the PC. In the BIOS, under Advanced -> Remote Management Options, disabled Intel Management Engine to avoid the AMT error at boot.
The reset of the CMOS settings caused the M.2 SSD with Windows to become the boot device, so the GRUB menu no longer appeared.
Used the boot menu in the BIOS to select the HDD with AlmaLinux installed. That repaired the boot menu, and the GRUB menu re-appeared at boot.
After this CMOS settings reset, and the minimum further changes needed to allow booting, Kernel 4.18.0-553.5.1.el8_10.x86_64 still hung at boot.
Used the BIOS to disable all PCIe and M.2 slots, except PCIe slot 2 with the NVIDIA Quadro P2000. Kernel 4.18.0-553.5.1.el8_10.x86_64 still hung during boot.
When the HP BIOS BCU Windows BiosConfigUtility64.exe is used to get all the BIOS options, it reports the parameter:
Headless Boot
*Disable
Enable
Tried enabling it, but no other options were exposed about where to redirect the BIOS output to.
If trying to boot without a graphics card fitted, then:
- Need some way of redirecting GRUB to a serial console, to be able to change the selected Kernel.
- Need some way of redirecting the Kernel output during boot to a serial console, to view any diagnostics.
- If the PCIe slot containing the graphics card is disabled, rather than the graphics card being removed, need the HP BIOS BCU for Linux installed to be able to change the BIOS settings to re-enable the PCIe slot for the graphics card at the end of the test.
Therefore, changed Headless Boot back to Disable and didn't try to boot with it enabled.
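The first two points could in principle be covered by the CH382 dual port serial card described later in these notes. A minimal sketch of an /etc/default/grub fragment for that, untested on this PC. It assumes the card's first port is at I/O port 0x20c0 when GRUB runs (the address Linux reports for ttyS0 further down, not confirmed for the firmware stage), and that the signed GRUB image used with secure boot includes serial support:

```shell
# /etc/default/grub additions (sketch, untested on this PC).
# Send the GRUB menu to both the monitor and a serial port.
GRUB_TERMINAL="console serial"
# Assumption: the CH382's first port is at I/O port 0x20c0 when GRUB runs.
# That address comes from Linux's /proc/tty/driver/serial.
GRUB_SERIAL_COMMAND="serial --port=0x20c0 --speed=115200 --word=8 --parity=no --stop=1"
```

The configuration would then be regenerated with sudo grub2-mkconfig -o /boot/efi/EFI/almalinux/grub.cfg (the usual location on an EL8 UEFI install), and the Kernel output could be sent to both the monitor and the serial port with sudo grubby --update-kernel=ALL --args="console=tty0 console=ttyS0,115200".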
Kernel 4.18.0-553.5.1.el8_10.x86_64, which boots on the PC with an i5-2310, was installed on an external USB HDD.
Connected the external USB HDD to the HP Z4 G4 and used the BIOS boot menu to boot from the external HDD:
- Attempting to boot Kernel 4.18.0-553.5.1.el8_10.x86_64 hung.
- Booting Kernel 4.18.0-513.24.1.el8_9.x86_64 worked.
This means the issue on the HP Z4 G4 isn't due to the installation of the later Kernel versions being corrupt, but rather an interaction between the later Kernel versions and some component of the HP Z4 G4.
Having had to reset the CMOS settings to default, needed to restore the BIOS options.
Re-enabled the SDCARD boot option in the BIOS.
For the other options, used the HP BCU under Windows.
Enable secure boot:
C:\SWSetup\SP143621>BiosConfigUtility64.exe /setvalue:"Configure Legacy Support and Secure Boot","Legacy Support Disable and Secure Boot Enable"
<BIOSCONFIG Version="" Computername="COMPUTERNAME" Date="2024/06/09" Time="20:33:07" UTC="1">
<SETTING changeStatus="pass" name="Configure Legacy Support and Secure Boot" returnCode="0">
<OLDVALUE><![CDATA[Legacy Support Disable and Secure Boot Disable]]></OLDVALUE>
<VALUE><![CDATA[Legacy Support Disable and Secure Boot Enable]]></VALUE>
</SETTING>
<SUCCESS msg="No errors occurred" />
<Information msg="BCU return value" real="0" translated="0" />
</BIOSCONFIG>
The readback of "Configure Legacy Support and Secure Boot" didn't update until after a reboot. As part of that reboot the BIOS power cycles the PC, which is thought to be required for a change to secure boot to take effect.
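Each BCU response reports changeStatus="pass" and a BCU return value of 0 when a setting is accepted. When several settings are restored, a small sketch like the following could check a saved response, assuming the XML output is captured to a file and processed with a POSIX-style shell (for example Git Bash on the Windows side, or after copying the file to Linux) and has one tag per line as shown; the function name is hypothetical:

```shell
# Hypothetical helper: read one captured BCU /setvalue XML response on stdin,
# print "name: old -> new", and succeed only if changeStatus is "pass" and
# the BCU return value is 0.
bcu_check() {
    local xml name status old new ret
    xml=$(tr -d '\r')   # output captured on Windows may have CRLF line endings
    name=$(printf '%s\n' "$xml" | sed -n 's/.*<SETTING [^>]*name="\([^"]*\)".*/\1/p')
    status=$(printf '%s\n' "$xml" | sed -n 's/.*changeStatus="\([^"]*\)".*/\1/p')
    old=$(printf '%s\n' "$xml" | sed -n 's/.*<OLDVALUE><!\[CDATA\[\(.*\)]]><\/OLDVALUE>.*/\1/p')
    new=$(printf '%s\n' "$xml" | sed -n 's/.*<VALUE><!\[CDATA\[\(.*\)]]><\/VALUE>.*/\1/p')
    ret=$(printf '%s\n' "$xml" | sed -n 's/.*<Information msg="BCU return value" real="\([^"]*\)".*/\1/p')
    echo "${name}: ${old} -> ${new}"
    [ "$status" = "pass" ] && [ "$ret" = "0" ]
}
```

For example, bcu_check < slot3_hot_plug.xml would print Slot 3 Hot Plug: Disable -> Enable for the Slot 3 Hot Plug response further down (the file name is hypothetical).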
Disable the internal speakers:
C:\SWSetup\SP143621>BiosConfigUtility64.exe /setvalue:"Internal Speakers","Disable"
<BIOSCONFIG Version="" Computername="COMPUTERNAME" Date="2024/06/09" Time="20:34:32" UTC="1">
<SETTING changeStatus="pass" name="Internal Speakers" returnCode="0">
<OLDVALUE><![CDATA[Enable]]></OLDVALUE>
<VALUE><![CDATA[Disable]]></VALUE>
</SETTING>
<SUCCESS msg="No errors occurred" />
<Information msg="BCU return value" real="0" translated="0" />
</BIOSCONFIG>
Enable Advanced Error Control:
C:\SWSetup\SP143621>BiosConfigUtility64.exe /setvalue:"Advanced Error Control","Enable"
<BIOSCONFIG Version="" Computername="COMPUTERNAME" Date="2024/06/09" Time="20:35:42" UTC="1">
<SETTING changeStatus="pass" name="Advanced Error Control" returnCode="0">
<OLDVALUE><![CDATA[Disable]]></OLDVALUE>
<VALUE><![CDATA[Enable]]></VALUE>
</SETTING>
<SUCCESS msg="No errors occurred" />
<Information msg="BCU return value" real="0" translated="0" />
</BIOSCONFIG>
And disable error handling on the slots with FPGAs fitted, to stop the BIOS reporting a fatal error when the FPGA is re-loaded and therefore the PCIe link dropped:
C:\SWSetup\SP143621>BiosConfigUtility64.exe /setvalue:"Slot 3 Error Handling","Disable"
<BIOSCONFIG Version="" Computername="COMPUTERNAME" Date="2024/06/09" Time="20:37:34" UTC="1">
<SETTING changeStatus="pass" name="Slot 3 Error Handling" returnCode="0">
<OLDVALUE><![CDATA[Enable]]></OLDVALUE>
<VALUE><![CDATA[Disable]]></VALUE>
</SETTING>
<SUCCESS msg="No errors occurred" />
<Information msg="BCU return value" real="0" translated="0" />
</BIOSCONFIG>
C:\SWSetup\SP143621>BiosConfigUtility64.exe /setvalue:"M.2 SSD0 Error Handling","Disable"
<BIOSCONFIG Version="" Computername="COMPUTERNAME" Date="2024/06/09" Time="20:38:20" UTC="1">
<SETTING changeStatus="pass" name="M.2 SSD0 Error Handling" returnCode="0">
<OLDVALUE><![CDATA[Enable]]></OLDVALUE>
<VALUE><![CDATA[Disable]]></VALUE>
</SETTING>
<SUCCESS msg="No errors occurred" />
<Information msg="BCU return value" real="0" translated="0" />
</BIOSCONFIG>
Enable hot-plug on the slot with the TEF1001 FPGA board:
C:\SWSetup\SP143621>BiosConfigUtility64.exe /setvalue:"Slot 3 Hot Plug","Enable"
<BIOSCONFIG Version="" Computername="COMPUTERNAME" Date="2024/06/09" Time="20:41:11" UTC="1">
<SETTING changeStatus="pass" name="Slot 3 Hot Plug" returnCode="0">
<OLDVALUE><![CDATA[Disable]]></OLDVALUE>
<VALUE><![CDATA[Enable]]></VALUE>
</SETTING>
<SUCCESS msg="No errors occurred" />
<Information msg="BCU return value" real="0" translated="0" />
</BIOSCONFIG>
C:\SWSetup\SP143621>BiosConfigUtility64.exe /setvalue:"Slot 3 Hot Plug Buses","8"
<BIOSCONFIG Version="" Computername="COMPUTERNAME" Date="2024/06/09" Time="20:41:30" UTC="1">
<SETTING changeStatus="pass" name="Slot 3 Hot Plug Buses" returnCode="0">
<OLDVALUE><![CDATA[0]]></OLDVALUE>
<VALUE><![CDATA[8]]></VALUE>
</SETTING>
<SUCCESS msg="No errors occurred" />
<Information msg="BCU return value" real="0" translated="0" />
</BIOSCONFIG>
Until the root cause is fixed, want to prevent any automatic Kernel updates from removing the old Kernel which still boots.
Therefore, removed 4.18.0-553.el8_10.x86_64, which is the first Kernel version on which the problem was found.
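Kernel packages are install-only, so dnf keeps several versions side by side and removes the oldest when installing another would exceed installonly_limit; on EL8 the shipped /etc/dnf/dnf.conf normally sets installonly_limit=3. With 4.18.0-513.24.1, 4.18.0-553 and 4.18.0-553.5.1 installed, the working 4.18.0-513.24.1 Kernel is the oldest, so it would normally be the first candidate for removal by the next Kernel update. A minimal sketch for reading the configured limit, with a hypothetical function name, assuming the setting is a plain installonly_limit=N line and falling back to 3 (assumed here to be dnf's built-in default) when none is found:

```shell
# Hypothetical helper: print the installonly_limit that dnf would use.
#   $1 = config file (defaults to /etc/dnf/dnf.conf)
# Takes the last installonly_limit=N line; prints 3 if none is found.
dnf_installonly_limit() {
    local conf value
    conf=${1:-/etc/dnf/dnf.conf}
    value=$(sed -n 's/^[[:space:]]*installonly_limit[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" 2>/dev/null | tail -n 1)
    echo "${value:-3}"
}
```

Comparing dnf_installonly_limit with the count from rpm -q kernel-core | wc -l shows how close the system is to the point where an update starts removing old Kernels.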
The following identifies the kernel-modules-4.18.0-553.el8_10.x86_64 and kernel-core-4.18.0-553.el8_10.x86_64 packages as the candidates to be removed:
$ yum provides /lib/modules/4.18.0-553.el8_10.x86_64/kernel/drivers/infiniband/hw/mlx5/mlx5_ib.ko.xz
Last metadata expiration check: 1 day, 2:07:46 ago on Sat 08 Jun 2024 21:25:59 BST.
kernel-modules-4.18.0-553.el8_10.x86_64 : kernel modules to match the core
: kernel
Repo : @System
Matched from:
Filename : /lib/modules/4.18.0-553.el8_10.x86_64/kernel/drivers/infiniband/hw/mlx5/mlx5_ib.ko.xz
kernel-modules-4.18.0-553.el8_10.x86_64 : kernel modules to match the core
: kernel
Repo : baseos
Matched from:
Filename : /lib/modules/4.18.0-553.el8_10.x86_64/kernel/drivers/infiniband/hw/mlx5/mlx5_ib.ko.xz
$ yum provides "/boot/*4.18.0-553.el8_10.x86_64*"
Last metadata expiration check: 1 day, 2:08:41 ago on Sat 08 Jun 2024 21:25:59 BST.
kernel-core-4.18.0-553.el8_10.x86_64 : The Linux kernel
Repo : @System
Matched from:
Filename : /boot/.vmlinuz-4.18.0-553.el8_10.x86_64.hmac
Filename : /boot/System.map-4.18.0-553.el8_10.x86_64
Filename : /boot/config-4.18.0-553.el8_10.x86_64
Filename : /boot/initramfs-4.18.0-553.el8_10.x86_64.img
Filename : /boot/symvers-4.18.0-553.el8_10.x86_64.gz
Filename : /boot/vmlinuz-4.18.0-553.el8_10.x86_64
kernel-core-4.18.0-553.el8_10.x86_64 : The Linux kernel
Repo : baseos
Matched from:
Filename : /boot/.vmlinuz-4.18.0-553.el8_10.x86_64.hmac
Filename : /boot/System.map-4.18.0-553.el8_10.x86_64
Filename : /boot/config-4.18.0-553.el8_10.x86_64
Filename : /boot/initramfs-4.18.0-553.el8_10.x86_64.img
Filename : /boot/symvers-4.18.0-553.el8_10.x86_64.gz
Filename : /boot/vmlinuz-4.18.0-553.el8_10.x86_64
kernel-debug-core-4.18.0-553.el8_10.x86_64 : The Linux kernel compiled with
: extra debugging enabled
Repo : baseos
Matched from:
Filename : /boot/.vmlinuz-4.18.0-553.el8_10.x86_64+debug.hmac
Filename : /boot/System.map-4.18.0-553.el8_10.x86_64+debug
Filename : /boot/config-4.18.0-553.el8_10.x86_64+debug
Filename : /boot/initramfs-4.18.0-553.el8_10.x86_64+debug.img
Filename : /boot/symvers-4.18.0-553.el8_10.x86_64+debug.gz
Filename : /boot/vmlinuz-4.18.0-553.el8_10.x86_64+debug
Removed the packages:
$ sudo yum remove kernel-core-4.18.0-553.el8_10.x86_64 kernel-modules-4.18.0-553.el8_10.x86_64
Dependencies resolved.
================================================================================
Package Arch Version Repository Size
================================================================================
Removing:
kernel-core x86_64 4.18.0-553.el8_10 @baseos 71 M
kernel-modules x86_64 4.18.0-553.el8_10 @baseos 25 M
Removing dependent packages:
kernel x86_64 4.18.0-553.el8_10 @baseos 0
kernel-modules-extra x86_64 4.18.0-553.el8_10 @baseos 686 k
Transaction Summary
================================================================================
Remove 4 Packages
Freed space: 97 M
Is this ok [y/N]: y
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Erasing : kernel-4.18.0-553.el8_10.x86_64 1/4
Running scriptlet: kernel-4.18.0-553.el8_10.x86_64 1/4
Erasing : kernel-modules-extra-4.18.0-553.el8_10.x86_64 2/4
Running scriptlet: kernel-modules-extra-4.18.0-553.el8_10.x86_64 2/4
Erasing : kernel-modules-4.18.0-553.el8_10.x86_64 3/4
Running scriptlet: kernel-modules-4.18.0-553.el8_10.x86_64 3/4
Running scriptlet: kernel-core-4.18.0-553.el8_10.x86_64 4/4
Erasing : kernel-core-4.18.0-553.el8_10.x86_64 4/4
Running scriptlet: kernel-core-4.18.0-553.el8_10.x86_64 4/4
Verifying : kernel-4.18.0-553.el8_10.x86_64 1/4
Verifying : kernel-core-4.18.0-553.el8_10.x86_64 2/4
Verifying : kernel-modules-4.18.0-553.el8_10.x86_64 3/4
Verifying : kernel-modules-extra-4.18.0-553.el8_10.x86_64 4/4
Removed:
kernel-4.18.0-553.el8_10.x86_64
kernel-core-4.18.0-553.el8_10.x86_64
kernel-modules-4.18.0-553.el8_10.x86_64
kernel-modules-extra-4.18.0-553.el8_10.x86_64
Complete!
The files for that Kernel version have been removed from /boot:
$ ls /boot/*4.18.0-553.el8_10.x86_64*
ls: cannot access '/boot/*4.18.0-553.el8_10.x86_64*': No such file or directory
And the Kernel has been removed from the list reported by grubby:
$ sudo grubby --info=ALL
index=0
kernel="/boot/vmlinuz-4.18.0-553.5.1.el8_10.x86_64"
args="ro resume=/dev/mapper/almalinux_skylake--alma-swap rd.lvm.lv=almalinux_skylake-alma/root rd.lvm.lv=almalinux_skylake-alma/swap rhgb quiet $tuned_params"
root="/dev/mapper/almalinux_skylake--alma-root"
initrd="/boot/initramfs-4.18.0-553.5.1.el8_10.x86_64.img $tuned_initrd"
title="AlmaLinux (4.18.0-553.5.1.el8_10.x86_64) 8.10 (Cerulean Leopard)"
id="5de40fe84eb94d6da4f1b208114750bb-4.18.0-553.5.1.el8_10.x86_64"
index=1
kernel="/boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64"
args="ro resume=/dev/mapper/almalinux_skylake--alma-swap rd.lvm.lv=almalinux_skylake-alma/root rd.lvm.lv=almalinux_skylake-alma/swap rhgb quiet $tuned_params intel_iommu=on"
root="/dev/mapper/almalinux_skylake--alma-root"
initrd="/boot/initramfs-4.18.0-513.24.1.el8_9.x86_64.img $tuned_initrd"
title="AlmaLinux (4.18.0-513.24.1.el8_9.x86_64) 8.9 (Midnight Oncilla)"
id="5de40fe84eb94d6da4f1b208114750bb-4.18.0-513.24.1.el8_9.x86_64"
index=2
kernel="/boot/vmlinuz-0-rescue-5de40fe84eb94d6da4f1b208114750bb"
args="ro resume=/dev/mapper/almalinux_skylake--alma-swap rd.lvm.lv=almalinux_skylake-alma/root rd.lvm.lv=almalinux_skylake-alma/swap rhgb quiet"
root="/dev/mapper/almalinux_skylake--alma-root"
initrd="/boot/initramfs-0-rescue-5de40fe84eb94d6da4f1b208114750bb.img"
title="AlmaLinux (0-rescue-5de40fe84eb94d6da4f1b208114750bb) 8.6 (Sky Tiger)"
id="5de40fe84eb94d6da4f1b208114750bb-0-rescue"
Confirmed that the default Kernel is still 4.18.0-513.24.1.el8_9.x86_64, which boots:
$ sudo grubby --info=DEFAULT
index=1
kernel="/boot/vmlinuz-4.18.0-513.24.1.el8_9.x86_64"
args="ro resume=/dev/mapper/almalinux_skylake--alma-swap rd.lvm.lv=almalinux_skylake-alma/root rd.lvm.lv=almalinux_skylake-alma/swap rhgb quiet $tuned_params intel_iommu=on"
root="/dev/mapper/almalinux_skylake--alma-root"
initrd="/boot/initramfs-4.18.0-513.24.1.el8_9.x86_64.img $tuned_initrd"
title="AlmaLinux (4.18.0-513.24.1.el8_9.x86_64) 8.9 (Midnight Oncilla)"
id="5de40fe84eb94d6da4f1b208114750bb-4.18.0-513.24.1.el8_9.x86_64"
The HP Z4 G4 PC has a dual port serial card fitted:
$ lspci -nn | grep Serial
09:00.0 Serial controller [0700]: Nanjing Qinheng Microelectronics Co., Ltd. CH352/CH382 PCI/PCIe Dual Port Serial Adapter [1c00:3253] (rev 10)
This looks to be the CH382 PCI-Express based Dual UARTs and printer port chip.
Once AlmaLinux had booted, was able to use PuTTY to communicate over /dev/ttyS0, set to 115200 baud, with a USB to serial adapter on a different Ubuntu PC.
With the Kernel which boots, used GRUB to edit the command line and replace rhgb quiet with console=ttyS0,115200. During boot:
- EFI stub: UEFI Secure Boot is enabled. still appeared on the monitor.
- The rest of the boot messages appeared on the serial port:
[ 0.000000] Linux version 4.18.0-513.24.1.el8_9.x86_64 (mockbuild@x64-builder02.almalinux.org) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-20) (GCC)) #1 SMP Mon Apr 8 11:23:13 EDT 2024
[ 0.000000] Command line: BOOT_IMAGE=(hd1,gpt2)/vmlinuz-4.18.0-513.24.1.el8_9.x86_64 root=/dev/mapper/almalinux_skylake--alma-root ro resume=/dev/mapper/almalinux_skylake--alma-swap rd.lvm.lv=almalinux_skylake-alma/root rd.lvm.lv=almalinux_skylake-alma/swap console=ttyS0,115200 intel_iommu=on
<<snip>>
AlmaLinux 8.10 (Cerulean Leopard)
Kernel 4.18.0-513.24.1.el8_9.x86_64 on an x86_64
skylake-alma login:
The ttyS0 device has IO port 0x20C0:
$ sudo cat /proc/tty/driver/serial
[sudo] password for mr_halfword:
serinfo:1.0 driver revision:
0: uart:XR16850 port:000020C0 irq:16 tx:21019 rx:0 RTS|CTS|DTR|DSR
1: uart:XR16850 port:000020C8 irq:16 tx:0 rx:0
2: uart:unknown port:000003E8 irq:4
3: uart:unknown port:000002E8 irq:3
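The port column is the value needed for the io,<addr> form of the earlycon= and console= options tried next. A minimal sketch that extracts it for a given ttyS number, assuming the line format shown above; the function name is hypothetical:

```shell
# Hypothetical helper: read /proc/tty/driver/serial on stdin and print the
# I/O port of ttyS<N> as 0x..., the form used by earlycon=uart8250,io,<addr>.
#   $1 = ttyS number, e.g. 0 for ttyS0
ttys_io_port() {
    awk -v n="$1" '
        $1 == n ":" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^port:/) { print substr($i, 6); exit }
        }
    ' | { read -r hex && printf '0x%x\n' "0x$hex"; }
}
```

For example, sudo cat /proc/tty/driver/serial | ttys_io_port 0 should print 0x20c0 on this PC, given the output above.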
Tried some other options to get the EFI stub: UEFI Secure Boot is enabled. message to appear on the serial console, to get more debug about when the problematic Kernel hangs. The attempts weren't successful:
- Adding initcall_debug console=ttyS0,115200 gives additional calling lines, which start at time [0.282895] after 401 lines of previous output.
- Adding console=ttyS0,115200 efi=debug gives additional efi: lines, which start at time [0.000000] after 59 lines of previous output.
- Adding earlyprintk=serial,ttyS0,115200 gives no serial output.
- Adding earlycon=uart8250,io,0x20c0,115200 gives no serial output, although dmesg had earlycon: uart8250 at I/O port 0x20c0 (options '115200') so the option was recognised.
- Adding console=uart8250,io,0x20c0,115200:
  - The Kernel doesn't boot.
  - The monitor only displays EFI stub: UEFI Secure Boot is enabled.
  - The only output on the serial port is some zero bytes. The answer to Is it possible to set serial speed for an early kernel boot log to a MMIO UART? suggests this may be related to a non-standard UART clock.
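Two points that may be relevant to the failed attempts, offered as hedged observations rather than confirmed causes. First, on x86 earlyprintk=serial,ttyS0 selects the legacy I/O port 0x3f8 (ttyS1 selects 0x2f8), not the PCIe card that Linux later names ttyS0; earlyprintk also accepts an explicit port such as earlyprintk=serial,0x20c0,115200, which hasn't been tried here. Second, the 8250 early console code by default computes the baud rate divisor assuming a 1843200 Hz UART clock, so a UART fed by a different clock would run at a different rate from the one requested. A minimal sketch of that arithmetic, with a hypothetical function name; the 14745600 Hz clock in the example is purely hypothetical, as the CH382's actual clock isn't known from these notes:

```shell
# Hypothetical helper: 16550-style baud arithmetic.
# The divisor is computed from the clock the kernel assumes, but the line
# actually runs at actual_clock / (16 * divisor).
#   $1 = actual UART input clock in Hz
#   $2 = requested baud rate
#   $3 = clock assumed when computing the divisor (default 1843200 Hz)
uart_effective_baud() {
    local actual_clk baud assumed_clk divisor
    actual_clk=$1
    baud=$2
    assumed_clk=${3:-1843200}
    # Round to the nearest divisor: (a + b/2) / b with b = 16 * baud.
    divisor=$(( (assumed_clk + 8 * baud) / (16 * baud) ))
    if [ "$divisor" -lt 1 ]; then divisor=1; fi
    echo $(( actual_clk / (16 * divisor) ))
}
```

uart_effective_baud 1843200 115200 gives 115200, while uart_effective_baud 14745600 115200 gives 921600, in which case a receiver set to 115200 baud would not decode the characters correctly.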
Installed further updates, including Kernel version 4.18.0-553.8.1.el8_10.x86_64, and with that Kernel the PC boots. From scanning the release notes, it isn't obvious which change fixed the boot hang.
The update removed intel_iommu=on from the Linux command line, as per Custom Linux commandline options lost after Kernel update:
$ cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-4.18.0-553.8.1.el8_10.x86_64 root=/dev/mapper/almalinux_skylake--alma-root ro resume=/dev/mapper/almalinux_skylake--alma-swap rd.lvm.lv=almalinux_skylake-alma/root rd.lvm.lv=almalinux_skylake-alma/swap rhgb quiet
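Since the update silently dropped this option, a quick check after each Kernel update may be worthwhile. A minimal sketch, with a hypothetical function name, which matches whole space-separated words so that intel_iommu=on doesn't match intel_iommu=off:

```shell
# Hypothetical helper: succeed if the kernel command line contains the exact
# parameter given as a whole word.
#   $1 = parameter, e.g. intel_iommu=on
#   $2 = command line to check (defaults to the contents of /proc/cmdline)
cmdline_has() {
    local want cmdline
    want=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $want "*) return 0 ;;
    esac
    return 1
}
```

For example: cmdline_has intel_iommu=on || echo "intel_iommu=on missing, re-add with grubby --update-kernel=DEFAULT --args=intel_iommu=on".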
Added the option back:
[mr_halfword@skylake-alma ~]$ sudo grubby --update-kernel=DEFAULT --args=intel_iommu=on
[mr_halfword@skylake-alma ~]$ sudo grubby --info=DEFAULT
index=0
kernel="/boot/vmlinuz-4.18.0-553.8.1.el8_10.x86_64"
args="ro resume=/dev/mapper/almalinux_skylake--alma-swap rd.lvm.lv=almalinux_skylake-alma/root rd.lvm.lv=almalinux_skylake-alma/swap rhgb quiet $tuned_params intel_iommu=on"
root="/dev/mapper/almalinux_skylake--alma-root"
initrd="/boot/initramfs-4.18.0-553.8.1.el8_10.x86_64.img $tuned_initrd"
title="AlmaLinux (4.18.0-553.8.1.el8_10.x86_64) 8.10 (Cerulean Leopard)"
id="5de40fe84eb94d6da4f1b208114750bb-4.18.0-553.8.1.el8_10.x86_64"
Rebooted after changing the command line. The IOMMU was on, secure boot was enabled and the Kernel wasn't tainted:
[mr_halfword@skylake-alma ~]$ cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-4.18.0-553.8.1.el8_10.x86_64 root=/dev/mapper/almalinux_skylake--alma-root ro resume=/dev/mapper/almalinux_skylake--alma-swap rd.lvm.lv=almalinux_skylake-alma/root rd.lvm.lv=almalinux_skylake-alma/swap rhgb quiet intel_iommu=on
[mr_halfword@skylake-alma ~]$ dmesg|grep lockdown
[ 0.000000] Kernel is locked down from EFI secure boot; see man kernel_lockdown.7
[ 9.852952] Lockdown: swapper/0: Hibernation is restricted; see man kernel_lockdown.7
[mr_halfword@skylake-alma ~]$ ~/fpga_sio/software_tests/eclipse_project/bind_xilinx_devices_to_vfio.sh
IOMMU devices present: dmar0 dmar1 dmar2 dmar3
Loading vfio-pci module
[sudo] password for mr_halfword:
Bound vfio-pci driver to 0000:15:00.0 10ee:7024 [0002:0003]
Waiting for /dev/vfio/41 to be created
Giving user permission to IOMMU group 41 for 0000:15:00.0 10ee:7024 [0002:0003]
Bound vfio-pci driver to 0000:30:00.0 10ee:7011 [0000:0000]
Waiting for /dev/vfio/89 to be created
Giving user permission to IOMMU group 89 for 0000:30:00.0 10ee:7011 [0000:0000]
[mr_halfword@skylake-alma ~]$ ~/Downloads/kernel-chktaint
Kernel not Tainted
Tests using the IOMMU were successful.
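The kernel-chktaint script decodes /proc/sys/kernel/tainted, where 0 means not tainted and each set bit records a reason (for example bit 0 for a proprietary module, bit 12 for an out-of-tree module and bit 13 for an unsigned module). A minimal sketch that just lists the set bit numbers, with a hypothetical function name:

```shell
# Hypothetical helper: print the numbers of the taint bits set in a value
# (default: the contents of /proc/sys/kernel/tainted). Prints an empty line
# when the kernel isn't tainted.
taint_bits() {
    local value bit bits
    value=${1:-$(cat /proc/sys/kernel/tainted)}
    bit=0
    bits=""
    while [ "$value" -gt 0 ]; do
        if [ $(( value & 1 )) -eq 1 ]; then
            bits="$bits $bit"
        fi
        value=$(( value >> 1 ))
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "${bits# }"
}
```

For example, taint_bits 4097 prints 0 12.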