
powerpc: fix kernel panic on boot of PowerVM systems running in shared processing mode [Hash]

Desnes Nunes requested to merge desnesn/centos-stream-9:rh2055566 into main

BUGZILLA

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2055566

UPSTREAM STATUS

Upstream Status: Patches have been accepted into kernel/git/powerpc/linux.git

CONFLICTS

None

BUILD INFORMATION

Build Info: http://brewweb.engineering.redhat.com/brew/taskinfo?taskID=43154814

TESTING

A Fleetwood LPAR in the configuration below booted normally with the patched kernel:

=======================================
Power 9 - Fleetwood
--- Shared Processing Mode (15 Procunits, 128 Virtual Procs, 21GB)
System  : ltcfleet2
LPAR    : ltcfleet2-lp17

Distro  : Red Hat Enterprise Linux 9.0 Beta (Plow)

cpu             : POWER9 (architected), altivec supported
clock   : 3150.000000MHz
revision        : 1.2 (pvr 004e 2102)
timebase        : 512000000
platform        : pSeries
model   : IBM,9080-M9S
machine : CHRP IBM,9080-M9S
MMU     : Hash

Interrupts      : XIVE
DUMP Config     : kdump

--- lsmcode ---
Version of System Firmware is FW950.10 (VH950_072) (t) FW950.10 (VH950_072) (p) FW950.10 (VH950_072) (b)
Version of PFW is 21212021020481CF0681

--- Kernel version ---
5.14.0-39.test.el9.ppc64le

--- cat /proc/cmdline ---
BOOT_IMAGE=(ieee1275//vdevice/vfc-client@300008a9/disk@500507680210a422,msdos2)/vmlinuz-5.14.0-39.test.el9.ppc64le root=/dev/mapper/rhel_ltcfleet2--lp17-root ro crashkernel=2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-102400T:4G rd.lvm.lv=rhel_ltcfleet2-lp17/root rd.lvm.lv=rhel_ltcfleet2-lp17/swap biosdevname=0

--- cat /etc/*-release ---
NAME="Red Hat Enterprise Linux"
VERSION="9.0 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.0"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.0 Beta (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/9/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.0 Beta"
Red Hat Enterprise Linux release 9.0 Beta (Plow)

Config :
================
--- Processors ---
curr_proc_mode=shared
curr_min_proc_units=0.05, curr_proc_units=15.0, curr_max_proc_units=80.0
curr_min_procs=1, curr_procs=128, curr_max_procs=192
curr_sharing_mode=cap

--- Memory ---
curr_mem=21.2GB

DESCRIPTION

Even though RHEL-9.0 installation was successful on Fleetwood LPAR systems using maximum system configurations, booting the installed system failed with the following kernel panic, repeating in a loop:

[    0.000000] -----------------------------------------------------
[    0.000000] Kernel panic - not syncing: Failed to allocate memory for MCE event data
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.14.0-48.el9.ppc64le #1
[    0.000000] Call Trace:
[    0.000000] [c000000002a43d10] [c00000000080ed80] dump_stack_lvl+0x74/0xa8 (unreliable)
[    0.000000] [c000000002a43d50] [c00000000015a648] panic+0x174/0x40c
[    0.000000] [c000000002a43df0] [c00000000200ced4] mce_init+0xc8/0x108
[    0.000000] [c000000002a43e80] [c00000000200a604] setup_arch+0x360/0x3e0
[    0.000000] [c000000002a43f00] [c000000002004c4c] start_kernel+0xac/0x664
[    0.000000] [c000000002a43f90] [c00000000000d39c] start_here_common+0x1c/0x600
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at drivers/tty/vt/vt.c:4377 do_unblank_screen+0x58/0x230
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.14.0-48.el9.ppc64le #1
[    0.000000] NIP:  c00000000091c018 LR: c00000000091c134 CTR: c0000000001fd8e0
[    0.000000] REGS: c000000002a43a30 TRAP: 0700   Not tainted  (5.14.0-48.el9.ppc64le)
[    0.000000] MSR:  8000000002021033 <SF,VEC,ME,IR,DR,RI,LE>  CR: 28000842  XER: 20040001
[    0.000000] CFAR: c00000000091c148 IRQMASK: 3
[    0.000000] GPR00: c00000000091c134 c000000002a43cd0 c000000002a46a00 0000000000000000
[    0.000000] GPR04: 0000000028000244 0000000000000000 0000000020040001 0000000000000000
[    0.000000] GPR08: ffffffffffffd954 0000000000000001 0000000000000000 0000000002001001
[    0.000000] GPR12: c0000000001fd8e0 c000000002dd0000 0000000000000000 0000000000000000
[    0.000000] GPR16: 0000000000000000 0000000000000000 000000000f693208 0000000000000004
[    0.000000] GPR20: 000000000f681f10 0000000000000004 00000000089dfcc4 0000000000000001
[    0.000000] GPR24: 0000000000000000 c000000000000000 c00000000255d9b0 c000000002b39650
[    0.000000] GPR28: c000000002a878a0 0000000000000000 c000000001156a20 c000000002c47a98
[    0.000000] NIP [c00000000091c018] do_unblank_screen+0x58/0x230
[    0.000000] LR [c00000000091c134] do_unblank_screen+0x174/0x230
[    0.000000] Call Trace:
[    0.000000] [c000000002a43cd0] [c00000000091c154] do_unblank_screen+0x194/0x230 (unreliable)
[    0.000000] [c000000002a43d50] [c00000000015a6b8] panic+0x1e4/0x40c
[    0.000000] [c000000002a43df0] [c00000000200ced4] mce_init+0xc8/0x108
[    0.000000] [c000000002a43e80] [c00000000200a604] setup_arch+0x360/0x3e0
[    0.000000] [c000000002a43f00] [c000000002004c4c] start_kernel+0xac/0x664
[    0.000000] [c000000002a43f90] [c00000000000d39c] start_here_common+0x1c/0x600
[    0.000000] Instruction dump:
[    0.000000] 3be91098 f8010010 f821ff81 813f0000 2c090000 41820154 3d220004 39290d84
[    0.000000] 81290000 2c090000 39200000 4182011c <0b090000> 3d220035 39400000 3fc20035
[    0.000000] random: get_random_bytes called from __warn+0x134/0x190 with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Rebooting in 180 seconds..

This bug happens because the current memory offset of the crash kernel leaves insufficient memory in the first memblock for essential system resources, such as the MCE event data allocated by mce_init() in the trace above. In summary, since the RMA region can now be 512MB or more, the crash kernel offset needs to be updated on LPAR platforms.

Signed-off-by: Desnes A. Nunes do Rosario <drosario@redhat.com>
