Breakpoints set at wrong addresses in `test-gdbstub.py` for some Linux kernel guest images
Host environment #1
- Operating system: macOS 13.1
- OS/kernel version: 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:06:26 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8112 arm64
- Architecture: ARM (Apple M2)
- QEMU flavor: qemu-system-aarch64
- QEMU version: 7.2.50 (latest qemu.git master, commit 3b33ae48)
- QEMU command line: see below
Host environment #2
- Operating system: Debian bookworm
- OS/kernel version: 6.0.0-6-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.0.12-1 (2022-12-09) x86_64 GNU/Linux
- Architecture: x86
- QEMU flavor: qemu-system-aarch64
- QEMU version: 7.2.50 (latest qemu.git master, commit 3b33ae48)
- QEMU command line: see below
Emulated/Virtualized environment
- Operating system:
- OS/kernel version: Linux 5.19 (among others)
- Architecture: ARM
Description of problem
The script tests/guest-debug/test-gdbstub.py for testing QEMU's GDB
stub on Linux kernel guests sets breakpoints on kernel_init() and
wait_for_completion(). As the script is coded, breakpoints are set
(implicitly) not at the functions' start addresses, but at the end of
the functions' prologues.
For some Linux kernel builds in which kernel_init() and
wait_for_completion() get compiled with a function prologue, the
script fails to detect breakpoint hits in check_hbreak() and
check_break() because it compares the stopped address (i.e. the end of
the function's prologue) with the function's start address, and they
differ. To observe the difference in GDB:
$ gdb -q --nx vmlinux
Reading symbols from vmlinux...
(gdb) b kernel_init
Breakpoint 1 at 0xffff800008fbeb28: file init/main.c, line 1497. # <- end of prologue
(gdb) b *kernel_init
Breakpoint 2 at 0xffff800008fbeb18: file init/main.c, line 1491. # <- function start
In my tests, the issue doesn't occur with standard Linux kernel builds
(e.g. compiled on Linux hosts with GCC) because typically both
kernel_init() and wait_for_completion() seem to be without
prologues.
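The mismatch can be modeled outside GDB with the two addresses from the session above. This is only an illustrative sketch: `stopped_at_entry` is a hypothetical stand-in for the comparison the script performs, not code taken from `test-gdbstub.py`.

```python
# Addresses copied from the GDB session above (Linux 5.19 arm64 build).
KERNEL_INIT_ENTRY = 0xffff800008fbeb18          # reported by `b *kernel_init`
KERNEL_INIT_POST_PROLOGUE = 0xffff800008fbeb28  # reported by `b kernel_init`


def stopped_at_entry(stop_pc, entry_addr):
    """Hypothetical stand-in for the script's check: did we stop at the symbol's address?"""
    return stop_pc == entry_addr


# `b kernel_init`: GDB places the breakpoint after the prologue, so the check fails.
plain_ok = stopped_at_entry(KERNEL_INIT_POST_PROLOGUE, KERNEL_INIT_ENTRY)
# `b *kernel_init`: the breakpoint sits on the first instruction, so the check passes.
starred_ok = stopped_at_entry(KERNEL_INIT_ENTRY, KERNEL_INIT_ENTRY)
# Distance between the two breakpoint addresses, in bytes.
prologue_bytes = KERNEL_INIT_POST_PROLOGUE - KERNEL_INIT_ENTRY

print(plain_ok, starred_ok, prologue_bytes)
```

For this build the two addresses are 16 bytes apart, i.e. four 4-byte AArch64 instructions.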
Steps to reproduce
The issue has so far been encountered only with arm64 Linux kernel guests compiled on macOS arm64 with mac-linux-kdk.
- Compile a recent arm64 Linux kernel on macOS arm64 with debugging information (first `make defconfig`, then `make menuconfig` and set `Kernel hacking / Compile-time checks and compiler options / Debug information / Rely on toolchain's implicit default DWARF version`):
  $ file /tmp/linux-5.19/arch/arm64/boot/Image
  /tmp/linux-5.19/arch/arm64/boot/Image: Linux kernel ARM64 boot executable Image, little-endian, 4K pages
  $ file /tmp/linux-5.19/vmlinux
  /tmp/linux-5.19/vmlinux: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), statically linked, BuildID[sha1]=bf9e422d48e0aded5859fe34d6de2c174ef3a20b, with debug_info, not stripped
- Start QEMU waiting for GDB to connect:
  $ ./qemu-system-aarch64 -smp 1 -M virt -cpu cortex-a57 -kernel /tmp/linux-5.19/arch/arm64/boot/Image -append nokaslr -s -S
- Execute the `test-gdbstub.py` script (as described in the script file itself):
  $ gdb /tmp/linux-5.19/vmlinux -x tests/guest-debug/test-gdbstub.py
  The script then hangs.
Tested both on a macOS host and a Linux host.
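For anyone wanting to script the reproduction, the two command lines from the steps above can be assembled as argument lists. This is a minimal sketch: the helper names `qemu_argv` and `gdb_argv` are hypothetical, and the kernel tree path is simply the one used in this report.

```python
import shlex

KERNEL_TREE = "/tmp/linux-5.19"  # path used in the steps above; adjust as needed


def qemu_argv(kernel_tree):
    """Build the QEMU invocation from the reproduction steps (hypothetical helper)."""
    return [
        "./qemu-system-aarch64",
        "-smp", "1",
        "-M", "virt",
        "-cpu", "cortex-a57",
        "-kernel", kernel_tree + "/arch/arm64/boot/Image",
        "-append", "nokaslr",
        "-s",  # shorthand for -gdb tcp::1234
        "-S",  # do not start the CPU at startup; wait for the debugger
    ]


def gdb_argv(kernel_tree):
    """Build the GDB invocation that runs the test script (hypothetical helper)."""
    return ["gdb", kernel_tree + "/vmlinux", "-x", "tests/guest-debug/test-gdbstub.py"]


print(shlex.join(qemu_argv(KERNEL_TREE)))
print(shlex.join(gdb_argv(KERNEL_TREE)))
```

The lists can be passed to `subprocess.Popen` for QEMU and `subprocess.run` for GDB. Since the unpatched script hangs on affected kernels, a `timeout` on the GDB call is advisable.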
Additional information
The proposed fix is to bypass GDB's prologue skipping and set the two breakpoints explicitly at the functions' start addresses by adding an asterisk before the function name:
diff --git a/tests/guest-debug/test-gdbstub.py b/tests/guest-debug/test-gdbstub.py
index 98a5df4d4..6202d17c3 100644
--- a/tests/guest-debug/test-gdbstub.py
+++ b/tests/guest-debug/test-gdbstub.py
@@ -31,7 +31,7 @@ def check_step():
 def check_break(sym_name):
     "Setup breakpoint, continue and check we stopped."
     sym, ok = gdb.lookup_symbol(sym_name)
-    bp = gdb.Breakpoint(sym_name)
+    bp = gdb.Breakpoint("*%s" % (sym_name))
     gdb.execute("c")
@@ -48,7 +48,7 @@ def check_break(sym_name):
 def check_hbreak(sym_name):
     "Setup hardware breakpoint, continue and check we stopped."
     sym, ok = gdb.lookup_symbol(sym_name)
-    gdb.execute("hbreak %s" % (sym_name))
+    gdb.execute("hbreak *%s" % (sym_name))
     gdb.execute("c")
     # hopefully we came back
This change shouldn't impact the Linux kernel guests for which the script is already working as intended.
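The string handling in the patch can be checked without a running GDB session. The sketch below factors it into two helpers. Both `entry_location` and `hbreak_command` are hypothetical names introduced here for illustration and are not part of the script.

```python
def entry_location(sym_name):
    """Return a GDB location that targets the function's first instruction.

    The leading asterisk makes GDB treat the symbol as an address expression,
    so no prologue skipping is applied.
    """
    return "*%s" % sym_name


def hbreak_command(sym_name):
    """Return the GDB CLI command for a hardware breakpoint at the function's entry."""
    return "hbreak %s" % entry_location(sym_name)


# The two functions the test script breaks on.
for name in ("kernel_init", "wait_for_completion"):
    print(entry_location(name), "|", hbreak_command(name))
```

Inside GDB's Python environment these strings would be used as in the patch: `gdb.Breakpoint(entry_location(sym_name))` and `gdb.execute(hbreak_command(sym_name))`.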