Breakpoints set at wrong addresses in `test-gdbstub.py` for some Linux kernel guest images

Host environment #1

  • Operating system: macOS 13.1
  • OS/kernel version: 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:06:26 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8112 arm64
  • Architecture: ARM (Apple M2)
  • QEMU flavor: qemu-system-aarch64
  • QEMU version: 7.2.50 (latest qemu.git master, commit 3b33ae48)
  • QEMU command line: see below

Host environment #2

  • Operating system: Debian bookworm
  • OS/kernel version: 6.0.0-6-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.0.12-1 (2022-12-09) x86_64 GNU/Linux
  • Architecture: x86
  • QEMU flavor: qemu-system-aarch64
  • QEMU version: 7.2.50 (latest qemu.git master, commit 3b33ae48)
  • QEMU command line: see below

Emulated/Virtualized environment

  • Operating system:
  • OS/kernel version: Linux 5.19 (among others)
  • Architecture: ARM

Description of problem

The script tests/guest-debug/test-gdbstub.py, which tests QEMU's GDB stub on Linux kernel guests, sets breakpoints on kernel_init() and wait_for_completion(). Because the script passes the bare function names to GDB, the breakpoints are implicitly set not at the functions' start addresses but at the end of the functions' prologues.

For some Linux kernel builds in which kernel_init() and wait_for_completion() get compiled with a function prologue, the script fails to detect breakpoint hits in check_hbreak() and check_break() because it compares the stopped address (i.e. the end of the function's prologue) with the function's start address, and they differ. To observe the difference in GDB:

$ gdb -q --nx vmlinux
Reading symbols from vmlinux...
(gdb) b kernel_init
Breakpoint 1 at 0xffff800008fbeb28: file init/main.c, line 1497.    # <- end of prologue
(gdb) b *kernel_init
Breakpoint 2 at 0xffff800008fbeb18: file init/main.c, line 1491.    # <- function start

In my tests, the issue doesn't occur with standard Linux kernel builds (e.g. compiled on Linux hosts with GCC) because typically both kernel_init() and wait_for_completion() appear to be compiled without prologues.
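
To check whether a particular vmlinux is affected, the two addresses GDB reports for `b sym` and `b *sym` can be compared. The following is a minimal sketch that parses the "Breakpoint N at ..." lines from the session above; the helper names breakpoint_address() and prologue_skip() are hypothetical and not part of test-gdbstub.py. A non-zero result means GDB moves the breakpoint past a prologue for that function.

```python
import re

# Matches GDB's reply to a `break` command, e.g.
# "Breakpoint 1 at 0xffff800008fbeb28: file init/main.c, line 1497."
BP_LINE = re.compile(r"^Breakpoint \d+ at (0x[0-9a-fA-F]+)")


def breakpoint_address(line):
    """Return the address from a 'Breakpoint N at 0x...' line, or None."""
    match = BP_LINE.match(line.strip())
    return int(match.group(1), 16) if match else None


def prologue_skip(plain_line, starred_line):
    """Bytes between the function start (`b *sym`) and where `b sym` lands."""
    return breakpoint_address(plain_line) - breakpoint_address(starred_line)


# The two replies quoted in this report for kernel_init.
plain = "Breakpoint 1 at 0xffff800008fbeb28: file init/main.c, line 1497."
starred = "Breakpoint 2 at 0xffff800008fbeb18: file init/main.c, line 1491."

print(hex(prologue_skip(plain, starred)))  # prints 0x10
```

For the kernel in this report the offset is 0x10 bytes, i.e. four 4-byte AArch64 instructions.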

Steps to reproduce

The issue has so far been encountered only with arm64 Linux kernel guests compiled on macOS arm64 with mac-linux-kdk.

  1. Compile a recent arm64 Linux kernel on macOS arm64 with debugging information (first make defconfig, then make menuconfig and set Kernel hacking / Compile-time checks and compiler options / Debug information / Rely on toolchain's implicit default DWARF version):

    $ file /tmp/linux-5.19/arch/arm64/boot/Image
    /tmp/linux-5.19/arch/arm64/boot/Image: Linux kernel ARM64 boot executable Image, little-endian, 4K pages
    $ file /tmp/linux-5.19/vmlinux
    /tmp/linux-5.19/vmlinux: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), statically linked, BuildID[sha1]=bf9e422d48e0aded5859fe34d6de2c174ef3a20b, with debug_info, not stripped
  2. Start QEMU waiting for GDB to connect:

    $ ./qemu-system-aarch64 -smp 1 -M virt -cpu cortex-a57 -kernel /tmp/linux-5.19/arch/arm64/boot/Image -append nokaslr -s -S
  3. Execute the test-gdbstub.py script (as described in the script file itself):

    $ gdb /tmp/linux-5.19/vmlinux -x tests/guest-debug/test-gdbstub.py

    The script then hangs.

Reproduced on both a macOS host and a Linux host.

Additional information

The proposed fix is to bypass GDB's prologue skipping and set the two breakpoints at the functions' start addresses by adding an asterisk before the function name, which makes GDB treat the location as an address expression:

diff --git a/tests/guest-debug/test-gdbstub.py b/tests/guest-debug/test-gdbstub.py
index 98a5df4d4..6202d17c3 100644
--- a/tests/guest-debug/test-gdbstub.py
+++ b/tests/guest-debug/test-gdbstub.py
@@ -31,7 +31,7 @@ def check_step():
 def check_break(sym_name):
     "Setup breakpoint, continue and check we stopped."
     sym, ok = gdb.lookup_symbol(sym_name)
-    bp = gdb.Breakpoint(sym_name)
+    bp = gdb.Breakpoint("*%s" % (sym_name))

     gdb.execute("c")

@@ -48,7 +48,7 @@ def check_break(sym_name):
 def check_hbreak(sym_name):
     "Setup hardware breakpoint, continue and check we stopped."
     sym, ok = gdb.lookup_symbol(sym_name)
-    gdb.execute("hbreak %s" % (sym_name))
+    gdb.execute("hbreak *%s" % (sym_name))
     gdb.execute("c")

     # hopefully we came back

This change shouldn't impact the Linux kernel guests for which the script is already working as intended.
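
The reasoning can be illustrated with a small stand-alone model. This is only a sketch and does not use GDB: resolve() is a hypothetical stand-in for how GDB picks a breakpoint address, the kernel_init addresses are the ones from the GDB session above, and "no_prologue_fn" with its address is invented to represent a function compiled without a prologue.

```python
# Toy symbol table: name -> (function start, first address after the prologue).
# "no_prologue_fn" and its address are made up for illustration.
SYMBOLS = {
    "kernel_init": (0xffff800008fbeb18, 0xffff800008fbeb28),
    "no_prologue_fn": (0xffff800008000000, 0xffff800008000000),
}


def resolve(location):
    """Model where a breakpoint lands for 'sym' versus '*sym'."""
    if location.startswith("*"):
        return SYMBOLS[location[1:]][0]  # exact function start
    return SYMBOLS[location][1]          # after the prologue


def check_stops_at_start(sym_name, use_star):
    """Mimic the script's check: stop address must equal the symbol's start."""
    location = "*%s" % sym_name if use_star else sym_name
    return resolve(location) == SYMBOLS[sym_name][0]


for name in SYMBOLS:
    print(name, check_stops_at_start(name, False), check_stops_at_start(name, True))
```

In this model the check fails only for the function with a prologue when the bare name is used, and passes in every case once the asterisk form is used.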