qemu-system-arm highmem support broken with TCG
Host environment
- Operating system: Debian 12
- OS/kernel version: linux-6.11
- Architecture: arm64
- QEMU flavor: qemu-system-arm
- QEMU version: 7.2 through 9.1
- QEMU command line: -machine type=virt,highmem=on,accel=tcg -cpu cortex-a15 -m 3080M
Emulated/Virtualized environment
- Operating system: Debian 12
- OS/kernel version: linux-6.11, LPAE, highmem (also tested linux-5.4)
- Architecture: arm
Booting Linux in a TCG guest with RAM above the 4 GiB boundary fails: the guest hangs while loading /sbin/init, presumably because this is the first page fault on a highmem page during boot. Depending on configuration, the output either ends after "Run /sbin/init as init process" or continues with a watchdog warning (an RCU stall report) like this:
[ 21.987768] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 21.987994] rcu: (detected by 0, t=2102 jiffies, g=-1115, q=6 ncpus=1)
[ 21.988118] rcu: All QSes seen, last rcu_sched kthread activity 2102 (-27802--29904), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 21.988358] rcu: rcu_sched kthread starved for 2102 jiffies! g-1115 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 21.988496] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 21.988622] rcu: RCU grace-period kthread stack dump:
[ 21.988739] task:rcu_sched state:R running task stack:0 pid:13 tgid:13 ppid:2 flags:0x00000000
[ 21.989006] Call trace:
[ 21.989340] __schedule from schedule+0x20/0xd0
[ 21.989816] schedule from schedule_timeout+0x1bc/0x2e4
[ 21.989912] schedule_timeout from rcu_gp_fqs_loop+0x128/0x51c
[ 21.989996] rcu_gp_fqs_loop from rcu_gp_kthread+0x154/0x1cc
[ 21.990092] rcu_gp_kthread from kthread+0xec/0x108
[ 21.990168] kthread from ret_from_fork+0x14/0x38
[ 21.990262] Exception stack(0xf0869fb0 to 0xf0869ff8)
[ 21.990374] 9fa0: 00000000 00000000 00000000 00000000
[ 21.990486] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 21.990598] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 21.990744] rcu: Stack dump where RCU GP kthread last ran:
[ 21.990959] CPU: 0 UID: 0 PID: 1 Comm: init Not tainted 6.11.0-rc4-00002-g92a10d386149 #38
[ 21.991115] Hardware name: Generic DT based system
[ 21.991184] PC is at flush_tlb_page+0x90/0xe4
[ 21.991248] LR is at handle_mm_fault+0x310/0xe6c
[ 21.991319] pc : [<4020e750>] lr : [<40385088>] psr: 00000113
[ 21.991402] sp : f080dcf0 ip : 42128f00 fp : f080de18
[ 21.991475] r10: 004fc000 r9 : 00e00001 r8 : 7cc0975f
[ 21.991546] r7 : 00000000 r6 : 00000000 r5 : 42793c08 r4 : 41804e08
[ 21.991637] r3 : 004fc001 r2 : 90f00000 r1 : 004fc000 r0 : 90f00000
[ 21.991754] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 21.991862] Control: 30c5387d Table: 43019100 DAC: fffffffd
[ 21.991957] Call trace:
[ 21.991990] flush_tlb_page from handle_mm_fault+0x310/0xe6c
[ 21.992106] handle_mm_fault from do_page_fault+0xf4/0x3e0
[ 21.992191] do_page_fault from do_DataAbort+0x30/0xa4
[ 21.992261] do_DataAbort from __dabt_svc+0x4c/0x80
[ 21.992327] Exception stack(0xf080de18 to 0xf080de60)
[ 21.992399] de00: 004fc040 00000fb8
[ 21.992511] de20: 00000000 b5003500 004fc1c8 004fc040 004fb000 00000003 43036080 b5003500
[ 21.992607] de40: 00000000 43036080 00000000 f080de68 4042c63c 40e2dae4 20000013 ffffffff
[ 21.992749] __dabt_svc from __clear_user_std+0x34/0x68
[ 21.992830] __clear_user_std from elf_load+0x1a8/0x204
[ 21.992917] elf_load from load_elf_binary+0x548/0x1398
[ 21.992988] load_elf_binary from bprm_execve+0x234/0x51c
[ 21.993063] bprm_execve from kernel_execve+0xf8/0x194
[ 21.993135] kernel_execve from try_to_run_init_process+0xc/0x38
[ 21.993213] try_to_run_init_process from kernel_init+0xdc/0x12c
[ 21.993298] kernel_init from ret_from_fork+0x14/0x38
[ 21.993370] Exception stack(0xf080dfb0 to 0xf080dff8)
[ 21.993450] dfa0: 00000000 00000000 00000000 00000000
[ 21.993563] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 21.993672] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
Using KVM acceleration on the same configuration works.
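For reference, a rough sketch of why "-m 3080M" is enough to put guest RAM above the 4 GiB boundary. It assumes that the virt machine maps RAM starting at 0x40000000 (1 GiB), which is not stated in this report, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: guest RAM on the "virt" machine starts at 1 GiB. */
#define ASSUMED_RAM_BASE 0x40000000ULL
#define MIB              (1024ULL * 1024ULL)
#define FOUR_GIB         (4096ULL * MIB)

/* Hypothetical helper: first guest-physical address past the end of RAM. */
static uint64_t assumed_ram_end(uint64_t ram_size_mib)
{
    /* Guard against overflow in the multiplication and addition below. */
    assert(ram_size_mib <= (UINT64_MAX - ASSUMED_RAM_BASE) / MIB);
    return ASSUMED_RAM_BASE + ram_size_mib * MIB;
}

/* Hypothetical helper: how many bytes of RAM lie above the 4 GiB boundary. */
static uint64_t bytes_above_4gib(uint64_t ram_size_mib)
{
    uint64_t end = assumed_ram_end(ram_size_mib);
    return end > FOUR_GIB ? end - FOUR_GIB : 0;
}
```

Under that assumption, bytes_above_4gib(3080) comes out to 8 MiB, so only a small slice of RAM needs a physical address wider than 32 bits, while bytes_above_4gib(3072) is 0.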
Additional information
I initially bisected this to commit 39a1fd25 ("target/arm: Fix handling of LPAE block descriptors"), which introduced a bug with identical symptoms by masking the wrong address bits due to a type mismatch. That bug was in turn fixed by commit c2360eaa ("target/arm: Fix qemu-system-arm handling of LPAE block descriptors for highmem"). The problem resurfaced between qemu-7.1.0 and qemu-7.2.0 with commit f3639a64 ("target/arm: Use softmmu tlbs for page table walking"). It may instead have been introduced by the preceding commit 4a358556 ("target/arm: Plumb debug into S1Translate"), but that commit fails to boot for an unrelated reason, so I could not rule it out.
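To illustrate the kind of type mismatch meant here, the following is a minimal sketch and not the actual QEMU code. It assumes a 64-bit descriptor address masked with a value derived from a 32-bit page size; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, buggy variant: (page_size - 1) and its complement are
 * computed in 32 bits. The result is zero-extended to 64 bits before the
 * AND, so bits 32 and up of the descriptor address are cleared as well.
 */
static uint64_t mask_block_addr_buggy(uint64_t descaddr, uint32_t page_size)
{
    /* The mask trick only makes sense for power-of-two sizes. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    descaddr &= ~(page_size - 1);
    return descaddr;
}

/*
 * Hypothetical, fixed variant: widen to 64 bits before taking the
 * complement, so only the low offset bits are cleared.
 */
static uint64_t mask_block_addr_fixed(uint64_t descaddr, uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    descaddr &= ~(uint64_t)(page_size - 1);
    return descaddr;
}
```

For a 2 MiB block at 0x100234567, the buggy variant yields 0x00200000, while the fixed variant yields 0x100200000. Only addresses above 4 GiB are affected, which would match a failure that shows up only with highmem.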
I reproduced this on qemu-7.2 as shipped by Debian as well as on qemu-9.1 (built locally).
During parts of the bisection, the 'highmem=on' argument did not have the intended effect, which appeared to hide part of this problem. I worked around that by overriding the 'pa_bits' variable in machvirt_init().