qemu-system-arm highmem support broken with TCG

Host environment

  • Operating system: Debian 12
  • OS/kernel version: linux-6.11
  • Architecture: arm64
  • QEMU flavor: qemu-system-arm
  • QEMU version: 7.2 through 9.1
  • QEMU command line: -machine type=virt,highmem=on,accel=tcg -cpu cortex-a15 -m 3080M

Emulated/Virtualized environment

  • Operating system: Debian 12
  • OS/kernel version: linux-6.11, LPAE, highmem (also tested linux-5.4)
  • Architecture: arm

Booting Linux in a TCG guest with RAM above the 4GB line fails: the boot hangs while loading /sbin/init, presumably on the first page fault on a highmem page. Depending on configuration, the output either ends after "Run /sbin/init as init process" or continues with a watchdog warning like

[   21.987768] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   21.987994] rcu: 	(detected by 0, t=2102 jiffies, g=-1115, q=6 ncpus=1)
[   21.988118] rcu: All QSes seen, last rcu_sched kthread activity 2102 (-27802--29904), jiffies_till_next_fqs=1, root ->qsmask 0x0
[   21.988358] rcu: rcu_sched kthread starved for 2102 jiffies! g-1115 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[   21.988496] rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[   21.988622] rcu: RCU grace-period kthread stack dump:
[   21.988739] task:rcu_sched       state:R  running task     stack:0     pid:13    tgid:13    ppid:2      flags:0x00000000
[   21.989006] Call trace: 
[   21.989340]  __schedule from schedule+0x20/0xd0
[   21.989816]  schedule from schedule_timeout+0x1bc/0x2e4
[   21.989912]  schedule_timeout from rcu_gp_fqs_loop+0x128/0x51c
[   21.989996]  rcu_gp_fqs_loop from rcu_gp_kthread+0x154/0x1cc
[   21.990092]  rcu_gp_kthread from kthread+0xec/0x108
[   21.990168]  kthread from ret_from_fork+0x14/0x38
[   21.990262] Exception stack(0xf0869fb0 to 0xf0869ff8)
[   21.990374] 9fa0:                                     00000000 00000000 00000000 00000000
[   21.990486] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   21.990598] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   21.990744] rcu: Stack dump where RCU GP kthread last ran:
[   21.990959] CPU: 0 UID: 0 PID: 1 Comm: init Not tainted 6.11.0-rc4-00002-g92a10d386149 #38
[   21.991115] Hardware name: Generic DT based system
[   21.991184] PC is at flush_tlb_page+0x90/0xe4
[   21.991248] LR is at handle_mm_fault+0x310/0xe6c
[   21.991319] pc : [<4020e750>]    lr : [<40385088>]    psr: 00000113
[   21.991402] sp : f080dcf0  ip : 42128f00  fp : f080de18
[   21.991475] r10: 004fc000  r9 : 00e00001  r8 : 7cc0975f
[   21.991546] r7 : 00000000  r6 : 00000000  r5 : 42793c08  r4 : 41804e08
[   21.991637] r3 : 004fc001  r2 : 90f00000  r1 : 004fc000  r0 : 90f00000
[   21.991754] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   21.991862] Control: 30c5387d  Table: 43019100  DAC: fffffffd
[   21.991957] Call trace: 
[   21.991990]  flush_tlb_page from handle_mm_fault+0x310/0xe6c
[   21.992106]  handle_mm_fault from do_page_fault+0xf4/0x3e0
[   21.992191]  do_page_fault from do_DataAbort+0x30/0xa4
[   21.992261]  do_DataAbort from __dabt_svc+0x4c/0x80
[   21.992327] Exception stack(0xf080de18 to 0xf080de60)
[   21.992399] de00:                                                       004fc040 00000fb8
[   21.992511] de20: 00000000 b5003500 004fc1c8 004fc040 004fb000 00000003 43036080 b5003500
[   21.992607] de40: 00000000 43036080 00000000 f080de68 4042c63c 40e2dae4 20000013 ffffffff
[   21.992749]  __dabt_svc from __clear_user_std+0x34/0x68
[   21.992830]  __clear_user_std from elf_load+0x1a8/0x204
[   21.992917]  elf_load from load_elf_binary+0x548/0x1398
[   21.992988]  load_elf_binary from bprm_execve+0x234/0x51c
[   21.993063]  bprm_execve from kernel_execve+0xf8/0x194
[   21.993135]  kernel_execve from try_to_run_init_process+0xc/0x38
[   21.993213]  try_to_run_init_process from kernel_init+0xdc/0x12c
[   21.993298]  kernel_init from ret_from_fork+0x14/0x38
[   21.993370] Exception stack(0xf080dfb0 to 0xf080dff8)
[   21.993450] dfa0:                                     00000000 00000000 00000000 00000000
[   21.993563] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   21.993672] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000

Using KVM acceleration on the same configuration works.
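
To make concrete why "-m 3080M" is enough to put guest RAM above the 4GB line, here is a small arithmetic sketch. It assumes that the "virt" machine maps guest RAM starting at 1 GiB (0x40000000). The names RAM_BASE, ram_end() and bytes_above_4g() are made up for this illustration and are not QEMU identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* Assumption: guest RAM on the "virt" machine starts at 1 GiB. */
#define RAM_BASE (1 * GiB)

/* First guest physical address past the end of RAM. */
static uint64_t ram_end(uint64_t ram_size)
{
    /* Guard against wrap-around of the 64-bit sum. */
    assert(ram_size <= UINT64_MAX - RAM_BASE);
    return RAM_BASE + ram_size;
}

/* Number of RAM bytes located at or above the 4 GiB boundary. */
static uint64_t bytes_above_4g(uint64_t ram_size)
{
    uint64_t end = ram_end(ram_size);
    return end > 4 * GiB ? end - 4 * GiB : 0;
}
```

Under that assumption, ram_end(3080 * MiB) is 0x100800000, so the last 8 MiB of guest RAM need more than 32 address bits, while "-m 3072M" would end exactly at the 4GB line.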

Additional information

I initially bisected this to commit 39a1fd25 ("target/arm: Fix handling of LPAE block descriptors"), which introduced an identical bug by masking the wrong address bits due to a type mismatch. That bug was in turn fixed by commit c2360eaa ("target/arm: Fix qemu-system-arm handling of LPAE block descriptors for highmem"). The bug resurfaced between qemu-7.1.0 and qemu-7.2.0 after commit f3639a64 ("target/arm: Use softmmu tlbs for page table walking"), but it may instead be caused by the preceding commit 4a358556 ("target/arm: Plumb debug into S1Translate"), which fails to boot for an unrelated reason.
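
For readers unfamiliar with this class of bug, the following is a generic sketch of how a type mismatch can mask off the wrong address bits. It is not taken from the QEMU source, and it says nothing about the cause of the current regression. The function and parameter names are hypothetical, and it assumes a platform where unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy variant: ~(block_size - 1) is evaluated as a 32-bit value and
 * then zero-extended to 64 bits for the AND, so bits 32..63 of the
 * address are cleared along with the intended low bits.
 */
static uint64_t block_base_buggy(uint64_t descaddr, uint32_t block_size)
{
    /* block_size must be a nonzero power of two. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    return descaddr & ~(block_size - 1);
}

/*
 * Fixed variant: widen to 64 bits before inverting, so the mask keeps
 * all upper address bits set.
 */
static uint64_t block_base_fixed(uint64_t descaddr, uint32_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    return descaddr & ~((uint64_t)block_size - 1);
}
```

With a 2 MiB block and an address just above 4GB such as 0x100812345, the buggy variant yields 0x00800000 and the fixed variant yields 0x100800000. Addresses below 4GB give the same result in both, which would match a failure that only shows up once highmem pages are touched.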

I reproduced this on qemu-7.2 as shipped by Debian as well as on qemu-9.1 (built locally).

During parts of the bisection, the 'highmem=on' argument did not have the intended effect, which appeared to hide part of this problem. I worked around that by overriding the 'pa_bits' variable in machvirt_init().

Edited by Arnd Bergmann