TD guest fails to boot when "-overcommit cpu-pm=on" is added to the QEMU command line; this option is needed to use MWAIT in the guest

Test Environment

Platform: GNR-AP
Host/Guest: CentOS Stream release 9
Host/Guest kernel version: kernel-5.14.0-571.tdx.el9s
Architecture: x86
QEMU flavor: qemu-system-x86_64 (/usr/libexec/qemu-kvm)
QEMU version: qemu-kvm-9.1.0-12.el9s.tdx.x86_64

Bug description

QEMU command line:

img=/home/rhel-guest-image-9.7-20250316.2.x86_64.qcow2
/usr/libexec/qemu-kvm \
        -name tdxvm,process=tdxvm,debug-threads=on \
        -accel kvm \
        -object tdx-guest,id=tdx \
        -smp 4 \
        -m 4G \
        -cpu host \
        -overcommit cpu-pm=on \
        -nodefaults -nographic \
        -bios OVMF.fd \
        -vga none \
        -machine q35,kernel_irqchip=split,confidential-guest-support=tdx,hpet=off \
        -drive file=$img,if=none,id=virtio-disk0 \
        -device virtio-blk-pci,drive=virtio-disk0 \
        -serial stdio
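
To check that the option is really the trigger, the same guest can be started with and without it. The sketch below is only an illustration: `tdx_cmd` is a hypothetical helper name, not part of QEMU, and it does nothing more than print the command line from this report with `-overcommit cpu-pm=on` included or left out.

```shell
#!/bin/sh
# Guest image path taken from this report; override by exporting img.
img=${img:-/home/rhel-guest-image-9.7-20250316.2.x86_64.qcow2}

# Hypothetical helper: print the QEMU command from this report.
# Argument "on" adds -overcommit cpu-pm=on, anything else leaves it out.
tdx_cmd() {
    if [ "${1:-off}" = "on" ]; then
        overcommit="-overcommit cpu-pm=on"
    else
        overcommit=""
    fi
    # $overcommit is unquoted on purpose: an empty value expands to
    # nothing instead of an empty argument.
    echo /usr/libexec/qemu-kvm \
        -name tdxvm,process=tdxvm,debug-threads=on \
        -accel kvm \
        -object tdx-guest,id=tdx \
        -smp 4 \
        -m 4G \
        -cpu host \
        $overcommit \
        -nodefaults -nographic \
        -bios OVMF.fd \
        -vga none \
        -machine q35,kernel_irqchip=split,confidential-guest-support=tdx,hpet=off \
        -drive "file=$img,if=none,id=virtio-disk0" \
        -device virtio-blk-pci,drive=virtio-disk0 \
        -serial stdio
}
```

`tdx_cmd off` prints the baseline command and `tdx_cmd on` prints the failing one; either can be run with `sh -c "$(tdx_cmd on)"`.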

Error log

[ 68.250659] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 68.251591] rcu: 22-...!: (14 GPs behind) idle=51a8/0/0x0 softirq=283/284 fqs=1 (false positive?)
[ 68.252913] rcu: 23-...!: (14 GPs behind) idle=ad78/0/0x0 softirq=111/111 fqs=1 (false positive?)
[ 68.254263] rcu: 25-...!: (14 GPs behind) idle=5f88/0/0x0 softirq=35/35 fqs=1 (false positive?)
[ 68.255471] rcu: 53-...!: (220 GPs behind) idle=6700/0/0x0 softirq=13/14 fqs=1 (false positive?)
[ 68.256600] rcu: 55-...!: (12 GPs behind) idle=65b8/0/0x0 softirq=12/12 fqs=1 (false positive?)
[ 68.257776] rcu: 56-...!: (10 GPs behind) idle=6758/0/0x0 softirq=13/13 fqs=1 (false positive?)
[ 68.259080] rcu: 59-...!: (12 GPs behind) idle=74e8/0/0x0 softirq=12/12 fqs=1 (false positive?)
[ 68.260391] rcu: 61-...!: (11 GPs behind) idle=68e8/0/0x0 softirq=16/16 fqs=1 (false positive?)
[ 68.261718] rcu: 62-...!: (220 GPs behind) idle=68f0/0/0x0 softirq=13/14 fqs=1 (false positive?)
[ 68.263070] rcu: (detected by 11, t=60015 jiffies, g=-223, q=7025 ncpus=64)
[ 68.264132] Sending NMI from CPU 11 to CPUs 22:
[ 68.264151] NMI backtrace for cpu 22 skipped: idling at cpu_idle_poll.isra.0+0x23/0xf0
[ 68.265141] Sending NMI from CPU 11 to CPUs 23:
[ 68.265163] NMI backtrace for cpu 23 skipped: idling at cpu_idle_poll.isra.0+0x23/0xf0
[ 68.266150] Sending NMI from CPU 11 to CPUs 25:
[ 68.266167] NMI backtrace for cpu 25 skipped: idling at cpu_idle_poll.isra.0+0x23/0xf0
[ 68.267158] Sending NMI from CPU 11 to CPUs 53:
[ 68.267175] NMI backtrace for cpu 53
[ 68.267178] CPU: 53 UID: 0 PID: 0 Comm: swapper/53 Not tainted 6.14.0-rc3+ #6
[ 68.267180] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20241117-1.el9 11/17/2024
[ 68.267181] RIP: 0010:tick_check_broadcast_expired+0x19/0x20
[ 68.267185] Code: cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 65 48 63 15 24 18 6f 7a 48 8b 05 09 ac 44 01 48 0f a3 10 0f 92 c0 0f b6 c0 cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[ 68.267186] RSP: 0018:ff8a5524c026ff00 EFLAGS: 00000247
[ 68.267188] RAX: 0000000000000001 RBX: ff4fab1f00c1a080 RCX: 000000000000001f
[ 68.267189] RDX: 0000000000000035 RSI: 00000000435e50d7 RDI: 0000000000016694
[ 68.267190] RBP: ff4fab1f00c1a080 R08: 00000001f05320a3 R09: 00000000fa83b2da
[ 68.267191] R10: ff4fab2e3f6a4f00 R11: 0000000000cfe70e R12: 0000000000000000
[ 68.267191] R13: 0000000000000001 R14: 0000000000000100 R15: 0000000000000000
[ 68.267192] FS: 0000000000000000(0000) GS:ff4fab2e3f680000(0000) knlGS:0000000000000000
[ 68.267193] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 68.267194] CR2: 00007fa3272d6778 CR3: 0000000022e22001 CR4: 0000000000771ef0
[ 68.267195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 68.267196] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 68.267197] PKRU: 55555554
[ 68.267197] Call Trace:
[ 68.267199]
[ 68.267201] ? nmi_cpu_backtrace+0x83/0xf0
[ 68.267205] ? nmi_cpu_backtrace_handler+0xd/0x20
[ 68.267210] ? nmi_handle+0x5b/0x150
[ 68.267214] ? default_do_nmi+0x40/0x100
[ 68.267216] ? exc_nmi+0xff/0x180
[ 68.267218] ? end_repeat_nmi+0xf/0x53
[ 68.267222] ? tick_check_broadcast_expired+0x19/0x20
[ 68.267223] ? tick_check_broadcast_expired+0x19/0x20
[ 68.267224] ? tick_check_broadcast_expired+0x19/0x20
[ 68.267225]
[ 68.267225]
[ 68.267226] cpu_idle_poll.isra.0+0x39/0xf0
[ 68.267227] do_idle+0x3b/0xd0
[ 68.267230] cpu_startup_entry+0x25/0x30
[ 68.267231] start_secondary+0x115/0x140
[ 68.267234] common_startup_64+0x13e/0x141
[ 68.267237]
[ 68.268166] Sending NMI from CPU 11 to CPUs 55:
[ 68.268184] NMI backtrace for cpu 55 skipped: idling at cpu_idle_poll.isra.0+0x23/0xf0
[ 68.269174] Sending NMI from CPU 11 to CPUs 56:
[ 68.269193] NMI backtrace for cpu 56 skipped: idling at cpu_idle_poll.isra.0+0x23/0xf0
[ 68.270182] Sending NMI from CPU 11 to CPUs 59:
[ 68.270202] NMI backtrace for cpu 59 skipped: idling at cpu_idle_poll.isra.0+0x23/0xf0
[ 68.271191] Sending NMI from CPU 11 to CPUs 61:
[ 68.271208] NMI backtrace for cpu 61 skipped: idling at cpu_idle_poll.isra.0+0x23/0xf0
[ 68.272199] Sending NMI from CPU 11 to CPUs 62:
[ 68.272217] NMI backtrace for cpu 62
[ 68.272220] CPU: 62 UID: 0 PID: 0 Comm: swapper/62 Not tainted 6.14.0-rc3+ #6
[ 68.272222] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20241117-1.el9 11/17/2024
[ 68.272223] RIP: 0010:tick_check_broadcast_expired+0x0/0x20
[ 68.272227] Code: 90 90 90 90 90 90 90 90 90 90 48 8b 05 b9 f5 f2 01 c3 cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <65> 48 63 15 24 18 6f 7a 48 8b 05 09 ac 44 01 48 0f a3 10 0f 92 c0
[ 68.272228] RSP: 0018:ff8a5524c02b7f00 EFLAGS: 00000246
[ 68.272229] RAX: 0000000000000000 RBX: ff4fab1f00c3c100 RCX: 000000000000001f
[ 68.272230] RDX: 000000000000003e RSI: 00000000435e50d7 RDI: 00000000000168ec
[ 68.272231] RBP: ff4fab1f00c3c100 R08: 00000001f3393605 R09: 00000000fa83b2da
[ 68.272232] R10: ff4fab2e3fb24f00 R11: 00000000006369a3 R12: 0000000000000000
[ 68.272232] R13: 0000000000000001 R14: 0000000000000100 R15: 0000000000000000
[ 68.272233] FS: 0000000000000000(0000) GS:ff4fab2e3fb00000(0000) knlGS:0000000000000000
[ 68.272234] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 68.272235] CR2: 00007f681f43e778 CR3: 0000000022e22001 CR4: 0000000000771ef0
[ 68.272236] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 68.272236] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 68.272237] PKRU: 55555554
[ 68.272238] Call Trace:
[ 68.272239]
[ 68.272241] ? nmi_cpu_backtrace+0x83/0xf0
[ 68.272244] ? nmi_cpu_backtrace_handler+0xd/0x20
[ 68.272247] ? nmi_handle+0x5b/0x150
[ 68.272250] ? default_do_nmi+0x40/0x100
[ 68.272252] ? exc_nmi+0xff/0x180
[ 68.272253] ? end_repeat_nmi+0xf/0x53
[ 68.272256] ? __pfx_tick_check_broadcast_expired+0x10/0x10
[ 68.272257] ? __pfx_tick_check_broadcast_expired+0x10/0x10
[ 68.272258] ? __pfx_tick_check_broadcast_expired+0x10/0x10
[ 68.272259]
[ 68.272260]
[ 68.272260] cpu_idle_poll.isra.0+0x39/0xf0
[ 68.272262] do_idle+0x3b/0xd0
[ 68.272264] cpu_startup_entry+0x25/0x30
[ 68.272265] start_secondary+0x115/0x140
[ 68.272267] common_startup_64+0x13e/0x141
[ 68.272269]
[ 68.273208] rcu: rcu_preempt kthread timer wakeup didn't happen for 60017 jiffies! g-223 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[ 68.346950] rcu: Possible timer handling issue on cpu=23 timer-softirq=98
[ 68.347933] rcu: rcu_preempt kthread starved for 60095 jiffies! g-223 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=23
[ 68.349421] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 68.350788] rcu: RCU grace-period kthread stack dump:
[ 68.351562] task:rcu_preempt state:I stack:0 pid:18 tgid:18 ppid:2 task_flags:0x208040 flags:0x00004000
[ 68.353200] Call Trace:
[ 68.353541]
[ 68.353845] __schedule+0x26f/0x540
[ 68.354393] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 68.355086] schedule+0x23/0xa0
[ 68.355587] schedule_timeout+0x73/0xf0
[ 68.356209] ? __pfx_process_timeout+0x10/0x10
[ 68.356909] rcu_gp_fqs_loop+0x10b/0x500
[ 68.357512] rcu_gp_kthread+0x13f/0x1d0
[ 68.358107] kthread+0xeb/0x230
[ 68.358594] ? __pfx_kthread+0x10/0x10
[ 68.359186] ? __pfx_kthread+0x10/0x10
[ 68.359760] ret_from_fork+0x2d/0x50
[ 68.360319] ? __pfx_kthread+0x10/0x10
[ 68.360933] ret_from_fork_asm+0x1a/0x30
[ 68.361535]
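
When comparing runs, it can help to reduce a log like the one above to just the list of stalled CPUs. The following is a minimal sketch, assuming the guest console output or `dmesg` is available as text; `stalled_cpus` is a hypothetical helper name, and it only matches the `rcu: N-...` per-CPU entries shown in this report.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and print the CPU
# numbers named in RCU stall entries ("rcu: 22-...!: ..."), one per line.
stalled_cpus() {
    # grep -o prints each match on its own line, so this works whether
    # the log has one entry per line or was pasted as a single line.
    grep -oE 'rcu: [0-9]+-\.\.\.' |
        grep -oE '[0-9]+' |
        sort -nu
}
```

For example, `dmesg | stalled_cpus` inside the guest, or `stalled_cpus < console.log` on a saved serial log (the file name is only an example).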

Edited Mar 27, 2025 by fanchen2