Migration fails with rcu_preempt stall messages on Proxmox 8
Host environment
- Operating system: Proxmox 8.1 (Debian 12.2)
- OS/kernel version: Linux pve0.wx.xxxx.com 6.5.11-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-4 (2023-11-20T10:19Z) x86_64 GNU/Linux
- Architecture: x86_64
- QEMU flavor: pve-qemu-kvm
- QEMU version: 8.1.2-4
- Hardware: Dell R630, Xeon E5-2686, H330 RAID controller
- QEMU command line:
/usr/bin/kvm -id 105 -name tt1,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/105.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/105.pid -daemonize -smbios type=1,uuid=bee9ce8f-925a-4465-a937-a76f085da6cc -smp 2,sockets=1,cores=2,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/105.vnc,password=on -cpu qemu64,+aes,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+pni,+popcnt,+sse4.1,+sse4.2,+ssse3 -m 4096 -object iothread,id=iothread-virtioscsi0 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device vmgenid,guid=bcf1cae2-9532-4469-b8a2-607dd58eb067 -device usb-tablet,id=tablet,bus=ehci.0,port=1 -device VGA,id=vga,bus=pcie.0,addr=0x1 -chardev socket,path=/var/run/qemu-server/105.qga,server=on,wait=off,id=qga0 -device virtio-serial,id=qga0,bus=pci.0,addr=0x8 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:bc8ee4b166bb -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=rbd:nvme/vm-105-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/nvme.keyring,if=none,id=drive-scsi0,cache=writeback,format=raw,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap105i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=BC:24:11:BE:A7:94,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=101 -machine type=pc-q35-8.0+pve0 -incoming unix:/run/qemu-server/105.migrate -S
Emulated/Virtualized environment
- Operating system: Debian 12
- OS/kernel version: Linux debian 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
- Architecture: x86_64
Description of problem
When I migrate a VM from one host to another, the migration fails and the guest kernel logs the following messages:
[ 584.109502] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 584.109534] rcu: 1-...!: (0 ticks this GP) idle=1408/0/0x0 softirq=8428/8428 fqs=0 (false positive?)
[ 584.109556] (detected by 0, t=5252 jiffies, g=2953, q=74 ncpus=2)
[ 584.109561] Sending NMI from CPU 0 to CPUs 1:
[ 584.109587] NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xb/0x10
[ 584.110564] rcu: rcu_preempt kthread timer wakeup didn't happen for 5251 jiffies! g2953 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[ 584.110585] rcu: Possible timer handling issue on cpu=1 timer-softirq=8006
[ 584.110597] rcu: rcu_preempt kthread starved for 5252 jiffies! g2953 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
[ 584.110614] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 584.110645] rcu: RCU grace-period kthread stack dump:
[ 584.110658] task:rcu_preempt state:I stack:0 pid:15 ppid:2 flags:0x00004000
[ 584.110667] Call Trace:
[ 584.110672] <TASK>
[ 584.110688] __schedule+0x351/0xa20
[ 584.110699] ? rcu_gp_cleanup+0x480/0x480
[ 584.110704] schedule+0x5d/0xe0
[ 584.110705] schedule_timeout+0x94/0x150
[ 584.110709] ? __bpf_trace_tick_stop+0x10/0x10
[ 584.110714] rcu_gp_fqs_loop+0x141/0x4c0
[ 584.110717] rcu_gp_kthread+0xd0/0x190
[ 584.110720] kthread+0xe9/0x110
[ 584.110725] ? kthread_complete_and_exit+0x20/0x20
[ 584.110728] ret_from_fork+0x22/0x30
[ 584.110735] </TASK>
[ 584.110736] rcu: Stack dump where RCU GP kthread last ran:
[ 584.110747] Sending NMI from CPU 0 to CPUs 1:
[ 584.110757] NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xb/0x10
We can reproduce this easily on our R630 cluster, but migration works fine on our R730 and R740 clusters.
Steps to reproduce
- Create and start a VM
- Migrate the VM to another host
- Migration fails with the messages above
Additional information
I downgraded pve-qemu-kvm from 8.1.2-4 to 8.0.2-3; the problem persists.
Edited by Peng Yong