Migration fails with rcu_preempt stall messages on Proxmox 8

Host environment

  • Operating system: Proxmox 8.1 (Debian 12.2)
  • OS/kernel version: Linux pve0.wx.xxxx.com 6.5.11-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-4 (2023-11-20T10:19Z) x86_64 GNU/Linux
  • Architecture: x86_64
  • QEMU flavor: pve-qemu-kvm
  • QEMU version: 8.1.2-4
  • Hardware: Dell R630, Xeon CPU E5-2686, H330 RAID
  • QEMU command line:
/usr/bin/kvm -id 105 -name tt1,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/105.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/105.pid -daemonize -smbios type=1,uuid=bee9ce8f-925a-4465-a937-a76f085da6cc -smp 2,sockets=1,cores=2,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/105.vnc,password=on -cpu qemu64,+aes,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+pni,+popcnt,+sse4.1,+sse4.2,+ssse3 -m 4096 -object iothread,id=iothread-virtioscsi0 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device vmgenid,guid=bcf1cae2-9532-4469-b8a2-607dd58eb067 -device usb-tablet,id=tablet,bus=ehci.0,port=1 -device VGA,id=vga,bus=pcie.0,addr=0x1 -chardev socket,path=/var/run/qemu-server/105.qga,server=on,wait=off,id=qga0 -device virtio-serial,id=qga0,bus=pci.0,addr=0x8 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:bc8ee4b166bb -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=rbd:nvme/vm-105-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/nvme.keyring,if=none,id=drive-scsi0,cache=writeback,format=raw,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap105i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=BC:24:11:BE:A7:94,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=101 -machine type=pc-q35-8.0+pve0 -incoming unix:/run/qemu-server/105.migrate -S

Emulated/Virtualized environment

  • Operating system: Debian 12
  • OS/kernel version: Linux debian 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
  • Architecture: x86_64

Description of problem

When I migrate the VM from one host to another, the migration fails with the following messages:

[  584.109502] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  584.109534] rcu: 	1-...!: (0 ticks this GP) idle=1408/0/0x0 softirq=8428/8428 fqs=0 (false positive?)
[  584.109556] 	(detected by 0, t=5252 jiffies, g=2953, q=74 ncpus=2)
[  584.109561] Sending NMI from CPU 0 to CPUs 1:
[  584.109587] NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xb/0x10
[  584.110564] rcu: rcu_preempt kthread timer wakeup didn't happen for 5251 jiffies! g2953 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[  584.110585] rcu: 	Possible timer handling issue on cpu=1 timer-softirq=8006
[  584.110597] rcu: rcu_preempt kthread starved for 5252 jiffies! g2953 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
[  584.110614] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[  584.110645] rcu: RCU grace-period kthread stack dump:
[  584.110658] task:rcu_preempt     state:I stack:0     pid:15    ppid:2      flags:0x00004000
[  584.110667] Call Trace:
[  584.110672]  <TASK>
[  584.110688]  __schedule+0x351/0xa20
[  584.110699]  ? rcu_gp_cleanup+0x480/0x480
[  584.110704]  schedule+0x5d/0xe0
[  584.110705]  schedule_timeout+0x94/0x150
[  584.110709]  ? __bpf_trace_tick_stop+0x10/0x10
[  584.110714]  rcu_gp_fqs_loop+0x141/0x4c0
[  584.110717]  rcu_gp_kthread+0xd0/0x190
[  584.110720]  kthread+0xe9/0x110
[  584.110725]  ? kthread_complete_and_exit+0x20/0x20
[  584.110728]  ret_from_fork+0x22/0x30
[  584.110735]  </TASK>
[  584.110736] rcu: Stack dump where RCU GP kthread last ran:
[  584.110747] Sending NMI from CPU 0 to CPUs 1:
[  584.110757] NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xb/0x10

We can reproduce this easily on our R630 cluster, but migration works fine on our R730 and R740 clusters.

Steps to reproduce

  1. Create and run a VM.
  2. Migrate the VM to another host.
  3. The migration fails with the rcu_preempt stall messages shown in the description.
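
For anyone trying to reproduce this from the command line, the steps could be sketched roughly as follows with the Proxmox `qm` tool. This is only an assumption about how the migration might be triggered: the report does not say whether the CLI or the web UI was used. VM ID 105 is taken from the QEMU command line above; the target node name `pve1` and the `run`/`reproduce` helpers are hypothetical. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the reproduction steps (assumes the Proxmox `qm` CLI is used).
VMID="${VMID:-105}"        # VM ID from the QEMU command line in this report
TARGET="${TARGET:-pve1}"   # hypothetical target node name, adjust to your cluster

# Print the command in dry-run mode (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce() {
    # Step 1: start the VM on the source node.
    run qm start "$VMID"
    # Step 2: live-migrate the running VM to the other node.
    run qm migrate "$VMID" "$TARGET" --online
}

reproduce
```

After the migration attempt, running `dmesg | grep rcu` wherever the messages appeared should show whether the rcu_preempt stall was logged.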

Additional information

I downgraded pve-qemu-kvm from 8.1.2-4 to 8.0.2-3, but the problem persists.

Edited by Peng Yong