qemu-ga fsfreeze crashes the kernel
Host environment
- Operating system: Proxmox 6.4
- OS/kernel version: Linux pve1 5.4.114-1-pve
- Architecture: x86_64
- QEMU flavor: qemu-system-x86_64
- QEMU version: QEMU emulator version 5.2.0 (pve-qemu-kvm_5.2.0)
- QEMU command line: qemu-ga fs-freeze
Emulated/Virtualized environment
- Operating system: CentOS 7
- OS/kernel version: 3.10.0-1160.36.2.el7.x86_64
- Architecture: x86_64
Description of problem
Hello,
This issue still needs attention. It duplicates: https://bugs.launchpad.net/bugs/1807073 https://bugs.launchpad.net/bugs/1813045
We mainly use CloudLinux, Debian, and CentOS. We have experienced many crashes on our CloudLinux-based QEMU instances during snapshots. The issue is not caused by CloudLinux itself but by the QEMU guest agent, which does not freeze the file system(s) correctly. What actually happens:
When a VM backup is invoked, the QEMU guest agent freezes the file systems so that no changes are made during the backup. However, the agent does not take loop* devices into account when ordering the freezes (we have checked its sources), which leads to the following sequence:
- freeze loopback fs ---> send async reqs to loopback thread
- freeze main fs
- loopback thread wakes up and tries to write data to the main fs, which is still frozen; this eventually leads to a hung task and a kernel crash.
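To check whether a guest is exposed to the sequence above, it may help to list which mounted file systems sit on loop devices and which files back them. The following is a minimal diagnostic sketch, not part of qemu-ga: the function names are hypothetical, and it assumes the standard Linux /proc/self/mounts layout and the sysfs attribute /sys/block/loopN/loop/backing_file.

```python
import re

# Matches plain loop block devices such as /dev/loop0 (assumed naming).
LOOP_DEV = re.compile(r"^/dev/loop\d+$")


def loop_backed_mounts(mounts_text):
    """Return (device, mountpoint) pairs for mounts whose source is a loop device.

    mounts_text is expected to be in /proc/self/mounts format:
    "<device> <mountpoint> <fstype> <options> <dump> <pass>" per line.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        device, mountpoint = fields[0], fields[1]
        if LOOP_DEV.match(device):
            result.append((device, mountpoint))
    return result


def backing_file_path(device):
    """Return the sysfs attribute path that names the file behind a loop device."""
    name = device.rsplit("/", 1)[1]  # "/dev/loop0" -> "loop0"
    return "/sys/block/" + name + "/loop/backing_file"


def report(mounts_path="/proc/self/mounts"):
    """Print each loop-backed mount together with its backing file (Linux only)."""
    with open(mounts_path) as f:
        mounts = loop_backed_mounts(f.read())
    for device, mountpoint in mounts:
        with open(backing_file_path(device)) as f:
            backing = f.read().strip()
        print(device, mountpoint, backing)
```

Calling report() inside the guest prints one line per loop-backed mount. Any backing file that lives on another mounted file system is a candidate for the freeze-ordering problem described above.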
Moreover, a lot of Proxmox users are complaining about the issue as well: https://forum.proxmox.com/threads/error-vm-100-qmp-command-guest-fsfreeze-thaw-failed-got-timeout.68082/ https://forum.proxmox.com/threads/problem-with-fsfreeze-freeze-and-qemu-guest-agent.65707/
Steps to reproduce
- Manually start a backup for a VM with qemu-agent enabled.
- The backup process gets stuck at "INFO: issuing guest-agent 'fs-freeze' command"
- The VM becomes unavailable; you can only unlock it and force a reset.
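To reproduce outside a full backup run, the same guest agent commands can be issued directly. The sketch below only builds the JSON messages for the guest agent protocol commands guest-ping, guest-fsfreeze-freeze, guest-fsfreeze-status, and guest-fsfreeze-thaw. The helper name and the sequence list are hypothetical, and how the messages reach the agent channel depends on the host setup and is not shown. On an affected guest, the freeze command is expected to hang the VM as described above.

```python
import json


def qga_command(name, arguments=None):
    """Build one guest agent request as a JSON string.

    name is the agent command, e.g. "guest-fsfreeze-freeze";
    arguments is an optional dict of command arguments.
    """
    message = {"execute": name}
    if arguments is not None:
        message["arguments"] = arguments
    return json.dumps(message)


# Order that mirrors the log in this report: ping first, then freeze.
# Status and thaw are only reached if the guest survives the freeze.
REPRO_SEQUENCE = [
    "guest-ping",
    "guest-fsfreeze-freeze",
    "guest-fsfreeze-status",
    "guest-fsfreeze-thaw",
]


def repro_messages():
    """Return the JSON requests for the reproduction sequence, in order."""
    return [qga_command(name) for name in REPRO_SEQUENCE]
```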
Additional information
/var/log/messages logs:
Aug 6 21:54:00 cpanel qemu-ga: info: guest-ping called
Aug 6 21:54:01 cpanel qemu-ga: info: guest-fsfreeze called
Aug 6 21:54:01 cpanel qemu-ga: info: executing fsfreeze hook with arg 'freeze'
After this, the VM becomes completely unavailable.