QEMU crashes if a paused VM is live-migrated twice
I first reported this bug here: https://bugs.launchpad.net/nova/+bug/1947725 because it occurs in the context of OpenStack, but the actual problem seems to be with Qemu (see logs at the end).
Host environment
- Operating system: Linux, Ubuntu Bionic
- OS/kernel version: Linux ybk140931 5.4.0-87-generic #98~18.04.1-Ubuntu SMP Wed Sep 22 10:45:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Architecture: x86_64
- QEMU flavor: qemu-system-x86_64
- QEMU version: QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.18+syseleven0) (local patch to disable OFD locks)
- QEMU command line: /var/log/libvirt/qemu/$GUEST.log: instance-00000ba2.log
Emulated/Virtualized environment
- Operating system: Linux, Ubuntu 20.04.3 LTS (Focal Fossa)
- OS/kernel version: Linux large-kickstart 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Architecture: x86_64
Description of problem
If the VM is in the PAUSED state (in OpenStack parlance; I think libvirt also calls that paused, but the command is virsh suspend) and is live-migrated twice, the QEMU process terminates during the second migration.
This is perfectly repeatable.
If the VM is unpaused and re-paused after the first migration, then the problem does not occur on the next migration.
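The unpause/re-pause workaround could be scripted roughly as follows. This is only a sketch: the function name `repause_server`, the retry count and the delay are assumptions, and it assumes a configured `openstack` CLI. Note that the guest runs for a short time between the two calls.

```shell
# Sketch of the workaround: unpause the server, wait until it is ACTIVE,
# then pause it again. repause_server is a hypothetical helper name.
repause_server() {
    rp_uuid=$1
    openstack server unpause "$rp_uuid" || return 1
    # Unpause is asynchronous, so poll the status (30 tries, 2 s apart;
    # both values are arbitrary choices).
    rp_tries=30
    until [ "$(openstack server show "$rp_uuid" -f value -c status)" = "ACTIVE" ]; do
        rp_tries=$((rp_tries - 1))
        [ "$rp_tries" -gt 0 ] || return 3
        sleep 2
    done
    openstack server pause "$rp_uuid"
}
```

It would be called as `repause_server <UUID>` after the first migration has finished.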
Steps to reproduce
See also the referenced bug report to openstack, above.
- $ openstack stack create ....
- $ openstack server pause <UUID> (wait until done)
- $ openstack server migrate --live-migration <UUID> (wait until done)
- $ openstack server migrate --live-migration <UUID>
The VM is now in ERROR state because it has disappeared: libvirt.libvirtError: Domain not found: no domain with matching uuid '<UUID>'
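For repeated testing, the pause and migrate steps above could be automated along these lines. This is a minimal sketch, not the exact procedure used for this report: the helper names `wait_for_status` and `reproduce_double_migration`, the retry count and the delay are assumptions, and the server is assumed to exist already (the stack creation step is left out).

```shell
# Sketch only: helper names, retry counts and delays are assumptions.

# Poll `openstack server show` until the server reports the wanted status.
# Returns 0 on match, 1 if the CLI call fails, 2 if the server went to
# ERROR, 3 on timeout.
wait_for_status() {
    wfs_uuid=$1; wfs_want=$2; wfs_tries=${3:-60}; wfs_delay=${4:-5}
    while [ "$wfs_tries" -gt 0 ]; do
        wfs_status=$(openstack server show "$wfs_uuid" -f value -c status) || return 1
        [ "$wfs_status" = "$wfs_want" ] && return 0
        [ "$wfs_status" = "ERROR" ] && return 2
        wfs_tries=$((wfs_tries - 1))
        sleep "$wfs_delay"
    done
    return 3
}

# Pause the server, then live-migrate it twice, waiting for PAUSED each time.
# Caveat: the status may still read PAUSED for a moment right after the
# migrate call returns, so a real script may need an extra check (for
# example, that the hosting compute node has changed).
reproduce_double_migration() {
    rdm_uuid=$1
    openstack server pause "$rdm_uuid" || return 1
    wait_for_status "$rdm_uuid" PAUSED || return $?
    for rdm_round in 1 2; do
        openstack server migrate --live-migration "$rdm_uuid" || return 1
        wait_for_status "$rdm_uuid" PAUSED || return $?
    done
}
```

Running `reproduce_double_migration <UUID>` should then return 2 when the bug triggers, since the server ends up in ERROR after the second migration.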
Additional information
The last few lines from the instance-00000ba2.log seem pertinent (this is from the receiving Qemu instance):
2021-10-22 15:32:53.829+0000: initiating migration
qemu-system-x86_64: /build/qemu-lb4V37/qemu-4.2/block.c:5523: bdrv_inactivate_recurse: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
2021-10-22 15:32:59.122+0000: shutting down, reason=crashed
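To check whether a given host hit the same assertion, the per-instance QEMU log can be searched for the message above. A small sketch follows; the function name `has_inactivate_assert` is hypothetical, and the log path in the usage note follows the libvirt location given in the host environment section.

```shell
# Return 0 if the given QEMU log file contains the bdrv_inactivate_recurse
# assertion failure, non-zero otherwise. The name is a hypothetical helper.
has_inactivate_assert() {
    # -F matches the string literally, -q suppresses output.
    grep -q -F 'bdrv_inactivate_recurse: Assertion' "$1"
}
```

For example: `has_inactivate_assert /var/log/libvirt/qemu/instance-00000ba2.log && echo "assertion hit"`.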
This is logged by libvirt (also on the receiving side):
Oct 22 15:29:04 ybk140931 ovs-vsctl[20174]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port tap3a71aa63-6a
Oct 22 15:31:31 ybk140931 ovs-vsctl[21412]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port tap3a71aa63-6a -- add-port br-int tap3a71aa63-6a -- set Interface tap3a71aa63-6a "external-ids:attached-mac=\"fa:16:3e:da:03:56\"" -- set Interface tap3a71aa63-6a "external-ids:iface-id=\"3a71aa63-6a39-41d8-9602-04b84834db9e\"" -- set Interface tap3a71aa63-6a "external-ids:vm-id=\"de2b27d2-345c-45fc-8f37-2fa0ed1a1151\"" -- set Interface tap3a71aa63-6a external-ids:iface-status=active
Oct 22 15:32:58 ybk140931 libvirtd[3237]: Unable to read from monitor: Connection reset by peer
Oct 22 15:32:59 ybk140931 ovs-vsctl[22001]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port tap3a71aa63-6a
Oct 22 15:32:59 ybk140931 libvirtd[3237]: operation failed: domain is not running
Oct 22 15:32:59 ybk140931 libvirtd[3237]: internal error: qemu unexpectedly closed the monitor: 2021-10-22T15:32:58.845667Z qemu-system-x86_64: Failed to load virtio_pci/modern_queue_state:used
2021-10-22T15:32:58.845687Z qemu-system-x86_64: Failed to load virtio_pci/modern_state:vqs
2021-10-22T15:32:58.845690Z qemu-system-x86_64: Failed to load virtio/extra_state:extra_state
2021-10-22T15:32:58.845692Z qemu-system-x86_64: Failed to load virtio-rng:virtio
2021-10-22T15:32:58.845695Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:06.0/virtio-rng'
2021-10-22T15:32:58.847860Z qemu-system-x86_64: load of migration failed: Input/output error
Oct 22 15:32:59 ybk140931 libvirtd[3237]: operation failed: domain 'instance-00000ba2' is not running