Skip to content

Qemu crashes if it is paused and migrated twice

I first reported this bug here: https://bugs.launchpad.net/nova/+bug/1947725 because it occurs in the context of OpenStack, but the actual problem seems to be with Qemu (see logs at the end).

Host environment

  • Operating system: Linux, Ubuntu Bionic
  • OS/kernel version: Linux ybk140931 5.4.0-87-generic #98~18.04.1-Ubuntu SMP Wed Sep 22 10:45:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Architecture: x86_64
  • QEMU flavor: qemu-system-x86_64
  • QEMU version: QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.18+syseleven0) (local patch to disable OFD locks)
  • QEMU command line: /var/log/libvirt/qemu/$GUEST.log: instance-00000ba2.log

Emulated/Virtualized environment

  • Operating system: Linux, Ubuntu 20.04.3 LTS (Focal Fossa)
  • OS/kernel version: Linux large-kickstart 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Architecture: x86_64

Description of problem

If the vm is in PAUSED state (in Openstack parlance) (I think libvirt calls that paused as well but uses the command virsh suspend), and live-migrated twice, the second time the Qemu process terminates.

This is perfectly repeatable.

If the VM is unpaused and re-paused after the first migration, then the problem does not occur on the next migration.

Steps to reproduce

See also the referenced bug report to openstack, above.

  1. $ openstack stack create ....
  2. $ openstack server pause <UUID> (wait until done)
  3. $ openstack server migrate --live-migration <UUID> (wait until done)
  4. $ openstack server migrate --live-migration <UUID>

The VM is now in ERROR state because it has disappeared: libvirt.libvirtError: Domain not found: no domain with matching uuid '<UUID>'

Additional information

The last few lines from the instance-00000ba2.log seem pertinent (this is from the receiving Qemu instance):

2021-10-22 15:32:53.829+0000: initiating migration
qemu-system-x86_64: /build/qemu-lb4V37/qemu-4.2/block.c:5523: bdrv_inactivate_recurse: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
2021-10-22 15:32:59.122+0000: shutting down, reason=crashed

This is logged by libvirt (also on the receiving side):

Oct 22 15:29:04 ybk140931 ovs-vsctl[20174]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port tap3a71aa63-6a
Oct 22 15:31:31 ybk140931 ovs-vsctl[21412]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port tap3a71aa63-6a -- add-port br-int tap3a71aa63-6a -- set Interface tap3a71aa63-6a "external-ids:attached-mac=\"fa:16:3e:da:03:56\"" -- set Interface tap3a71aa63-6a "external-ids:iface-id=\"3a71aa63-6a39-41d8-9602-04b84834db9e\"" -- set Interface tap3a71aa63-6a "external-ids:vm-id=\"de2b27d2-345c-45fc-8f37-2fa0ed1a1151\"" -- set Interface tap3a71aa63-6a external-ids:iface-status=active
Oct 22 15:32:58 ybk140931 libvirtd[3237]: Unable to read from monitor: Connection reset by peer
Oct 22 15:32:59 ybk140931 ovs-vsctl[22001]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port tap3a71aa63-6a
Oct 22 15:32:59 ybk140931 libvirtd[3237]: operation failed: domain is not running
Oct 22 15:32:59 ybk140931 libvirtd[3237]: internal error: qemu unexpectedly closed the monitor: 2021-10-22T15:32:58.845667Z qemu-system-x86_64: Failed to load virtio_pci/modern_queue_state:used
                                          2021-10-22T15:32:58.845687Z qemu-system-x86_64: Failed to load virtio_pci/modern_state:vqs
                                          2021-10-22T15:32:58.845690Z qemu-system-x86_64: Failed to load virtio/extra_state:extra_state
                                          2021-10-22T15:32:58.845692Z qemu-system-x86_64: Failed to load virtio-rng:virtio
                                          2021-10-22T15:32:58.845695Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:06.0/virtio-rng'
                                          2021-10-22T15:32:58.847860Z qemu-system-x86_64: load of migration failed: Input/output error
Oct 22 15:32:59 ybk140931 libvirtd[3237]: operation failed: domain 'instance-00000ba2' is not running
Edited by O Seibert Syseleven
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information