qemu-system-x86_64: Live Migration fails with BLOCK_JOB_ERROR

Host environment

  • Operating system: Debian GNU/Linux 11 (bullseye)
  • OS/kernel version: Linux 5.10.196-1 #1 SMP Thu Sep 21 16:06:56 UTC 2023 x86_64 GNU/Linux
  • Architecture: x86_64
  • QEMU flavor: qemu-system-x86_64
  • QEMU version: QEMU emulator version 7.2.0
  • QEMU command line on source host

/opt/qemu-7.2.0-rbd/bin/qemu-system-x86_64 -pidfile /test/run/qemu.pid -chroot /test/jail -nographic -cpu host,migratable=yes,-vmx,-svm,invpcid=off -display vnc=unix:/test/run/vnc.socket -chardev socket,id=charmonitor,path=/test/run/qemu.monitor,server,nowait,logfile=/test/run/qemu_mon.log,logappend=off -mon chardev=charmonitor,id=monitor,mode=control -chardev socket,id=charhmpmonitor,path=/test/run/qemu_hmp.monitor,server,nowait,logfile=/test/run/qemu_hmp.log,logappend=off -mon chardev=charhmpmonitor,id=hmpmonitor -enable-kvm -name live-migration,debug-threads=on -runas test -m 16384M -smp 8 -rtc clock=vm -no-user-config -nodefaults -msg timestamp=on -machine q35,accel=kvm -vga std -kernel /test/kernel/grub-2.06-20.img -device virtio-scsi-pci,id=scsi0 -drive file=/dev/vg1/vg-test,if=none,media=disk,id=drive-scsi-disk-0,aio=native,cache.direct=on,format=raw,read-only=off,auto-read-only=off -device scsi-hd,bus=scsi0.0,scsi-id=0,channel=0,lun=0,drive=drive-scsi-disk-0,id=drive0,bootindex=1,rotation_rate=1 -append console=tty1 console=ttyS0 ro devtmpfs.mount=1 -no-reboot -netdev tap,id=net0,ifname=tap.62269405_0,script=no,downscript=no,vhost=on,queues=8 -device virtio-net-pci,netdev=net0,mac=f2:3c:94:ec:3f:9d,vectors=18,mq=on -parallel none -serial chardev:serial0 -chardev stdio,mux=off,id=serial0,signal=off -D /test/run/qemu-debug.log -d unimp

Emulated/Virtualized environment

  • Operating system: Debian 11
  • OS/kernel version: Linux localhost 5.10.0-30-amd64 #1 SMP Debian 5.10.218-1 (2024-06-01) x86_64 GNU/Linux
  • Architecture: x86_64

Description of problem

After the disk migration completes and while RAM migration is in progress, the migration status switches to 'pre-switchover'. In the 'pre-switchover' state, querying the block job status shows that 'offset' and 'len' have diverged again, yet the block job status is still reported as 'ready' instead of 'running'. This eventually results in a BLOCK_JOB_ERROR event.

Steps to reproduce

On source host

  1. Add the disk(s) that need to be migrated by issuing the 'blockdev-add' QMP command.
  2. Start blockdev-mirror operations to transfer the disk(s) by issuing QMP commands.
  3. Start the RAM migration by sending the HMP commands listed below.
  4. Wait for the migration status to change to 'pre-switchover'. While in 'pre-switchover', check for disk activity.

While the RAM migration is in progress and the migration status has changed to 'pre-switchover', observe that the block jobs' 'offset' and 'len' diverge, but the block job status remains 'ready' instead of 'running'.
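A workaround-style check can spot this condition from the 'query-block-jobs' reply itself, without trusting the reported status; a minimal sketch (the helper name is mine, sample values taken from the debug log below):

```python
def job_has_diverged(job):
    """Return True when a mirror job still reports 'ready' although
    'offset' has fallen behind 'len' again (new guest writes)."""
    return bool(job.get("ready")) and job["offset"] < job["len"]

# sample entries modeled on the source-host debug log
converged = {"device": "drive-scsi-disk-0", "ready": True,
             "offset": 5371723776, "len": 5371723776}
diverged  = {"device": "drive-scsi-disk-0", "ready": True,
             "offset": 5437259776, "len": 5864685568}

print(job_has_diverged(converged))  # False: offset == len
print(job_has_diverged(diverged))   # True: guest wrote after READY
```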

On destination host

  1. Launch the VM in listening mode (-incoming) for migration.
  2. Start the NBD server.
  3. Add the disks to the NBD server.
  4. Set the migration parameters by sending HMP commands.

Additional information

On the SOURCE host, start all blockdev-add operations

Issue QMP commands (blockdev-add) for all block devices ("drive-scsi-disk-0" and "drive-scsi-disk-1") of the VM:

        {
            "execute": "blockdev-add",
            "arguments": {
                "driver": "raw",
                "node-name": "node_drive-scsi-disk-0",
                "auto-read-only": false,
                "read-only": false,
                "file": {
                    "driver": "nbd",
                    "export": "drive-scsi-disk-0",
                    "server": {
                        "type": "inet",
                        "host": "2600:3c0f:17:14::21",
                        "port": "37552"
                    },
                    "tls-creds": "tlscreds0"
                }
            }
        }

        {
            "execute": "blockdev-add",
            "arguments": {
                "driver": "raw",
                "node-name": "node_drive-scsi-disk-1",
                "auto-read-only": false,
                "read-only": false,
                "file": {
                    "driver": "nbd",
                    "export": "drive-scsi-disk-1",
                    "server": {
                        "type": "inet",
                        "host": "2600:3c0f:17:14::21",
                        "port": "37552"
                    },
                    "tls-creds": "tlscreds0"
                }
            }
        }
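Whatever client notation is used, these commands travel over the QMP monitor socket (here /test/run/qemu.monitor) as newline-delimited JSON. A minimal framing sketch; the qmp_frame helper is mine, and a real client must first read the greeting and send qmp_capabilities:

```python
import json

def qmp_frame(execute, arguments=None):
    """Serialize one QMP command as the newline-delimited JSON
    the monitor socket expects."""
    cmd = {"execute": execute}
    if arguments is not None:
        cmd["arguments"] = arguments
    return json.dumps(cmd) + "\n"

# the blockdev-add body from the report, framed for the wire
frame = qmp_frame("blockdev-add", {
    "driver": "raw",
    "node-name": "node_drive-scsi-disk-0",
    "auto-read-only": False,
    "read-only": False,
    "file": {
        "driver": "nbd",
        "export": "drive-scsi-disk-0",
        "server": {"type": "inet",
                   "host": "2600:3c0f:17:14::21",
                   "port": "37552"},
        "tls-creds": "tlscreds0",
    },
})
```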

On the SOURCE host, start all blockdev-mirror operations to begin the disk transfer

i.e. issue QMP commands (blockdev-mirror) for each of those block devices ("drive-scsi-disk-0" and "drive-scsi-disk-1"):

        {
            "execute": "blockdev-mirror",
            "arguments": {
                "device": "drive-scsi-disk-0",
                "target": "node_drive-scsi-disk-0",
                "speed": 100000000,
                "sync": "full"
            }
        }

        {
            "execute": "blockdev-mirror",
            "arguments": {
                "device": "drive-scsi-disk-1",
                "target": "node_drive-scsi-disk-1",
                "speed": 100000000,
                "sync": "full"
            }
        }

Configure the NBD server on the destination host by issuing QMP commands.

Start NBD server

        {
            "execute": "nbd-server-start",
            "arguments": {
                "addr": {
                    "type": "inet",
                    "data": {
                        "host": "2600:3c0f:17:14::21",
                        "port": "37552"
                    }
                },
                "tls-creds": "tlscreds0"
            }
        }

On DESTINATION Host

Register the incoming disks (2) with the NBD server by issuing QMP commands to the VM on the destination host.

Disk# 1

        {
            "execute": "nbd-server-add",
            "arguments": {
                "device": "drive-scsi-disk-0",
                "writable": true
            }
        }

Disk# 2

        {
            "execute": "nbd-server-add",
            "arguments": {
                "device": "drive-scsi-disk-1",
                "writable": true
            }
        }
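Note that with nbd-server-add the export name defaults to the device name (an optional 'name' argument can override it), so the device registered here must match the 'export' field in the source's blockdev-add. A small sketch of building both commands (the helper is mine):

```python
def nbd_server_add(device):
    """Build the QMP command that exports 'device' from the built-in
    NBD server; the export name defaults to the device name."""
    return {"execute": "nbd-server-add",
            "arguments": {"device": device, "writable": True}}

commands = [nbd_server_add(d)
            for d in ("drive-scsi-disk-0", "drive-scsi-disk-1")]
```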

Wait for the disks to finish the bulk of the data migration: keep checking the block job status in a loop by sending "query-block-jobs" until the disk(s) are migrated to the destination host completely, then perform the RAM migration.

        while (1) {
            # hypothetical helper: send a QMP command and return the decoded
            # "return" member of the reply (a list of block jobs)
            my $block_jobs = qmp_command({ "execute" => "query-block-jobs" });

            my $copied    = 0;
            my $total     = 0;
            my $all_ready = 1;
            for my $block_job (@$block_jobs) {
                $copied += $block_job->{"offset"};
                $total  += $block_job->{"len"};
                # check whether the block job has reached "ready" status
                $all_ready = 0 unless $block_job->{"ready"};
            }
            # disks are migrated completely once every job reports "ready"
            last if $all_ready;
        }

Once the disks are migrated, start the RAM migration by sending HMP commands:

    "migrate_set_parameter max-bandwidth 40G"
    "migrate_set_capability xbzrle on"
    "migrate_set_capability zero-blocks on"
    "migrate_set_capability compress on"
    "migrate_set_capability auto-converge on"
    "migrate_set_parameter cpu-throttle-initial 10"
    "migrate_set_parameter cpu-throttle-increment 1"
    "migrate_set_capability pause-before-switchover on"

On the DESTINATION host, before the live migration is started:

    "migrate_set_parameter max-bandwidth 40g"
    "migrate_set_capability xbzrle on"
    "migrate_set_capability zero-blocks on"
    "migrate_set_capability compress on"
    "migrate_set_capability auto-converge on"
    "migrate_set_parameter cpu-throttle-initial 10"
    "migrate_set_parameter cpu-throttle-increment 1"
    "migrate_set_capability pause-before-switchover on"

QEMU debug logs on source host

Disk migration starts on the source host; 'query-block-jobs' for disk 'drive-scsi-disk-0' shows that offset and len are not yet equal and the job status is 2 (JOB_STATUS_RUNNING):


(mirror_start_job, 1870): started(drive-scsi-disk-0) job_start
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 16777216  len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 16777216  len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 33554432  len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 50331648  len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 50331648  len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 67108864  len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 83886080  len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 100663296  len 5368709120 speed: 100000000 ready: 0 job status: 2
...
...
...

Offset and len have converged and the block job status changed to READY (4: JOB_STATUS_READY):


(mirror_run,1052):(drive-scsi-disk-0): Transition Mirror Job to READY 5371723776/5371723776
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5371723776  len 5371723776 speed: 100000000 ready: 1 job status: 4
(migrate_set_state,1869): migration from 0 to 1, generate event
(migrate_set_state,1869): migration from 1 to 4, generate event

... ...

Then offset and len diverge again, but the block job status is still reported as READY (4: JOB_STATUS_READY) instead of RUNNING (2: JOB_STATUS_RUNNING):

block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5372313600  len 5372313600 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5372313600  len 5372313600 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5372313600  len 5372313600 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5372313600  len 5372313600 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5437259776  len 5864685568 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5569380352  len 6572670976 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5687869440  len 7077036032 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5829492736  len 7678459904 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 6061228032  len 8468365312 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 6214713344  len 8832024576 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 6364135424  len 9090891776 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 6513033216  len 9115467776 speed: 100000000 ready: 1 job status: 4

Eventually, the migration fails with a BLOCK_JOB_ERROR event.

migrate_pause_before_switchover,2624): pause before switchover 1
migration_maybe_pause, 3300): setting migration state current 4 to pre-switchover
migrate_set_state,1869): migration from 4 to 11, generate event
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 46468366336  len 48996089856 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 47022014464  len 48996089856 speed: 100000000 ready: 1 job status: 4
block_job_error_action:570): send block_job_error for the job
qapi_event_send_block_job_error, 167): event_emit BLOCK_JOB_ERROR
block_job_error_action:570): send block_job_error for the job
qapi_event_send_block_job_error, 167): event_emit BLOCK_JOB_ERROR
block_job_error_action:570): send block_job_error for the job
qapi_event_send_block_job_error, 167): event_emit BLOCK_JOB_ERROR
block_job_error_action:570): send block_job_error for the job