qemu-system-x86_64: Live Migration fails with BLOCK_JOB_ERROR
Host environment
- Operating system: Debian GNU/Linux 11 (bullseye)
- OS/kernel version: Linux 5.10.196-1 #1 SMP Thu Sep 21 16:06:56 UTC 2023 x86_64 GNU/Linux
- Architecture: x86_64
- QEMU flavor: qemu-system-x86_64
- QEMU version: QEMU emulator version 7.2.0
- QEMU command line on source host
/opt/qemu-7.2.0-rbd/bin/qemu-system-x86_64 -pidfile /test/run/qemu.pid -chroot /test/jail -nographic -cpu host,migratable=yes,-vmx,-svm,invpcid=off -display vnc=unix:/test/run/vnc.socket -chardev socket,id=charmonitor,path=/test/run/qemu.monitor,server,nowait,logfile=/test/run/qemu_mon.log,logappend=off -mon chardev=charmonitor,id=monitor,mode=control -chardev socket,id=charhmpmonitor,path=/test/run/qemu_hmp.monitor,server,nowait,logfile=/test/run/qemu_hmp.log,logappend=off -mon chardev=charhmpmonitor,id=hmpmonitor -enable-kvm -name live-migration,debug-threads=on -runas test -m 16384M -smp 8 -rtc clock=vm -no-user-config -nodefaults -msg timestamp=on -machine q35,accel=kvm -vga std -kernel /test/kernel/grub-2.06-20.img -device virtio-scsi-pci,id=scsi0 -drive file=/dev/vg1/vg-test,if=none,media=disk,id=drive-scsi-disk-0,aio=native,cache.direct=on,format=raw,read-only=off,auto-read-only=off -device scsi-hd,bus=scsi0.0,scsi-id=0,channel=0,lun=0,drive=drive-scsi-disk-0,id=drive0,bootindex=1,rotation_rate=1 -append console=tty1 console=ttyS0 ro devtmpfs.mount=1 -no-reboot -netdev tap,id=net0,ifname=tap.62269405_0,script=no,downscript=no,vhost=on,queues=8 -device virtio-net-pci,netdev=net0,mac=f2:3c:94:ec:3f:9d,vectors=18,mq=on -parallel none -serial chardev:serial0 -chardev stdio,mux=off,id=serial0,signal=off -D /test/run/qemu-debug.log -d unimp
Emulated/Virtualized environment
- Operating system: Debian 11
- OS/kernel version: Linux localhost 5.10.0-30-amd64 #1 SMP Debian 5.10.218-1 (2024-06-01) x86_64 GNU/Linux
- Architecture: x86_64
Description of problem
After the disk migration completes and while RAM migration is in progress, the migration status switches to 'pre-switchover'. In the 'pre-switchover' state, querying the block jobs still reports status 'ready' instead of 'running', even though 'offset' and 'len' have diverged again. As a result, the migration fails with a BLOCK_JOB_ERROR event.
Steps to reproduce
On source host
- Add the disk(s) to be migrated by issuing the 'blockdev-add' QMP command.
- Start the blockdev-mirror operations to transfer the disk(s) by issuing QMP commands.
- Start RAM migration (send the HMP commands listed below).
- While RAM migration is in progress, the migration status changes to 'pre-switchover'. In this state, observe that the block jobs' 'offset' and 'len' have diverged, but the block job status is still reported as 'ready' instead of 'running'.
On destination host
- Launch the VM in listening mode (-incoming) to accept the migration.
- Start the NBD server.
- Add the disks to the NBD server.
- Set the migration parameters by sending the HMP commands listed below.
Additional information
On SOURCE host, add the migration target block devices.
Issue a 'blockdev-add' QMP command for each block device ("drive-scsi-disk-0" and "drive-scsi-disk-1") of the VM:
{
"execute" => "blockdev-add",
"arguments" => {
"driver" => "raw",
"node-name" => "node_drive-scsi-disk-0",
"auto-read-only" => false,
"read-only" => false,
"file" => {
"driver" => "nbd",
"export" => "drive-scsi-disk-0",
"server" => {
"type" => "inet",
"host" => "2600:3c0f:17:14::21",
"port" => "37552",
},
"tls-creds" => "tlscreds0"
}
}
}
{
"execute" => "blockdev-add",
"arguments" => {
"driver" => "raw",
"node-name" => "node_drive-scsi-disk-1",
"auto-read-only" => false,
"read-only" => false,
"file" => {
"driver" => "nbd",
"export" => "drive-scsi-disk-1",
"server" => {
"type" => "inet",
"host" => "2600:3c0f:17:14::21",
"port" => "37552",
},
"tls-creds" => "tlscreds0"
}
}
}
On SOURCE host, start the blockdev-mirror operations to begin the disk transfer,
i.e. issue a 'blockdev-mirror' QMP command for each of those block devices ("drive-scsi-disk-0" and "drive-scsi-disk-1"):
{
"execute" => "blockdev-mirror",
"arguments" => {
"device" => "drive-scsi-disk0",
"target" => "node_drive-scsi-disk-0",
"speed" => 100000000,
"sync" => "full",
}
}
{
"execute" => "blockdev-mirror",
"arguments" => {
"device" => "drive-scsi-disk1",
"target" => "node_drive-scsi-disk-1",
"speed" => 100000000,
"sync" => "full",
}
}
Configure the NBD server on the destination host by issuing QMP commands.
Start the NBD server:
{
"execute" => "nbd-server-start",
"arguments" => {
"addr" => {
"type" => "inet",
"data" => {
"host" => "2600:3c0f:17:14::21",
"port" => "37552"
}
},
"tls-creds" => "tlscreds0"
}
}
On DESTINATION Host
Register the two incoming disks with the NBD server by issuing QMP commands to the VM on the destination host.
Disk# 1
{
"execute" => "nbd-server-add",
"arguments" => {
"device" => "drive-scsi-disk0",
"writable" => true
}
}
Disk# 2
{
"execute" => "nbd-server-add",
"arguments" => {
"device" => "drive-scsi-disk1",
"writable" => true
}
}
Wait for the disks to finish the bulk of the data migration:
keep checking block job status in a loop by sending "query-block-jobs" until the disk(s)
are completely migrated to the destination host; subsequently, perform RAM migration.
while (1) {
    # send "query-block-jobs" and iterate over the decoded reply
    $query_block_jobs_resp =
        {
            "execute" => "query-block-jobs"
        }
    $copied = 0;
    $total = 0;
    $allReady = 1;
    foreach my $block_job (@$query_block_jobs_resp) {
        $copied += $block_job->{"offset"};
        $total += $block_job->{"len"};
        # check whether the block job is in "ready" status
        if (!$block_job->{"ready"}) {
            $allReady = 0;
        }
    }
    # disks are migrated completely and every block job reports "ready"
    last if ($allReady);
}
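The readiness check in the loop above can be sketched in Python against a decoded "query-block-jobs" reply. The field names ('offset', 'len', 'ready') follow the QMP schema; additionally comparing 'offset' to 'len' is a defensive assumption prompted by the divergence reported here, not part of the original workflow.

```python
def all_jobs_ready(jobs):
    """Check a decoded query-block-jobs "return" list.

    Returns (all_ready, copied, total). A job only counts as ready when
    its "ready" flag is set AND offset == len; this extra comparison
    guards against a job that still reports "ready" after offset and
    len have diverged again (the symptom described in this report).
    """
    copied = sum(job["offset"] for job in jobs)
    total = sum(job["len"] for job in jobs)
    all_ready = all(job["ready"] and job["offset"] == job["len"]
                    for job in jobs)
    return all_ready, copied, total

# Example reply while the bulk copy is still running:
running = [{"device": "drive-scsi-disk0", "offset": 16777216,
            "len": 5368709120, "ready": False}]
# Example reply after the job transitioned to READY:
ready = [{"device": "drive-scsi-disk0", "offset": 5371723776,
          "len": 5371723776, "ready": True}]
```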
Once the disks are migrated, start RAM migration by sending these HMP commands on the source host:
"migrate_set_parameter max-bandwidth 40G"
"migrate_set_capability xbzrle on"
"migrate_set_capability zero-blocks on"
"migrate_set_capability compress on"
"migrate_set_capability auto-converge on"
"migrate_set_parameter cpu-throttle-initial 10"
"migrate_set_parameter cpu-throttle-increment 1"
"migrate_set_capability pause-before-switchover on"
On the destination host, before the live migration is started:
"migrate_set_parameter max-bandwidth 40G"
"migrate_set_capability xbzrle on"
"migrate_set_capability zero-blocks on"
"migrate_set_capability compress on"
"migrate_set_capability auto-converge on"
"migrate_set_parameter cpu-throttle-initial 10"
"migrate_set_parameter cpu-throttle-increment 1"
"migrate_set_capability pause-before-switchover on"
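With pause-before-switchover enabled, the source stops in the 'pre-switchover' state and waits for the management layer to finish the mirror jobs and resume the migration. A minimal QMP handover sequence would look roughly like this (command names are from the QMP schema; completing a ready mirror job via 'block-job-cancel' before 'migrate-continue' is an assumption about the intended workflow, mirroring what libvirt does):

```
{
"execute" => "block-job-cancel",
"arguments" => { "device" => "drive-scsi-disk0" }
}
{
"execute" => "block-job-cancel",
"arguments" => { "device" => "drive-scsi-disk1" }
}
{
"execute" => "migrate-continue",
"arguments" => { "state" => "pre-switchover" }
}
```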
QEMU debug logs on source host
Disk migration starts on the source host; 'query-block-jobs' for disk 'drive-scsi-disk-0' shows
offset and len not yet equal, with job status 2 (JOB_STATUS_RUNNING):
(mirror_start_job, 1870): started(drive-scsi-disk-0) job_start
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 16777216 len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 16777216 len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 33554432 len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 50331648 len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 50331648 len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 67108864 len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 83886080 len 5368709120 speed: 100000000 ready: 0 job status: 2
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 100663296 len 5368709120 speed: 100000000 ready: 0 job status: 2
...
...
...
offset and len have converged and the block job status changed to READY (4: JOB_STATUS_READY):
(mirror_run,1052):(drive-scsi-disk-0): Transition Mirror Job to READY 5371723776/5371723776
(block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5371723776 len 5371723776 speed: 100000000 ready: 1 job status: 4
(migrate_set_state,1869): migration from 0 to 1, generate event
(migrate_set_state,1869): migration from 1 to 4, generate event
... ...
Then offset and len diverge again, but the block job status is still reported as READY (4: JOB_STATUS_READY) instead of RUNNING (2: JOB_STATUS_RUNNING):
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5372313600 len 5372313600 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5372313600 len 5372313600 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5372313600 len 5372313600 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5372313600 len 5372313600 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5437259776 len 5864685568 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5569380352 len 6572670976 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5687869440 len 7077036032 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 5829492736 len 7678459904 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 6061228032 len 8468365312 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 6214713344 len 8832024576 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 6364135424 len 9090891776 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 6513033216 len 9115467776 speed: 100000000 ready: 1 job status: 4
Eventually, the migration fails with a BLOCK_JOB_ERROR event.
migrate_pause_before_switchover,2624): pause before switchover 1
migration_maybe_pause, 3300): setting migration state current 4 to pre-switchover
migrate_set_state,1869): migration from 4 to 11, generate event
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 46468366336 len 48996089856 speed: 100000000 ready: 1 job status: 4
block_job_query_locked, 358): get drive-scsi-disk-0 job progress offset: 47022014464 len 48996089856 speed: 100000000 ready: 1 job status: 4
block_job_error_action:570): send block_job_error for the job
qapi_event_send_block_job_error, 167): event_emit BLOCK_JOB_ERROR
block_job_error_action:570): send block_job_error for the job
qapi_event_send_block_job_error, 167): event_emit BLOCK_JOB_ERROR
block_job_error_action:570): send block_job_error for the job
qapi_event_send_block_job_error, 167): event_emit BLOCK_JOB_ERROR
block_job_error_action:570): send block_job_error for the job