drive-backup job hangs in a 'paused' state after an unsuccessful first attempt
Hi there. The problem I found is that a new drive-backup job hangs in a 'paused' state if the previous one ended with this error: GenericError: Failed to connect to '10.20.30.40:60001': Connection refused.
QEMU release info:
{
  "package": "Debian 1:5.2+dfsg-9~bpo10+1",
  "qemu": {
    "major": 5,
    "micro": 0,
    "minor": 2
  }
}
Host info:
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
How to reproduce
- Run the QEMU process:
  /usr/bin/qemu-system-x86_64 \
    -machine accel=kvm:tcg \
    -name alice \
    -m 2048M \
    -nodefaults -no-user-config \
    -cpu Westmere -smp cpus=5,sockets=3,cores=3,maxcpus=9 \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x03 \
    -device virtio-serial-pci,bus=pci.0,addr=0x4 \
    -chardev socket,id=virtcon,path=/var/run/kvm-monitor/alice.virtcon,server,nowait \
    -device virtconsole,chardev=virtcon,name=console.0 \
    -vga cirrus \
    -chardev socket,id=qga0,path=/var/run/kvm-monitor/alice.qga,server,nowait \
    -device virtio-serial-pci,id=virtio-serial-qga0,bus=pci.0,addr=0x5 \
    -device virtserialport,chardev=qga0,name=org.guest-agent.0 \
    -device vhost-vsock-pci,id=vsock_device,guest-cid=69650 \
    -iscsi initiator-name=iqn.2008-11.org.linux-kvm:qemu \
    -drive file=/dev/vgnvme/alice,id=alice,format=raw,if=none,aio=native,cache=none,detect-zeroes=on \
    -device virtio-blk-pci,drive=alice,id=blk_alice,bus=pci.0 \
    -netdev tap,ifname=alice,id=alice,vhost=on \
    -device virtio-net-pci,netdev=alice,id=net_alice,mac=54:52:00:ce:f9:93,bus=pci.0 \
    -vnc 127.0.0.2:1024,password,websocket=11724 \
    -qmp unix:/var/run/kvm-monitor/alice.qmp,server,nowait \
    -runas alice \
    -chroot /var/run/kvm-chroot/alice
- Run the drive-backup command using the QMP monitor:
  drive-backup job-id=copy_alice device=alice sync=full mode=existing target=nbd://10.20.30.40:60001/BACKUP
  At this point we want to trigger the Connection refused error, so it is expected that nothing is listening on 10.20.30.40:60001.
- On another server (10.20.30.40 in my case) create a disk and export it via NBD:
  qemu-img create -f raw /root/backup_disk 5G
  qemu-nbd --port=60001 --bind=0.0.0.0 --export-name=BACKUP -f raw --cache none --aio native /root/backup_disk
- Run the drive-backup command again. This time it succeeds: QEMU returns {}.
- Run the query-block-jobs command using the QMP monitor... and here it is:
  {
    "auto-dismiss": true,
    "auto-finalize": true,
    "busy": false,
    "device": "copy_alice",
    "io-status": "ok",
    "len": 0,
    "offset": 0,
    "paused": true,
    "ready": false,
    "speed": 0,
    "status": "paused",
    "type": "backup"
  }
If we try to resume this job, an error will occur:
GenericError: Can't resume a job that was not paused
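
For anyone who wants to script the steps above instead of typing them into the monitor, here is a rough Python sketch that talks to the QMP socket directly. It is not what I originally ran, only an illustration: the helper names (qmp_command, read_reply, execute, looks_stuck, reproduce) are made up for this example, and I am assuming block-job-resume with device=copy_alice is the resume command in question. The socket path, job id, device and target are taken from the commands in this report.

```python
import json
import socket

# QMP socket path taken from the -qmp option in the QEMU command line above.
QMP_SOCKET = "/var/run/kvm-monitor/alice.qmp"


def qmp_command(name, **arguments):
    """Build a QMP command; underscores in argument names become dashes."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = {k.replace("_", "-"): v for k, v in arguments.items()}
    return cmd


def drive_backup_command():
    """The drive-backup call from the report, expressed as QMP JSON."""
    return qmp_command(
        "drive-backup",
        job_id="copy_alice",
        device="alice",
        sync="full",
        mode="existing",
        target="nbd://10.20.30.40:60001/BACKUP",
    )


def read_reply(stream):
    """Return the next non-event message (greeting, return or error)."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        msg = json.loads(line)
        if "event" not in msg:  # skip asynchronous events such as job state changes
            return msg


def execute(stream, command):
    """Send one command and wait for its reply."""
    stream.write(json.dumps(command) + "\n")
    stream.flush()
    return read_reply(stream)


def looks_stuck(jobs_reply, resume_reply):
    """True if a job reports 'paused' but the resume attempt returned an error."""
    paused = any(
        job.get("paused") and job.get("status") == "paused"
        for job in jobs_reply.get("return", [])
    )
    return paused and "error" in resume_reply


def reproduce(path=QMP_SOCKET):
    """Walk through the reproduction steps; needs the running VM from step 1."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        stream = sock.makefile("rw", encoding="utf-8")
        read_reply(stream)  # QMP greeting
        execute(stream, qmp_command("qmp_capabilities"))
        first = execute(stream, drive_backup_command())  # expected: Connection refused
        input("Start qemu-nbd on the backup host, then press Enter... ")
        second = execute(stream, drive_backup_command())
        jobs = execute(stream, qmp_command("query-block-jobs"))
        # Assumption: block-job-resume is the resume command that fails.
        resume = execute(stream, qmp_command("block-job-resume", device="copy_alice"))
        return first, second, jobs, looks_stuck(jobs, resume)
```

Calling reproduce() on the host while the VM is running should return the replies to both drive-backup attempts, the query-block-jobs reply, and a flag telling whether the job is in the inconsistent state described here (reported as paused, but not resumable).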