
drive-backup job hangs in a 'paused' state after unsuccessful first attempt

Hi there. The problem I found is that a new drive-backup job hangs in a 'paused' state if the previous attempt failed with this error: GenericError: Failed to connect to '10.20.30.40:60001': Connection refused.

QEMU release info:

{
    "package": "Debian 1:5.2+dfsg-9~bpo10+1",
    "qemu": {
        "major": 5,
        "micro": 0,
        "minor": 2
    }
}

Host info:

PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

How to reproduce

  1. Run the QEMU process:

    /usr/bin/qemu-system-x86_64 \
        -machine accel=kvm:tcg \
        -name alice \
        -m 2048M \
        -nodefaults -no-user-config \
        -cpu Westmere -smp cpus=5,sockets=3,cores=3,maxcpus=9 \
        -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x03 \
        -device virtio-serial-pci,bus=pci.0,addr=0x4 \
        -chardev socket,id=virtcon,path=/var/run/kvm-monitor/alice.virtcon,server,nowait \
        -device virtconsole,chardev=virtcon,name=console.0 \
        -vga cirrus \
        -chardev socket,id=qga0,path=/var/run/kvm-monitor/alice.qga,server,nowait \
        -device virtio-serial-pci,id=virtio-serial-qga0,bus=pci.0,addr=0x5 \
        -device virtserialport,chardev=qga0,name=org.guest-agent.0 \
        -device vhost-vsock-pci,id=vsock_device,guest-cid=69650 \
        -iscsi initiator-name=iqn.2008-11.org.linux-kvm:qemu \
        -drive file=/dev/vgnvme/alice,id=alice,format=raw,if=none,aio=native,cache=none,detect-zeroes=on \
        -device virtio-blk-pci,drive=alice,id=blk_alice,bus=pci.0 \
        -netdev tap,ifname=alice,id=alice,vhost=on \
        -device virtio-net-pci,netdev=alice,id=net_alice,mac=54:52:00:ce:f9:93,bus=pci.0 \
        -vnc 127.0.0.2:1024,password,websocket=11724 \
        -qmp unix:/var/run/kvm-monitor/alice.qmp,server,nowait \
        -runas alice \
        -chroot /var/run/kvm-chroot/alice
  2. Run the drive-backup command from the QMP monitor:

    drive-backup job-id=copy_alice device=alice sync=full mode=existing target=nbd://10.20.30.40:60001/BACKUP

    At this point we want to trigger the Connection refused error, so nothing should be listening on 10.20.30.40:60001 yet.

  3. On another server (10.20.30.40 in my case) create a disk and export it via NBD:

    qemu-img create -f raw /root/backup_disk 5G
    qemu-nbd --port=60001 --bind=0.0.0.0 --export-name=BACKUP -f raw --cache none --aio native /root/backup_disk
  4. Run the drive-backup command again. This time it appears to succeed: QEMU returns {}.

  5. Run the query-block-jobs command from the QMP monitor. The new job is reported as paused:

    {
      "auto-dismiss": true,
      "auto-finalize": true,
      "busy": false,
      "device": "copy_alice",
      "io-status": "ok",
      "len": 0,
      "offset": 0,
      "paused": true,
      "ready": false,
      "speed": 0,
      "status": "paused",
      "type": "backup"
    }

    If we try to resume this job, an error will occur:

    GenericError: Can't resume a job that was not paused
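
For anyone who wants to script steps 2, 4 and 5 instead of typing them into the monitor, here is a minimal sketch that talks to the QMP socket directly. It is not taken from any QEMU tooling: the names `QMP`, `qmp_command`, `looks_stuck` and `reproduce` are hypothetical helpers made up for this example, and it assumes the monitor sends one JSON object per line (the default, non-pretty output). The socket path and the drive-backup arguments are the ones from the steps above.

```python
import json
import socket

# Arguments of the drive-backup call from step 2.
BACKUP_ARGS = {
    "job-id": "copy_alice",
    "device": "alice",
    "sync": "full",
    "mode": "existing",
    "target": "nbd://10.20.30.40:60001/BACKUP",
}


def qmp_command(name, **arguments):
    """Encode one QMP command as a single JSON line."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg).encode() + b"\n"


def looks_stuck(job):
    """True if a query-block-jobs entry matches the state shown in step 5."""
    return (
        job.get("status") == "paused"
        and job.get("paused") is True
        and not job.get("busy")
        and job.get("offset") == 0
    )


class QMP:
    """Hypothetical minimal client; assumes one JSON object per line."""

    def __init__(self, path):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(path)
        self.reader = self.sock.makefile("rb")
        json.loads(self.reader.readline())  # consume the greeting banner
        self.execute("qmp_capabilities")    # leave capability negotiation mode

    def execute(self, name, **arguments):
        self.sock.sendall(qmp_command(name, **arguments))
        while True:
            line = self.reader.readline()
            if not line:
                raise ConnectionError("QMP connection closed")
            reply = json.loads(line)
            # Skip asynchronous events; only return the command's reply.
            if "return" in reply or "error" in reply:
                return reply


def reproduce(path="/var/run/kvm-monitor/alice.qmp"):
    """Send drive-backup once, then report the state of every block job."""
    qmp = QMP(path)
    print(qmp.execute("drive-backup", **BACKUP_ARGS))
    for job in qmp.execute("query-block-jobs").get("return", []):
        print(job["device"], "stuck" if looks_stuck(job) else "not stuck")
```

Calling `reproduce()` once before qemu-nbd is started (step 2) and once after (steps 4 and 5) should walk through the same sequence as above. The check in `looks_stuck` only mirrors the fields from the output in step 5, so treat it as a heuristic rather than a definition of the bug.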