qemu-img cannot repair a qcow2 image stored on an LV because the image size is mis-detected
Host environment
- Operating system: (Windows 10 21H1, Fedora 34, etc.)
- OS/kernel version: (For POSIX hosts, use uname -a)
- Architecture: (x86, ARM, s390x, etc.)
- QEMU flavor: (qemu-system-x86_64, qemu-aarch64, qemu-img, etc.)
- QEMU version: (e.g. qemu-system-x86_64 --version)
- QEMU command line:
./qemu-system-x86_64 -M q35 -m 4096 -enable-kvm -hda fedora32.qcow2
Emulated/Virtualized environment
- Operating system: (Windows 10 21H1, Fedora 34, etc.)
- OS/kernel version: (For POSIX guests, use uname -a)
- Architecture: (x86, ARM, s390x, etc.)
Description of problem
This is a RHEV environment with terabytes of VMs that need to be repaired after a datacenter-wide power outage (the physical datacenter, not an RHEV data center).
Each of these VMs is stored on its own LV, but qemu-img check fails to perform the repairs:
ERROR cluster 24481205 refcount=0 reference=1
ERROR cluster 24481206 refcount=0 reference=1
Rebuilding refcount structure
ERROR writing refblock: No space left on device <============
qemu-img: Check failed: No space left on device
Running qemu-img check or qemu-img info on the LV (/dev/dm-*) works, but repairs cannot be completed. Note that qemu-img info reports a disk size of 0 B for the LV:
# qemu-img info /dev/cdd4e215-8c6b-4877-b2be-fdba383e7eb0/fb32333b-2334-4e10-8c42-02bc97e826cc
image: /dev/cdd4e215-8c6b-4877-b2be-fdba383e7eb0/fb32333b-2334-4e10-8c42-02bc97e826cc
file format: qcow2
virtual size: 1.5 TiB (1649267441664 bytes)
disk size: 0 B <================================
cluster_size: 65536
Format specific information:
compat: 1.1
compression type: zlib
lazy refcounts: true
refcount bits: 16
corrupt: false
extended l2: false
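To put the "ERROR writing refblock" failure in perspective, the size of the refcount metadata a rebuild has to write can be estimated from the cluster_size and refcount bits fields above. The sketch below is a rough estimate only. It assumes the rebuild writes a complete new set of refcount blocks plus a refcount table covering the whole virtual size (i.e. a fully allocated image). It ignores the clusters occupied by the new metadata itself. The function name refcount_metadata_bytes is hypothetical and not part of qemu-img. Per the qcow2 format, a refcount block is one cluster holding cluster_size * 8 / refcount_bits entries, and the refcount table stores one 8-byte offset per refcount block.

```shell
#!/bin/sh
# Hypothetical helper: rough estimate of the refcount metadata (in bytes) that a
# full refcount rebuild would need to write.
# Assumption: the new refcount structures must cover every cluster up to
# host_bytes; clusters used by the new metadata itself are not counted.
refcount_metadata_bytes() {
    host_bytes=$1      # bytes of the image that need refcount coverage
    cluster_size=$2    # cluster_size from qemu-img info, e.g. 65536
    refcount_bits=$3   # refcount bits from qemu-img info, e.g. 16

    # Number of clusters to cover, rounded up.
    clusters=$(( (host_bytes + cluster_size - 1) / cluster_size ))
    # One refcount block is one cluster of refcount_bits-wide entries.
    per_block=$(( cluster_size * 8 / refcount_bits ))
    refblocks=$(( (clusters + per_block - 1) / per_block ))
    # The refcount table holds one 8-byte offset per refcount block,
    # rounded up to whole clusters.
    table_clusters=$(( (refblocks * 8 + cluster_size - 1) / cluster_size ))
    echo $(( (refblocks + table_clusters) * cluster_size ))
}

# Values from the qemu-img info output above, using the virtual size as the
# (worst-case) amount of data to cover.
refcount_metadata_bytes 1649267441664 65536 16   # prints 50397184 (about 48 MiB)
```

For this 1.5 TiB image that works out to 768 refcount blocks plus one table cluster. If the image already reaches the end of its fixed-size LV, even this modest amount of extra space may not be available. That would be consistent with, though it does not prove, the ENOSPC error above.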
Steps to reproduce
- Have a damaged VM whose qcow2 image is stored on an LV.
- Run 'qemu-img check ' and verify that it properly detects the clusters that need fixing.
- Run 'qemu-img check -r all '; it exits with 'No space left on device' after a few seconds.
Additional information
https://bugzilla.redhat.com/show_bug.cgi?id=1519071
Here is one example:
Edited by Vincent Cojot