block-stream segfault with concurrent query-named-block-nodes

[ Impact ]

When running block-stream and query-named-block-nodes concurrently, a null-pointer dereference causes QEMU to segfault.

This occurs in every version of QEMU shipped with Ubuntu, 22.04 through 25.10. I have not yet reproduced the bug using an upstream build.

It is reproducible in 10.1.0:

$ apt-cache policy qemu-system-x86
qemu-system-x86:
  Installed: 1:10.1.0+ds-5ubuntu2
  Candidate: 1:10.1.0+ds-5ubuntu2
  Version table:
 *** 1:10.1.0+ds-5ubuntu2 500
        500 http://archive.ubuntu.com/ubuntu questing/main amd64 Packages
        100 /var/lib/dpkg/status

The bug is reported in Ubuntu as LP: #2126951

[ Reproducer ]

In query-named-block-nodes.sh:

#!/bin/bash

while true; do
    virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done

In blockrebase-crash.sh:

#!/bin/bash

set -ex

domain="$1"

if [ -z "${domain}" ]; then
    echo "Missing domain name"
    exit 1
fi

./query-named-block-nodes.sh "${domain}" &
query_pid=$!

while [ -n "$(virsh list --uuid)" ]; do
    snap="snap0-$(uuidgen)"

    virsh snapshot-create-as "${domain}" \
        --name "${snap}" \
        --disk-only file= \
        --diskspec vda,snapshot=no \
        --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
        --atomic \
        --no-metadata

    virsh blockpull "${domain}" vdb

    while bjr=$(virsh blockjob "$domain" vdb); do
        if [[ "$bjr" == *"No current block job for"* ]] ; then
            break;
        fi;
    done;
done

kill "${query_pid}"
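
The polling loop inside the script spins without pausing and has no upper bound. A minimal sketch of a bounded variant follows. The helper name wait_for_blockjob and the default timeout are assumptions, not part of the original scripts, and adding a sleep may change the timing of the race, so treat it as an optional alternative rather than part of the reproducer.

```shell
#!/bin/bash

# Hypothetical helper: poll "virsh blockjob" until libvirt reports that no
# job is active on the disk. Returns 0 when the job is gone, 1 if virsh
# fails (for example because the domain crashed) or the timeout expires.
wait_for_blockjob() {
    local domain="$1" disk="$2"
    local timeout="${3:-600}"    # seconds; the default is an arbitrary assumption
    local deadline=$((SECONDS + timeout))
    local out

    while out=$(virsh blockjob "${domain}" "${disk}"); do
        if [[ "${out}" == *"No current block job for"* ]]; then
            return 0
        fi
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep 0.2
    done
    return 1
}
```

Inside the main loop this would be called as wait_for_blockjob "${domain}" vdb, in place of the inner while loop.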

Provision the guest (press Ctrl + ] to leave the console after boot):

wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

touch network-config
touch meta-data
touch user-data

virt-install \
  -n n0 \
  --description "Test noble minimal" \
  --os-variant=ubuntu24.04 \
  --ram=1024 --vcpus=2 \
  --import \
  --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
  --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
  --graphics none \
  --network network=default \
  --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"

And run the script to cause the crash (you may need to manually kill query-named-block-nodes.sh afterwards):

./blockrebase-crash.sh n0
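
Because blockrebase-crash.sh runs under set -e, a virsh call that fails once QEMU has crashed can end the script before it reaches the final kill, which would leave the query loop running. A minimal sketch of one way to guarantee cleanup follows. The names query_loop and with_background_query are hypothetical and not part of the original scripts; the command passed in stands for the snapshot and blockpull loop.

```shell
#!/bin/bash

# Same loop as query-named-block-nodes.sh, wrapped in a function so it can
# be started and stopped from one place.
query_loop() {
    while true; do
        virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
    done
}

# Hypothetical wrapper: run the given command while query_loop hammers the
# domain in the background, then always stop the loop, even if the command
# fails. Returns the exit status of the command.
with_background_query() {
    local domain="$1"
    shift

    query_loop "${domain}" &
    local query_pid=$!

    local status=0
    "$@" || status=$?

    # Stop and reap the background loop regardless of how the command ended.
    kill "${query_pid}" 2>/dev/null || true
    wait "${query_pid}" 2>/dev/null || true
    return "${status}"
}
```

For example, with_background_query n0 run_iterations n0, where run_iterations is a hypothetical function holding the snapshot and blockpull loop. Once the domain is gone, the virsh calls in query_loop fail quickly and print errors until the wrapper kills the loop.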

[ Details ]

Backtrace from the coredump (source at [1]):

#0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
    at block/qapi.c:62
#2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
    at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
    errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
    at qapi/qapi-commands-block-core.c:553
#5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
    user_data=<optimized out>) at util/async.c:361
#9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93

The libvirt logs suggest that the crash occurs right at the end of the block job, since the job reaches the "concluded" state before QEMU crashes. I assume that the cause is one of:

  • stream_clean frees or modifies cor_filter_bs without holding a lock that it needs [2][3]
  • bdrv_refresh_filename needs to handle the possibility that the QLIST of children for a filter bs could be NULL [1]

I am working on understanding this problem well enough to put together a patch, but if anyone has a sense of the ideal way forward, I'd appreciate some guidance.