
block-stream segfault with concurrent query-named-block-nodes

[ Impact ]

When running block-stream and query-named-block-nodes concurrently, a null-pointer dereference causes QEMU to segfault.

This occurs in every QEMU version shipped with Ubuntu from 22.04 through 25.10. I have not yet reproduced the bug with an upstream build.

It is reproducible in 10.1.0:

$ apt-cache policy qemu-system-x86
qemu-system-x86:
  Installed: 1:10.1.0+ds-5ubuntu2
  Candidate: 1:10.1.0+ds-5ubuntu2
  Version table:
 *** 1:10.1.0+ds-5ubuntu2 500
        500 http://archive.ubuntu.com/ubuntu questing/main amd64 Packages
        100 /var/lib/dpkg/status

The bug is tracked in Launchpad as LP: #2126951.

[ Reproducer ]

In query-named-block-nodes.sh:

#!/bin/bash

while true; do
    virsh qemu-monitor-command "$1" query-named-block-nodes > /dev/null
done

In blockrebase-crash.sh:

#!/bin/bash

set -ex

domain="$1"

if [ -z "${domain}" ]; then
    echo "Missing domain name"
    exit 1
fi

./query-named-block-nodes.sh "${domain}" &
query_pid=$!

while [ -n "$(virsh list --uuid)" ]; do
    snap="snap0-$(uuidgen)"

    virsh snapshot-create-as "${domain}" \
        --name "${snap}" \
        --disk-only file= \
        --diskspec vda,snapshot=no \
        --diskspec "vdb,stype=file,file=/var/lib/libvirt/images/n0-blk0_${snap}.qcow2" \
        --atomic \
        --no-metadata

    virsh blockpull "${domain}" vdb

    while bjr=$(virsh blockjob "$domain" vdb); do
        if [[ "$bjr" == *"No current block job for"* ]] ; then
            break;
        fi;
    done;
done

kill "${query_pid}"

Provision the test VM (press Ctrl+] to leave the console once it has booted):

wget https://cloud-images.ubuntu.com/daily/server/noble/current/noble-server-cloudimg-amd64.img

sudo cp noble-server-cloudimg-amd64.img /var/lib/libvirt/images/n0-root.qcow2
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/n0-blk0.qcow2 10G

touch network-config
touch meta-data
touch user-data

virt-install \
  -n n0 \
  --description "Test noble minimal" \
  --os-variant=ubuntu24.04 \
  --ram=1024 --vcpus=2 \
  --import \
  --disk path=/var/lib/libvirt/images/n0-root.qcow2,bus=virtio,cache=writethrough,size=10 \
  --disk path=/var/lib/libvirt/images/n0-blk0.qcow2,bus=virtio,cache=writethrough,size=10 \
  --graphics none \
  --network network=default \
  --cloud-init user-data="user-data,meta-data=meta-data,network-config=network-config"

Then run the script to trigger the crash (you may need to kill query-named-block-nodes.sh manually):

./blockrebase-crash.sh n0

[ Details ]

Backtrace from the coredump (source at [1]):

#0  bdrv_refresh_filename (bs=0x5efed72f8350) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
#1  0x00005efea73cf9dc in bdrv_block_device_info (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
    at block/qapi.c:62
#2  0x00005efea7391ed3 in bdrv_named_nodes_list (flat=<optimized out>, errp=0x7ffeb829ebd8)
    at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
#3  0x00005efea7471993 in qmp_query_named_block_nodes (has_flat=<optimized out>, flat=<optimized out>,
    errp=0x7ffeb829ebd8) at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
#4  qmp_marshal_query_named_block_nodes (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
    at qapi/qapi-commands-block-core.c:553
#5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0) at qapi/qmp-dispatch.c:128
#6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430) at util/async.c:219
#7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430) at util/aio-posix.c:436
#8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
    user_data=<optimized out>) at util/async.c:361
#9  0x00007f2b77809bfb in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f2b77809e70 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
#12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
#13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
#14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
#15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0) at system/main.c:50
#16 0x00005efea6e76319 in main (argc=<optimized out>, argv=<optimized out>) at system/main.c:93

The libvirt logs suggest that the crash happens right at the end of the block job: the job reaches the "concluded" state just before QEMU crashes. I suspect the cause is one of the following:

  • stream_clean frees or modifies cor_filter_bs without holding a lock that it needs [2][3]
  • bdrv_refresh_filename needs to handle the case where the children QLIST of a filter bs is empty, so that its first entry is NULL [1]

I am working on understanding the problem well enough to put together a patch, but I would appreciate guidance from anyone who has a sense of the best way forward.
