qemu-storage-daemon: NBD server backend deadlock
Host environment
- Operating system: Ubuntu 25.04
- OS/kernel version: Linux Arc 6.14.11-061411-generic #202506101206 SMP PREEMPT_DYNAMIC Tue Jun 10 13:12:54 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
- Architecture: x86_64
- qemu-storage-daemon version 10.1.50 (v10.1.0-1395-g8109ebdb95) (or qemu-storage-daemon version 9.2.1 (Debian 1:9.2.1+ds-1ubuntu5.2))
qemu-storage-daemon deadlocks when asked to add a QCOW2 disk whose backing file is an NBD URL pointing to the same server.
Steps to reproduce:
- Terminal 1:
  - Create a root disk:
    qemu-img create -f qcow2 /tmp/root.qcow2 1G
  - Start a Storage Daemon instance with the root disk exported:
    qemu-storage-daemon \
      --nbd-server addr.type=inet,addr.host=0.0.0.0,addr.port=5000 \
      --blockdev driver=qcow2,node-name=root_node,file.driver=file,file.filename=/tmp/root.qcow2 \
      --export type=nbd,id=root_node,node-name=root_node,name=root,writable=on \
      --chardev stdio,id=mon0 --monitor chardev=mon0,mode=control
- Terminal 2:
  - Create a root-backed snapshot. It succeeds because the backing storage is up and running:
    qemu-img create -f qcow2 -F raw -b nbd://127.0.0.1:5000/root /tmp/snap.qcow2
- Back in terminal 1:
  - Accept capabilities:
    {"execute": "qmp_capabilities"}
  - Try to attach the snapshot from the previous step:
    { "execute": "blockdev-add", "arguments": { "node-name": "snap_node", "driver": "qcow2", "file": { "driver": "file", "filename": "/tmp/snap.qcow2" } } }
That is it: qemu-storage-daemon is now hopelessly deadlocked. It no longer provides any service via either NBD or QMP, and it cannot be stopped with SIGINT.
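For anyone who wants to script the steps above instead of using two terminals, here is a rough Python sketch. It is not part of the original report and has not been run against a real daemon here. The helper names (`qmp_line`, `snapshot_blockdev_add`, `read_reply`, `reproduce`) and the 5 second timeout are my own choices, and it assumes the stdio monitor behaves the same way when driven through pipes as it does in a terminal. The command lines and the QMP payloads are copied from the steps above.

```python
import json
import select
import subprocess

# Daemon command line, copied from the reproduction steps.
DAEMON_ARGV = [
    "qemu-storage-daemon",
    "--nbd-server", "addr.type=inet,addr.host=0.0.0.0,addr.port=5000",
    "--blockdev", "driver=qcow2,node-name=root_node,file.driver=file,"
                  "file.filename=/tmp/root.qcow2",
    "--export", "type=nbd,id=root_node,node-name=root_node,name=root,writable=on",
    "--chardev", "stdio,id=mon0",
    "--monitor", "chardev=mon0,mode=control",
]


def qmp_line(execute, arguments=None):
    """Serialize one QMP command as a newline-terminated JSON line (bytes)."""
    command = {"execute": execute}
    if arguments is not None:
        command["arguments"] = arguments
    return (json.dumps(command) + "\n").encode()


def snapshot_blockdev_add(filename="/tmp/snap.qcow2", node_name="snap_node"):
    """Build the blockdev-add command used in the report."""
    return qmp_line("blockdev-add", {
        "node-name": node_name,
        "driver": "qcow2",
        "file": {"driver": "file", "filename": filename},
    })


def read_reply(proc, timeout_s):
    """Return one line from the monitor, or None if nothing arrives in time."""
    ready, _, _ = select.select([proc.stdout], [], [], timeout_s)
    return proc.stdout.readline() if ready else None


def reproduce(timeout_s=5.0):
    """Return True if blockdev-add gets no reply within timeout_s (assumed hang)."""
    subprocess.run(["qemu-img", "create", "-f", "qcow2", "/tmp/root.qcow2", "1G"],
                   check=True)
    proc = subprocess.Popen(DAEMON_ARGV, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, bufsize=0)
    try:
        read_reply(proc, timeout_s)  # QMP greeting
        subprocess.run(["qemu-img", "create", "-f", "qcow2", "-F", "raw",
                        "-b", "nbd://127.0.0.1:5000/root", "/tmp/snap.qcow2"],
                       check=True)
        proc.stdin.write(qmp_line("qmp_capabilities"))
        read_reply(proc, timeout_s)  # reply to qmp_capabilities
        proc.stdin.write(snapshot_blockdev_add())
        return read_reply(proc, timeout_s) is None
    finally:
        proc.kill()  # SIGKILL, since SIGINT is ignored once the daemon hangs
        proc.wait()
```

Calling `reproduce()` on an affected build should return `True` if the hang occurs. A timeout is only a heuristic, so a very slow host could produce a false positive.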
Additional information
Sockets after deadlock:
tcp LISTEN 1 4096 0.0.0.0:5000 0.0.0.0:* users:(("qemu-storage-da",pid=180760,fd=9))
tcp ESTAB 0 0 127.0.0.1:5000 127.0.0.1:46858
tcp ESTAB 0 0 127.0.0.1:46858 127.0.0.1:5000 users:(("qemu-storage-da",pid=180760,fd=15))
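One possible reading of this listing (my interpretation, not something confirmed in the report): the LISTEN socket has a Recv-Q of 1, which `ss` uses for the number of connections waiting to be accepted, and the server-side ESTAB entry has no owning descriptor. That would be consistent with the daemon's own client connection sitting in the accept queue while nothing in the process gets around to accepting it. The Python sketch below is not QEMU code. It only illustrates the general pattern in a single process: the kernel completes the TCP handshake, so the connect succeeds, but the read for a greeting never returns because the same process never calls `accept()`. The function name and the timeout are hypothetical, and the timeout exists only so that the demo terminates.

```python
import socket


def self_connect_without_accept(timeout_s=0.5):
    """Connect to our own listener, never accept, and wait for a greeting."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    # The kernel finishes the handshake on behalf of the listener, so this
    # succeeds even though accept() is never called.
    client = socket.create_connection(("127.0.0.1", port), timeout=timeout_s)
    try:
        client.recv(8)  # stands in for waiting on the server's first 8 bytes
        return "got data"
    except socket.timeout:
        return "timed out waiting for greeting"
    finally:
        client.close()
        listener.close()
```

Without the timeout, the `recv` call would block indefinitely, which resembles what the connection thread in the stack trace is doing in `nbd_read` while waiting for the "initial magic".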
Stack trace of the process:
Thread 4 (Thread 0x7127a9a8d6c0 (LWP 180250) "qemu-storage-da"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00005df26294f905 in qemu_futex_wait (f=0x5df262aa74f8 <rcu_call_ready_event>, val=4294967295) at /home/user/src/qemu/include/qemu/futex.h:47
#2 0x00005df26294fb39 in qemu_event_wait (ev=0x5df262aa74f8 <rcu_call_ready_event>) at ../util/event.c:162
#3 0x00005df26295adbe in call_rcu_thread (opaque=0x0) at ../util/rcu.c:278
#4 0x00005df26294eb77 in qemu_thread_start (args=0x5df28933d2c0) at ../util/qemu-thread-posix.c:393
#5 0x00007127aa0a27f1 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:448
#6 0x00007127aa133b5c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 3 (Thread 0x7127a88476c0 (LWP 180252) "qemu-storage-da"):
#0 __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
#1 0x00007127aa09eb63 in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=8, a6=0, nr=271) at ./nptl/cancellation.c:49
#2 __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=8, a6=a6@entry=0, nr=271) at ./nptl/cancellation.c:75
#3 0x00007127aa125fb6 in __GI_ppoll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>, sigmask=<optimized out>) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#4 0x00007127aa80d245 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007127aa79d157 in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 0x00005df2627a40c9 in iothread_run (opaque=0x5df28935fda0) at ../iothread.c:70
#7 0x00005df26294eb77 in qemu_thread_start (args=0x5df28934b7f0) at ../util/qemu-thread-posix.c:393
#8 0x00007127aa0a27f1 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:448
#9 0x00007127aa133b5c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 2 (Thread 0x71279bfbd6c0 (LWP 180258) "qemu-storage-da"):
#0 __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
#1 0x00007127aa09eb63 in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=0, a5=0, a6=0, nr=47) at ./nptl/cancellation.c:49
#2 __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=a4@entry=0, a5=a5@entry=0, a6=a6@entry=0, nr=47) at ./nptl/cancellation.c:75
#3 0x00007127aa135721 in __recvmsg_syscall (fd=<optimized out>, msg=<optimized out>, flags=<optimized out>) at ../sysdeps/unix/sysv/linux/recvmsg.c:27
#4 __libc_recvmsg (fd=<optimized out>, msg=<optimized out>, flags=<optimized out>) at ../sysdeps/unix/sysv/linux/recvmsg.c:41
#5 0x00005df2628b06f3 in qio_channel_socket_readv (ioc=0x712794000b70, iov=0x5df28934fc40, niov=1, fds=0x0, nfds=0x0, flags=0, errp=0x71279bfbc910) at ../io/channel-socket.c:575
#6 0x00005df2628b6848 in qio_channel_readv_full (ioc=0x712794000b70, iov=0x5df28934fc40, niov=1, fds=0x0, nfds=0x0, flags=0, errp=0x71279bfbc910) at ../io/channel.c:75
#7 0x00005df2628b6bc4 in qio_channel_readv_full_all_eof (ioc=0x712794000b70, iov=0x71279bfbc670, niov=1, fds=0x0, nfds=0x0, flags=0, errp=0x71279bfbc910) at ../io/channel.c:159
#8 0x00005df2628b6dd7 in qio_channel_readv_full_all (ioc=0x712794000b70, iov=0x71279bfbc670, niov=1, fds=0x0, nfds=0x0, errp=0x71279bfbc910) at ../io/channel.c:227
#9 0x00005df2628b6a8e in qio_channel_readv_all (ioc=0x712794000b70, iov=0x71279bfbc670, niov=1, errp=0x71279bfbc910) at ../io/channel.c:127
#10 0x00005df2628b72d5 in qio_channel_read_all (ioc=0x712794000b70, buf=0x71279bfbc7b8 "", buflen=8, errp=0x71279bfbc910) at ../io/channel.c:348
#11 0x00005df2627c9964 in nbd_read (ioc=0x712794000b70, buffer=0x71279bfbc7b8, size=8, desc=0x5df2629a181c "initial magic", errp=0x71279bfbc910) at /home/user/src/qemu/include/block/nbd.h:443
#12 0x00005df2627c9afc in nbd_read64 (ioc=0x712794000b70, val=0x71279bfbc7b8, desc=0x5df2629a181c "initial magic", errp=0x71279bfbc910) at /home/user/src/qemu/include/block/nbd.h:470
#13 0x00005df2627cc119 in nbd_start_negotiate (ioc=0x712794000b70, tlscreds=0x0, hostname=0x0, outioc=0x5df2893b7fb0, max_mode=NBD_MODE_EXTENDED, zeroes=0x71279bfbc84a, errp=0x71279bfbc910) at ../nbd/client.c:921
#14 0x00005df2627cc6e9 in nbd_receive_negotiate (ioc=0x712794000b70, tlscreds=0x0, hostname=0x0, outioc=0x5df2893b7fb0, info=0x5df2893b7f50, errp=0x71279bfbc910) at ../nbd/client.c:1052
#15 0x00005df2627ce30b in nbd_connect (sioc=0x712794000b70, addr=0x5df2893894e0, info=0x5df2893b7f50, tlscreds=0x0, tlshostname=0x0, outioc=0x5df2893b7fb0, errp=0x71279bfbc910) at ../nbd/client-connection.c:152
#16 0x00005df2627ce546 in connect_thread_func (opaque=0x5df2893b7ea0) at ../nbd/client-connection.c:193
#17 0x00005df26294eb77 in qemu_thread_start (args=0x5df28934fc40) at ../util/qemu-thread-posix.c:393
#18 0x00007127aa0a27f1 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:448
#19 0x00007127aa133b5c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 1 (Thread 0x7127aa21fbc0 (LWP 180249) "qemu-storage-da"):
#0 __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
#1 0x00007127aa09eb63 in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=8, a6=0, nr=271) at ./nptl/cancellation.c:49
#2 __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=8, a6=a6@entry=0, nr=271) at ./nptl/cancellation.c:75
#3 0x00007127aa125fb6 in __GI_ppoll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>, sigmask=<optimized out>) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#4 0x00005df262967f68 in qemu_poll_ns (fds=0x5df289366160, nfds=1, timeout=-1) at ../util/qemu-timer.c:330
#5 0x00005df26294a9d3 in fdmon_poll_wait (ctx=0x5df289365580, ready_list=0x7ffffaa0f918, timeout=-1) at ../util/fdmon-poll.c:79
#6 0x00005df26294a3cb in aio_poll (ctx=0x5df289365580, blocking=true) at ../util/aio-posix.c:730
#7 0x00005df26288b1c6 in bdrv_poll_co (s=0x7ffffaa0f9c0) at /home/user/src/qemu/block/block-gen.h:43
#8 0x00005df26288f8f7 in nbd_do_establish_connection (bs=0x5df2893b13c0, blocking=true, errp=0x7ffffaa0fab0) at block/block-gen.c:2633
#9 0x00005df2628043af in nbd_open (bs=0x5df2893b13c0, options=0x5df2893b6870, flags=24576, errp=0x7ffffaa0fab0) at ../block/nbd.c:1940
#10 0x00005df2627aaba0 in bdrv_open_driver (bs=0x5df2893b13c0, drv=0x5df262a972e0 <bdrv_nbd_unix>, node_name=0x0, options=0x5df2893b6870, open_flags=24576, errp=0x7ffffaa0fbc0) at ../block.c:1665
#11 0x00005df2627ab7f8 in bdrv_open_common (bs=0x5df2893b13c0, file=0x0, options=0x5df2893b6870, errp=0x7ffffaa0fbc0) at ../block.c:1995
#12 0x00005df2627b0fe6 in bdrv_open_inherit (filename=0x5df289389c60 "nbd://127.0.0.1:5000/root", reference=0x0, options=0x5df2893b6870, flags=40960, parent=0x5df2893aa050, child_class=0x5df262a46460 <child_of_bds>, child_role=19, parse_filename=true, errp=0x7ffffaa0fe00) at ../block.c:4178
#13 0x00005df2627afddb in bdrv_open_child_bs (filename=0x5df289389c60 "nbd://127.0.0.1:5000/root", options=0x5df2893af380, bdref_key=0x5df26299a003 "file", parent=0x5df2893aa050, child_class=0x5df262a46460 <child_of_bds>, child_role=19, allow_none=true, parse_filename=true, errp=0x7ffffaa0fe00) at ../block.c:3757
#14 0x00005df2627b0d6c in bdrv_open_inherit (filename=0x5df289389c60 "nbd://127.0.0.1:5000/root", reference=0x0, options=0x5df2893af380, flags=8192, parent=0x5df28938b470, child_class=0x5df262a46460 <child_of_bds>, child_role=8, parse_filename=true, errp=0x7ffffaa10070) at ../block.c:4123
#15 0x00005df2627afab3 in bdrv_open_backing_file (bs=0x5df28938b470, parent_options=0x5df2893907a0, bdref_key=0x5df262999ffb "backing", errp=0x7ffffaa10070) at ../block.c:3685
#16 0x00005df2627b1053 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x5df2893907a0, flags=8194, parent=0x0, child_class=0x0, child_role=0, parse_filename=true, errp=0x7ffffaa10288) at ../block.c:4190
#17 0x00005df2627b15bc in bdrv_open (filename=0x0, reference=0x0, options=0x5df28938a450, flags=0, errp=0x7ffffaa10288) at ../block.c:4273
#18 0x00005df26279b1ef in bds_tree_init (bs_opts=0x5df28938a450, errp=0x7ffffaa10288) at ../blockdev.c:679
#19 0x00005df2627a252e in qmp_blockdev_add (options=0x7ffffaa102c0, errp=0x7ffffaa10288) at ../blockdev.c:3433
#20 0x00005df2628ff10b in qmp_marshal_blockdev_add (args=0x7127a0003130, ret=0x7127a928cda8, errp=0x7127a928cda0) at qapi/qapi-commands-block-core.c:1459
#21 0x00005df2629392d5 in do_qmp_dispatch_bh (opaque=0x7127a928ce40) at ../qapi/qmp-dispatch.c:128
#22 0x00005df262960db4 in aio_bh_call (bh=0x5df28938a3a0) at ../util/async.c:172
#23 0x00005df262960f00 in aio_bh_poll (ctx=0x5df2893636e0) at ../util/async.c:219
#24 0x00005df262949a0f in aio_dispatch (ctx=0x5df2893636e0) at ../util/aio-posix.c:436
#25 0x00005df262961438 in aio_ctx_dispatch (source=0x5df2893636e0, callback=0x0, user_data=0x0) at ../util/async.c:364
#26 0x00007127aa79bde2 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#27 0x00007127aa79c060 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#28 0x00005df262962bc1 in glib_pollfds_poll () at ../util/main-loop.c:290
#29 0x00005df262962c53 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
#30 0x00005df262962d8e in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
#31 0x00005df2628d1436 in main (argc=11, argv=0x7ffffaa10768) at ../storage-daemon/qemu-storage-daemon.c:434
If there are two servers involved, everything works as expected:
- Terminal 1:
  - Create a root disk:
    qemu-img create -f qcow2 /tmp/root.qcow2 1G
  - Start the first server with the root disk exported:
    qemu-storage-daemon --nbd-server addr.type=inet,addr.host=0.0.0.0,addr.port=5000 --blockdev driver=qcow2,node-name=root_node,file.driver=file,file.filename=/tmp/root.qcow2 --export type=nbd,id=root_node,node-name=root_node,name=root,writable=on
- Terminal 2:
  - Create a root-backed snapshot. It succeeds because the backing storage is up and running:
    qemu-img create -f qcow2 -F raw -b nbd://127.0.0.1:5000/root /tmp/snap.qcow2
  - Start the second server:
    qemu-storage-daemon --nbd-server addr.type=inet,addr.host=0.0.0.0,addr.port=6000 --chardev stdio,id=mon0 --monitor chardev=mon0,mode=control
  - Accept capabilities:
    {"execute": "qmp_capabilities"}
  - Attach the snapshot from the previous step:
    { "execute": "blockdev-add", "arguments": { "node-name": "snap_node", "driver": "qcow2", "file": { "driver": "file", "filename": "/tmp/snap.qcow2" } } }