VM guest process freezes when accessing virtiofs mount, requiring VM forced shutdown and restart
I have a problem with guest processes in a user-session QEMU/KVM virtual machine freezing. The frequency is about once every day or two. I'm observing this on a linux mint xfce 22.1 guest running on a cachyOS+wayfire host running Linux 6.16.0-5-cachyos. In the past, I have frequently observed apparently-identical behavior in an ubuntu guest running on an ubuntu host.
I notice this most often with thunderbird. A stack trace of a frozen thunderbird yields:
[<0>] request_wait_answer+0xda/0x2a0
[<0>] fuse_simple_request+0x18d/0x2f0
[<0>] fuse_do_getattr+0xf0/0x220
[<0>] fuse_permission+0x126/0x1f0
[<0>] inode_permission+0xfc/0x1b0
...
It's trying to access files in its profile, which is located in a directory shared with the host via virtiofs. Once thunderbird is frozen, any other guest process (like df or thunar) that accesses the shared directory will also hang. The only remedy I've found is doing a forced shutdown and restart of the VM. I have another nearly-identical VM that differs only in that it has thunderbird's profile in a non-shared directory in the guest, and it has never exhibited the problem.
Here is the XML for the shared directory:
<filesystem type='mount' accessmode='passthrough'>
<driver type='virtiofs'/>
<binary path='/PATH-TO-WRAPPER-AROUND-VIRTIOFSD-THAT-I-COMPILED-FROM-SOURCE'/>
<idmap>
<uid start='1000' target='1000' count='1'/>
<gid start='1000' target='1000' count='1'/>
</idmap>
<source dir='/XXX'/>
<target dir='YYY'/>
<alias name='fs1'/>
<address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
</filesystem>
I first observed this using the distro's virtiofsd, but I'm currently running a virtiofsd built from rust source cloned from here 2 or 3 weeks ago. I'm invoking it through a wrapper and capturing logs. From the moment I notice a frozen guest process, I never see anything added to the log, leading me to suspect that the virtiofsd is also frozen.
Here is a stack trace of virtiofsd in the apparently-frozen state:
Thread 3 (Thread 0x7f8017bff6c0 (LWP 66895) "vring_worker"):
#0 0x00007f8017cb0872 in ?? () from /usr/lib/libc.so.6
#1 0x00007f8017ca42d8 in ?? () from /usr/lib/libc.so.6
#2 0x00007f8017d3b131 in epoll_wait () from /usr/lib/libc.so.6
#3 0x000055ac8b872ba0 in vmm_sys_util::linux::epoll::Epoll::wait (self=0x55aca01848d8, timeout=-1, events=...) at src/linux/epoll.rs:310
#4 0x000055ac8b345849 in vhost_user_backend::event_loop::VringEpollHandler<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>::run<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>> (self=0x55aca01848b0) at /home/george/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/vhost-user-backend-0.17.0/src/event_loop.rs:156
#5 0x000055ac8b36bf60 in vhost_user_backend::handler::{impl#2}::new::{closure#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>> () at /home/george/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/vhost-user-backend-0.17.0/src/handler.rs:147
#6 0x000055ac8b3f503e in std::sys::backtrace::__rust_begin_short_backtrace<vhost_user_backend::handler::{impl#2}::new::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>> (f=...) at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/sys/backtrace.rs:152
#7 0x000055ac8b429ae7 in std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure#0}<vhost_user_backend::handler::{impl#2}::new::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>> () at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/thread/mod.rs:559
#8 0x000055ac8b412158 in core::panic::unwind_safe::{impl#23}::call_once<core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>, std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<vhost_user_backend::handler::{impl#2}::new::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>>> (self=...) at /usr/src/debug/rust/rustc-1.88.0-src/library/core/src/panic/unwind_safe.rs:272
#9 0x000055ac8b41a4f3 in std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<vhost_user_backend::handler::{impl#2}::new::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>--Type <RET> for more, q to quit, c to continue without paging--c
>, core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>>>, core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>> (data=0x7f8017bfeb98) at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/panicking.rs:589
#10 0x000055ac8b42b2fb in __rust_try ()
#11 0x000055ac8b4250fd in std::panicking::try<core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>, core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<vhost_user_backend::handler::{impl#2}::new::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>>>> (f=...) at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/panicking.rs:552
#12 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<vhost_user_backend::handler::{impl#2}::new::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>>>, core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>> (f=...) at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/panic.rs:359
#13 std::thread::{impl#0}::spawn_unchecked_::{closure#1}<vhost_user_backend::handler::{impl#2}::new::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>> () at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/thread/mod.rs:557
#14 0x000055ac8b3523ee in core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked_::{closure_env#1}<vhost_user_backend::handler::{impl#2}::new::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::event_loop::VringEpollError>>, ()> () at /usr/src/debug/rust/rustc-1.88.0-src/library/core/src/ops/function.rs:250
#15 0x000055ac8b8a67cb in std::sys::pal::unix::thread::Thread::new::thread_start ()
#16 0x00007f8017ca80d6 in ?? () from /usr/lib/libc.so.6
#17 0x00007f8017d3b05c in ?? () from /usr/lib/libc.so.6
Thread 2 (Thread 0x7f80179fe6c0 (LWP 66898) "virtiofsd-backe"):
#0 0x00007f8017cb0872 in ?? () from /usr/lib/libc.so.6
#1 0x00007f8017ca42d8 in ?? () from /usr/lib/libc.so.6
#2 0x00007f8017d3c9ed in recvmsg () from /usr/lib/libc.so.6
#3 0x000055ac8b87027d in vmm_sys_util::linux::sock_ctrl_msg::raw_recvmsg (fd=10, iovecs=..., in_fds=...) at src/linux/sock_ctrl_msg.rs:225
#4 0x000055ac8b856e8f in vmm_sys_util::linux::sock_ctrl_msg::ScmSocket::recv_with_fds<std::os::unix::net::stream::UnixStream> (self=0x7f80179fd720, iovecs=..., fds=...) at /home/george/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/vmm-sys-util-0.12.1/src/linux/sock_ctrl_msg.rs:416
#5 0x000055ac8b36efc6 in vhost::vhost_user::connection::Endpoint<vhost::vhost_user::message::VhostUserMsgHeader<vhost::vhost_user::message::FrontendReq>>::recv_into_iovec<vhost::vhost_user::message::VhostUserMsgHeader<vhost::vhost_user::message::FrontendReq>> (self=0x7f80179fd720, iovs=...) at /home/george/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/vhost-0.13.0/src/vhost_user/connection.rs:322
#6 0x000055ac8b36f798 in vhost::vhost_user::connection::Endpoint<vhost::vhost_user::message::VhostUserMsgHeader<vhost::vhost_user::message::FrontendReq>>::recv_into_iovec_all<vhost::vhost_user::message::VhostUserMsgHeader<vhost::vhost_user::message::FrontendReq>> (self=0x7f80179fd720, iovs=...) at /home/george/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/vhost-0.13.0/src/vhost_user/connection.rs:390
#7 0x000055ac8b36db5b in vhost::vhost_user::connection::Endpoint<vhost::vhost_user::message::VhostUserMsgHeader<vhost::vhost_user::message::FrontendReq>>::recv_header<vhost::vhost_user::message::VhostUserMsgHeader<vhost::vhost_user::message::FrontendReq>> (self=0x7f80179fd720) at /home/george/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/vhost-0.13.0/src/vhost_user/connection.rs:450
#8 0x000055ac8b385527 in vhost::vhost_user::backend_req_handler::BackendReqHandler<std::sync::poison::mutex::Mutex<vhost_user_backend::handler::VhostUserHandler<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>>>::handle_request<std::sync::poison::mutex::Mutex<vhost_user_backend::handler::VhostUserHandler<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>>> (self=0x7f80179fd6f0) at /home/george/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/vhost-0.13.0/src/vhost_user/backend_req_handler.rs:408
#9 0x000055ac8b392673 in vhost_user_backend::{impl#1}::start_daemon::{closure#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>> () at /home/george/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/vhost-user-backend-0.17.0/src/lib.rs:138
#10 0x000055ac8b3f4f71 in std::sys::backtrace::__rust_begin_short_backtrace<vhost_user_backend::{impl#1}::start_daemon::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::Error>> (f=...) at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/sys/backtrace.rs:152
#11 0x000055ac8b42a5bd in std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure#0}<vhost_user_backend::{impl#1}::start_daemon::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::Error>> () at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/thread/mod.rs:559
#12 0x000055ac8b41222a in core::panic::unwind_safe::{impl#23}::call_once<core::result::Result<(), vhost_user_backend::Error>, std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<vhost_user_backend::{impl#1}::start_daemon::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::Error>>> (self=...) at /usr/src/debug/rust/rustc-1.88.0-src/library/core/src/panic/unwind_safe.rs:272
#13 0x000055ac8b41a2c7 in std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<vhost_user_backend::{impl#1}::start_daemon::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::Error>>>, core::result::Result<(), vhost_user_backend::Error>> (data=0x7f80179fdb18) at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/panicking.rs:589
#14 0x000055ac8b42b2fb in __rust_try ()
#15 0x000055ac8b425701 in std::panicking::try<core::result::Result<(), vhost_user_backend::Error>, core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<vhost_user_backend::{impl#1}::start_daemon::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::Error>>>> (f=...) at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/panicking.rs:552
#16 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<vhost_user_backend::{impl#1}::start_daemon::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::Error>>>, core::result::Result<(), vhost_user_backend::Error>> (f=...) at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/panic.rs:359
#17 std::thread::{impl#0}::spawn_unchecked_::{closure#1}<vhost_user_backend::{impl#1}::start_daemon::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::Error>> () at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/thread/mod.rs:557
#18 0x000055ac8b3525be in core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked_::{closure_env#1}<vhost_user_backend::{impl#1}::start_daemon::{closure_env#0}<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>, core::result::Result<(), vhost_user_backend::Error>>, ()> () at /usr/src/debug/rust/rustc-1.88.0-src/library/core/src/ops/function.rs:250
#19 0x000055ac8b8a67cb in std::sys::pal::unix::thread::Thread::new::thread_start ()
#20 0x00007f8017ca80d6 in ?? () from /usr/lib/libc.so.6
#21 0x00007f8017d3b05c in ?? () from /usr/lib/libc.so.6
Thread 1 (Thread 0x7f8017fe3200 (LWP 66894) "virtiofsd"):
#0 0x00007f8017cb0872 in ?? () from /usr/lib/libc.so.6
#1 0x00007f8017ca4248 in ?? () from /usr/lib/libc.so.6
#2 0x00007f8017ca49a3 in ?? () from /usr/lib/libc.so.6
#3 0x00007f8017ca9d13 in ?? () from /usr/lib/libc.so.6
#4 0x000055ac8b8a6921 in std::sys::pal::unix::thread::Thread::join ()
#5 0x000055ac8b41cbbb in std::thread::JoinInner<core::result::Result<(), vhost_user_backend::Error>>::join<core::result::Result<(), vhost_user_backend::Error>> (self=<error reading variable: Cannot access memory at address 0x109>) at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/thread/mod.rs:1742
#6 0x000055ac8b41d410 in std::thread::JoinHandle<core::result::Result<(), vhost_user_backend::Error>>::join<core::result::Result<(), vhost_user_backend::Error>> (self=...) at /usr/src/debug/rust/rustc-1.88.0-src/library/std/src/thread/mod.rs:1897
#7 0x000055ac8b392d23 in vhost_user_backend::VhostUserDaemon<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>>::wait<alloc::sync::Arc<virtiofsd::vhost_user::VhostUserFsBackend<virtiofsd::passthrough::PassthroughFs>, alloc::alloc::Global>> (self=0x7ffff7cd85b8) at /home/george/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/vhost-user-backend-0.17.0/src/lib.rs:193
#8 0x000055ac8b34d911 in virtiofsd::run_generic_fs<virtiofsd::passthrough::PassthroughFs> (fs=..., listener=..., thread_pool_size=0, tag=...) at src/main.rs:914
#9 0x000055ac8b399c66 in virtiofsd::main () at src/main.rs:876
(gdb)
Here is a stack trace of virtiofsd:
g=...) at src/main.rs:914 #9 (closed) 0x000055ac8b399c66 in virtiofsd::main () at src/main.rs:876 (gdb)