libvirtd 100% busy in polling on eventfd

Software environment

  • Operating system: Ubuntu 22.04
  • Architecture: amd64
  • Kernel version: 5.11.0-16-generic
  • libvirt version: 7.6.0-0ubuntu1
  • Hypervisor and version: 1:6.0+dfsg-2expubuntu2

Description of problem

I was asked to look at a system where libvirtd was consuming a full CPU (100%), but I have not been able to pinpoint a root cause yet.

An strace shows that it is busy looping through poll/read/write:

14:40:10 (+     0.000118) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}, {fd=24, events=POLLIN}], 8, 0) = 2 ([{fd=10, revents=POLLIN}, {fd=24, revents=POLLIN}]) <0.000015>
14:40:10 (+     0.000069) read(10, "\2\0\0\0\0\0\0\0", 16) = 8 <0.000013>
14:40:10 (+     0.000052) write(10, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000013>
14:40:10 (+     0.000052) write(10, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000019>
14:40:10 (+     0.000124) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}, {fd=24, events=POLLIN}], 8, 0) = 2 ([{fd=10, revents=POLLIN}, {fd=24, revents=POLLIN}]) <0.000024>
14:40:10 (+     0.000066) read(10, "\2\0\0\0\0\0\0\0", 16) = 8 <0.000011>
14:40:10 (+     0.000041) write(10, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000011>
14:40:10 (+     0.000041) write(10, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000010>
14:40:10 (+     0.000101) poll([{fd=3, events=POLLIN}, {fd=4, events=POLLI

This loops endlessly and the FD never changes. FD 10 points to an eventfd:

$ sudo ls -laF /proc/3677753/fd/10
lrwx------ 1 root root 64 Nov 18 14:39 /proc/3677753/fd/10 -> 'anon_inode:[eventfd]'

Despite having debug symbols installed, I can't get further with gdb, as plenty of frames in this backtrace are simply not mappable to symbols.

Thread 1 "libvirtd" hit Catchpoint 1 (call to syscall poll), 0x00007ff9e812ea9f in fts_read (sp=0x7ff9e850b170 <g_clear_pointer+16>) at ../sysdeps/wordsize-64/../../io/fts.c:486
486	in ../sysdeps/wordsize-64/../../io/fts.c
(gdb) bt
#0  0x00007ff9e812ea9f in fts_read (sp=0x7ff9e850b170 <g_clear_pointer+16>) at ../sysdeps/wordsize-64/../../io/fts.c:486
#1  0x000055d40e525740 in ?? ()
#2  0x00007ffc2688c0e0 in ?? ()
#3  0x000055d40e5443e0 in ?? ()
#4  0x000055d40d72d008 in ?? ()
#5  0x000055d40e54d820 in ?? ()
#6  0x0000000000000007 in ?? ()
#7  0x000055d40d6f87e2 in main (argc=<optimized out>, argv=<optimized out>) at ../../src/remote/remote_daemon.c:1213
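Independent of the broken symbols, one thing that can be narrowed down first is which libvirtd thread is actually spinning, by sampling the per-thread CPU counters under /proc/&lt;pid&gt;/task before attaching gdb to a specific TID. A sketch (it samples its own PID for demonstration; on the affected host one would substitute 3677753):

```python
import os, time

def thread_cpu(pid):
    """Return {tid: utime+stime in clock ticks} for all threads of pid."""
    ticks = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/stat") as f:
            # Split after the "(comm)" field; utime/stime are then
            # fields 11 and 12 of the remainder (fields 14/15 overall).
            rest = f.read().rsplit(")", 1)[1].split()
            ticks[int(tid)] = int(rest[11]) + int(rest[12])
    return ticks

pid = os.getpid()          # substitute the libvirtd PID, e.g. 3677753
before = thread_cpu(pid)
time.sleep(1)
after = thread_cpu(pid)
busiest = max(after, key=lambda t: after[t] - before.get(t, 0))
print("busiest thread:", busiest)
```

(`top -H -p <pid>` gives the same answer interactively; the script form is just scriptable.)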

I might be looking for the wrong keywords, but the best match I found was an 11-year-old mailing list entry here. Other cases seemed close, but were about qemu's eventfd.

The journal and the libvirt log do not hold anything useful either (only the occasional DHCPACK/REFRESH), and I'm afraid that restarting the service to enable better logging would make the issue go away, and with it any chance to debug it further.
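One thing I have not tried yet: if I read the libvirt admin docs correctly, virt-admin can change the daemon's log filters and outputs at runtime, without restarting libvirtd. A hedged sketch (the concrete filter and output strings are guesses on my part; "1:event" is meant to catch the event-loop sources servicing the eventfd):

```python
import shutil, subprocess

# Hypothetical filter/output values; syntax as documented for libvirt logging.
log_filters = "1:event 3:remote"
log_output = "1:file:/var/log/libvirt/libvirtd-debug.log"

for cmd in (["virt-admin", "daemon-log-filters", log_filters],
            ["virt-admin", "daemon-log-outputs", log_output]):
    if shutil.which("virt-admin"):           # only where libvirt is installed
        subprocess.run(cmd, check=True)
    else:
        print("would run:", " ".join(cmd))   # dry-run fallback for this sketch
```

If that works as advertised, it would sidestep the restart-destroys-the-evidence problem entirely.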

Unlike in those past issues, I did manage to get the debug symbols installed, and I could debug further if only I had better guidance on what to look for.

Steps to reproduce

Sorry, I have no idea how to recreate it. I just have an actively affected system which, thankfully, is non-production, so I can at least debug on it.

Additional information

As I said, maybe I was searching for the wrong keywords; if there are related issues or examples to read on this, please let me know. Other than that, I'm mostly looking for "have you checked this?" advice, in the hope of gathering enough data to eventually identify the root cause.