Draft: fix(engine,stats): add libvirt overload protection with timeouts and backoff

Engine changes:

  • Add update_hyp_cap_status() to disable/enable capabilities independently
  • Add timeout protection to isAlive() check (5s timeout)
  • Disable capabilities when libvirt is unresponsive, re-enable on recovery
  • Block enqueuing of UI events while hypervisor capabilities are disabled
  • Add detailed logging with LIBVIRT prefixes for monitoring
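
A minimal sketch of the engine-side idea, assuming a Python engine and a liveness callable such as a libvirt connection's isAlive(). The names call_with_timeout and HypervisorGate are hypothetical and not taken from this merge request, and a single flag stands in for the per-capability state that update_hyp_cap_status() manages:

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout_s):
    """Run fn() in a worker thread; return (ok, result).

    ok is False if fn() raised or did not finish within timeout_s.
    The worker thread is not killed on timeout; it is only abandoned.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False, None
    except Exception:
        return False, None
    finally:
        # Do not wait for a hung call; that would defeat the timeout.
        executor.shutdown(wait=False)


class HypervisorGate:
    """Hypothetical stand-in for the engine's capability state."""

    def __init__(self, is_alive, timeout_s=5.0):
        self.is_alive = is_alive          # e.g. conn.isAlive (assumption)
        self.timeout_s = timeout_s        # 5s, as stated in the description
        self.capabilities_enabled = True
        self.queue = []

    def check(self):
        """Disable capabilities on timeout/failure, re-enable on recovery."""
        ok, alive = call_with_timeout(self.is_alive, self.timeout_s)
        self.capabilities_enabled = ok and bool(alive)
        return self.capabilities_enabled

    def enqueue_ui_event(self, event):
        """Refuse new UI events while capabilities are disabled."""
        if not self.capabilities_enabled:
            return False
        self.queue.append(event)
        return True
```

Running the check in a separate thread keeps the caller responsive even if the underlying libvirt call never returns; the trade-off is that a hung call still occupies its worker thread until it finishes.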

Stats changes:

  • Fix domain resource leak by restoring the commented-out Free() calls
  • Add exponential backoff (2s to 5min) for libvirt failures
  • Add timeout wrappers for ListAllDomains and GetAllDomainStats (10s)
  • Reduce MaxRequestsInFlight from 40 to 5 to prevent mutex contention
  • Add detailed logging with LIBVIRT prefixes for monitoring
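
The backoff policy can be sketched as follows. This is an illustration only: the 2s floor and 5min ceiling come from the description above, while the doubling factor and the Backoff class itself are assumptions, and the actual stats code may be written in a different language:

```python
class Backoff:
    """Exponential backoff between initial_s and max_s (hypothetical helper)."""

    def __init__(self, initial_s=2.0, max_s=300.0, factor=2.0):
        self.initial_s = initial_s
        self.max_s = max_s
        self.factor = factor      # doubling is an assumption
        self.current_s = initial_s
        self.failures = 0

    def on_failure(self):
        """Return the delay to wait now, then grow the next delay up to max_s."""
        delay = self.current_s
        self.current_s = min(self.current_s * self.factor, self.max_s)
        self.failures += 1
        return delay

    def on_success(self):
        """Reset the delay; return True if this success follows failures."""
        recovered = self.failures > 0
        self.current_s = self.initial_s
        self.failures = 0
        return recovered
```

A caller would sleep for the value returned by on_failure() before retrying, and could emit the LIBVIRT RECOVERED line when on_success() returns True.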

Log prefixes for grep monitoring:

  • LIBVIRT TIMEOUT: Operation exceeded timeout
  • LIBVIRT SLOW: Operation completed but slow
  • LIBVIRT BACKOFF INCREASED: Backoff increased after failure
  • LIBVIRT RECOVERED: System recovered after failures
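
As a rough sketch of how the two timing prefixes might be chosen for one operation: the prefixes are the ones listed above, but the slow threshold (2s here), the logger name, the message wording, and both function names are assumptions made for illustration:

```python
import logging

logger = logging.getLogger("libvirt-monitor")  # hypothetical logger name


def timing_prefix(elapsed_s, timeout_s, slow_s):
    """Return the log prefix for an operation's duration, or None if normal."""
    if elapsed_s >= timeout_s:
        return "LIBVIRT TIMEOUT"
    if elapsed_s >= slow_s:
        return "LIBVIRT SLOW"
    return None


def log_timing(operation, elapsed_s, timeout_s=10.0, slow_s=2.0):
    """Log one line with a grep-friendly prefix; return the prefix used."""
    prefix = timing_prefix(elapsed_s, timeout_s, slow_s)
    if prefix == "LIBVIRT TIMEOUT":
        logger.error("%s: %s exceeded %.1fs", prefix, operation, timeout_s)
    elif prefix == "LIBVIRT SLOW":
        logger.warning("%s: %s took %.2fs", prefix, operation, elapsed_s)
    return prefix
```

Keeping the prefix at the start of each message means a plain text search for LIBVIRT matches every event, while the full prefix narrows the search to one event type.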