Draft: fix(engine,stats): add libvirt overload protection with timeouts and backoff
Engine changes:
- Add update_hyp_cap_status() to disable/enable capabilities independently
- Add timeout protection to isAlive() check (5s timeout)
- Disable capabilities when libvirt is unresponsive, re-enable on recovery
- Block enqueuing of UI events while hypervisor capabilities are disabled
- Add detailed logging with LIBVIRT prefixes for monitoring
Stats changes:
- Fix domain resource leak by restoring the commented-out Free() calls
- Add exponential backoff (2s to 5min) for libvirt failures
- Add timeout wrappers for ListAllDomains and GetAllDomainStats (10s)
- Reduce MaxRequestsInFlight from 40 to 5 to prevent mutex contention
- Add detailed logging with LIBVIRT prefixes for monitoring
Log prefixes for grep monitoring:
- LIBVIRT TIMEOUT: Operation exceeded timeout
- LIBVIRT SLOW: Operation completed but slow
- LIBVIRT BACKOFF INCREASED: Backoff increased after failure
- LIBVIRT RECOVERED: System recovered after failures