2025-08-29: Several Chef-managed VMs cannot converge due to errors
Several Chef-managed VMs cannot converge due to errors (Severity 3 (Medium))
Problem: Several Chef-managed VMs could not converge due to authorization errors and unexpected files or permissions in /etc/chef.
Impact: Several VMs in the GSTG and GPRD environments could not converge with Chef, including hosts running key services such as gitaly and redis. All affected VMs have since been repaired except for one database node in GPRD, which remains impacted.
Causes: Chef processes did not properly clean up existing clients and nodes, leading to conflicts when VMs attempted to re-register after a reboot.
Response strategy: We manually removed stale Chef clients and nodes, then re-registered the VMs and restored Chef convergence on most affected hosts.
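The manual cleanup described above could be sketched roughly as follows. This is a minimal dry-run helper, not the exact procedure used during the incident: it only builds the `knife node delete` and `knife client delete` commands for a host so they can be reviewed before anyone runs them. The function name `cleanup_commands` and the host name `example-host-01` are hypothetical, and the sketch assumes the Chef client name matches the node name.

```python
import shlex


def cleanup_commands(node_name: str) -> list[str]:
    """Build the knife commands that delete a stale node and its client.

    Assumption: the client is registered under the same name as the node.
    Nothing is executed here; the commands are returned for review.
    """
    quoted = shlex.quote(node_name)  # guard against odd characters in names
    return [
        f"knife node delete {quoted} --yes",
        f"knife client delete {quoted} --yes",
    ]


if __name__ == "__main__":
    # Hypothetical host name, for illustration only.
    for command in cleanup_commands("example-host-01"):
        print(command)
    # Re-registering the VM afterwards is a separate step and is not
    # modelled here, since the ticket does not describe how it was done.
```

Printing the commands rather than running them keeps a human in the loop, which seems appropriate when the affected hosts include database, gitaly, and redis nodes.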
This ticket was created by incident.io to track INC-3611.