2020-05-14 Puma saturation causing 503s - Extremely high load on patroni-02
Summary
Puma saturation causing 503s
Timeline
All times UTC.
2020-05-14
- 12:59 - Incident declared from Slack
- 14:15 - We decide to replace patroni-02 in Compute with a new node (patroni-08), configure it properly, and join it to the cluster.
- 14:30 - The steps are clear and validated with OnGres. Henri will be following this runbook.
- 15:50 - patroni-08 has been rebuilt and is running Chef.
- 16:50 - Still going through the Chef phase, with some challenges there (permissions and more).
- 18:30 - patroni-08 is finally syncing with the cluster leader, fully bootstrapped and configured with Chef.
- 19:35 - The base backup is still being restored on the server.
Details
.
Source
Incident declared by t4cc0re in Slack via the /incident declare command.
Resources
- If the Situation Zoom room was utilised, the recording will be automatically uploaded to the Incident room Google Drive folder (private).
Edited by Alberto Ramos