2019-12-12: A spike of errors due to pgbouncer on replicas maxing out on client connections
Summary
A failover event was initiated, but the leader at the time (patroni-06) acquired the session lock, so no actual failover happened. All replicas were nevertheless restarted to follow the (perceived) new leader. The local pgbouncers reported errors during the period when Postgres was coming back up.
Timeline
All times UTC.
2019-12-12
- 07:31 - We were alerted about an increased error rate across the fleet
- 07:34 - We were alerted about pgbouncer max_client_conn
- 07:37 - Alerts started resolving
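
As an illustration of the kind of check behind the 07:34 alert, the following is a minimal sketch of a client connection saturation test. It is not the alerting rule that actually fired. The function names and the 0.9 threshold are assumptions made for this example; the inputs would plausibly come from pgbouncer itself (the `used_clients` row of `SHOW LISTS` and the `max_client_conn` setting).

```python
def client_conn_saturation(used_clients: int, max_client_conn: int) -> float:
    """Return the fraction of the pgbouncer client connection limit in use."""
    if max_client_conn <= 0:
        raise ValueError("max_client_conn must be positive")
    return used_clients / max_client_conn


def should_alert(used_clients: int, max_client_conn: int,
                 threshold: float = 0.9) -> bool:
    """Return True once saturation reaches the threshold.

    The 0.9 default is a hypothetical value chosen for illustration only.
    """
    return client_conn_saturation(used_clients, max_client_conn) >= threshold
```

A check like this would trip if clients kept reconnecting to the local pgbouncers while Postgres was restarting, since each retry occupies a client slot until it is served or times out.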
Edited by Ahmad Sherif