Monitoring does not seem to detect when a PostgreSQL replica crashes

During the process of troubleshooting a production support incident (gitlab-com/gl-infra/production#735 (closed)), I was killing queries on each of the replica nodes. I made an error on patroni-01, and the server process crashed. Patroni seemed to restart it, and after about half a minute it was operational again. However, I'm concerned that we didn't get an alert for this event.

Assignee Loading
Time tracking Loading