2020-03-10 Alertmanager is failing to send notifications
Summary
This is another occurrence of #1745 (closed)
Alertmanager is failing to send notifications to two endpoints: Slackline and gitlab.com. Selected log entries from /var/log/prometheus/alertmanager/current on alerts-02-inf-gprd.c.gitlab-production.internal:
2020-03-10_08:43:12.70743 level=error ts=2020-03-10T08:43:12.707Z caller=dispatch.go:301 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="cancelling notify retry for \"webhook\" due to unrecoverable error: unexpected status code 422: https://gitlab.com/gitlab-com/gl-infra/infrastructure/prometheus/alerts/notify.json"
2020-03-10_08:50:22.25130 level=error ts=2020-03-10T08:50:22.251Z caller=dispatch.go:301 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="cancelling notify retry for \"webhook\" due to unrecoverable error: unexpected status code 408: https://us-central1-gitlab-infra-automation.cloudfunctions.net/alertManagerBridge"
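Both entries have the same shape: a receiver name, an HTTP status code (422 and 408 here), and the endpoint URL that returned it. When triaging a longer log, a small script can pull those fields out. The following is a minimal sketch, not a general Alertmanager log parser: the regular expression is tuned only to the two lines quoted above, and the names `NOTIFY_FAILURE`, `parse_notify_failures` and `SAMPLE_LINES` are hypothetical.

```python
import re

# Pattern tuned to the two entries quoted above (assumption: other
# Alertmanager versions or notifier types may word this error differently).
NOTIFY_FAILURE = re.compile(
    r'ts=(?P<ts>\S+) .*?'
    r'cancelling notify retry for \\"(?P<receiver>[^"\\]+)\\" '
    r'due to unrecoverable error: unexpected status code (?P<status>\d{3}): '
    r'(?P<url>[^"\s]+)'
)


def parse_notify_failures(lines):
    """Return one dict per line that matches the notify-failure pattern."""
    failures = []
    for line in lines:
        match = NOTIFY_FAILURE.search(line)
        if match is None:
            continue  # not a notify failure, or an unexpected format
        failures.append({
            "ts": match.group("ts"),
            "receiver": match.group("receiver"),
            "status": int(match.group("status")),
            "url": match.group("url"),
        })
    return failures


# The two log entries from this incident, copied verbatim.
SAMPLE_LINES = [
    r'2020-03-10_08:43:12.70743 level=error ts=2020-03-10T08:43:12.707Z '
    r'caller=dispatch.go:301 component=dispatcher msg="Notify for alerts failed" '
    r'num_alerts=1 err="cancelling notify retry for \"webhook\" due to '
    r'unrecoverable error: unexpected status code 422: '
    r'https://gitlab.com/gitlab-com/gl-infra/infrastructure/prometheus/alerts/notify.json"',
    r'2020-03-10_08:50:22.25130 level=error ts=2020-03-10T08:50:22.251Z '
    r'caller=dispatch.go:301 component=dispatcher msg="Notify for alerts failed" '
    r'num_alerts=1 err="cancelling notify retry for \"webhook\" due to '
    r'unrecoverable error: unexpected status code 408: '
    r'https://us-central1-gitlab-infra-automation.cloudfunctions.net/alertManagerBridge"',
]

if __name__ == "__main__":
    for failure in parse_notify_failures(SAMPLE_LINES):
        print(failure["ts"], failure["receiver"], failure["status"], failure["url"])
```

To run it against the real log, the lines of /var/log/prometheus/alertmanager/current could be passed in place of `SAMPLE_LINES`. Lines that do not match the pattern are skipped silently.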
Timeline
All times UTC.
2020-03-10
- 08:29 - EOC is paged
- 08:54 - the alert is resolved
Resources
- If the Situation Zoom room was utilised, the recording will be automatically uploaded to the Incident room Google Drive folder (private)
Edited by Craig Miskell