Add alert for failing alertmanager notifications
Not sure whether we want this prefixed with alertmanager or prometheus. I went with treating it as a separate thing entirely.
You can see this metric firing during the outage, when alertmanager was unable to connect to Slack for a week, here: https://prometheus.gitlab.com/graph?g0.range_input=4w&g0.expr=rate(alertmanager_notifications_failed_total%5B1m%5D)%20%3E%200&g0.tab=0
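
For reference, a rough sketch of what a rule built on that expression could look like, assuming the Prometheus 2.x YAML rule file format. Only the `expr` comes from the graph link above; the group name, alert name, `for` duration, labels, and annotation text are hypothetical placeholders and may not match what this MR actually adds.

```yaml
groups:
  - name: alertmanager-notifications            # hypothetical group name
    rules:
      - alert: AlertmanagerNotificationsFailing # hypothetical alert name
        # Same expression as in the graph link: any non-zero rate of
        # failed notifications over the last minute.
        expr: rate(alertmanager_notifications_failed_total[1m]) > 0
        # Assumed hold time so a single transient failure does not fire.
        for: 10m
        labels:
          severity: critical                    # assumed severity label
        annotations:
          title: Alertmanager is failing to send notifications
          description: Notifications have been failing for at least 10 minutes.
```

One thing to keep in mind with a rule like this: if the failing receiver is the same one the alert routes to, the alert cannot deliver itself, so it probably needs to go out through a different channel.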
Fixes https://gitlab.com/gitlab-com/infrastructure/issues/3868
Edited by Gregory Stark