Alerts going to /dev/null
Today we had an alert for PostgreSQL_TooManyDeadTuples which didn't appear in #alerts, #production, or #database. I have no idea where it went.
This alert is defined as:
- alert: PostgreSQL_TooManyDeadTuples
  expr: pg_stat_table_n_dead_tup > 50000 unless ON(instance) (pg_replication_is_replica == 1)
  for: 1h
  labels:
    severity: critical
  annotations:
    description: "Too many dead tuples"
    table: "{{$labels.table_name}}"
    dead_tuples: "{{$value}}"
    runbook: "troubleshooting/postgres.md#tables-with-a-large-amount-of-dead-tuples"
    title: PostgreSQL dead tuples is too large
So I would have actually expected it to go to PagerDuty and #production, which probably isn't appropriate. There are bigger-picture questions about what our alerting strategy should be, but I am concerned about the smaller questions of why this alert didn't go anywhere and whether there are other alerts in the same situation.
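
For reference, here is a minimal sketch of the kind of Alertmanager routing I would have expected to apply to a `severity: critical` alert. This is not our actual configuration: the receiver names, the channel assignments, and the placeholder keys and URLs are all assumptions made up for illustration. The point is only that a critical alert should match a specific route, and that anything matching no child route should still land on a visible fallback receiver instead of disappearing.

```yaml
# Hypothetical Alertmanager config; names, channels and keys are placeholders.
global:
  slack_api_url: "https://hooks.slack.com/services/PLACEHOLDER"

route:
  # Fallback receiver: alerts that match no child route end up here,
  # so nothing is dropped silently.
  receiver: slack-alerts
  routes:
    # Critical alerts page and are also posted to #production.
    - matchers:
        - severity="critical"
      receiver: pagerduty-and-production

receivers:
  - name: slack-alerts
    slack_configs:
      - channel: "#alerts"
  - name: pagerduty-and-production
    pagerduty_configs:
      - routing_key: "PLACEHOLDER"
    slack_configs:
      - channel: "#production"
```

If the real configuration is available locally, something like `amtool config routes test --config.file=alertmanager.yml severity=critical` should report which receiver a given label set resolves to, which may help find other alerts in the same situation.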