
feat: remove High503ErrorRate alert

Steve Xuereb requested to merge feat/remove-high-503-error-rate into master

What

Remove the High503ErrorRate alert, which was rate-based.

Why

Rate-based alerts get noisy and unactionable, as explained in detail in https://sre.google/workbook/alerting-on-slos/#2-increased-alert-window.

The last page, shown below, fired because of a brief spike and recovered on its own within 5 minutes. It wasn't very actionable.

Screenshot_2022-08-05_at_15.19.21

Source
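To make the noise argument concrete, here is a minimal Python sketch on synthetic data. All traffic numbers, the window sizes (5 and 60 minutes) and the 5% threshold are hypothetical and are not taken from the real High503ErrorRate rule. It compares a short error-ratio window with a longer one, loosely in the spirit of the increased-alert-window approach in the SRE workbook linked above. The workbook ties its threshold to the SLO error budget, and this sketch leaves that part out.

```python
# Hypothetical illustration only: window sizes, threshold and traffic are
# assumptions, not the actual High503ErrorRate alert configuration.


def synthetic_traffic(minutes, spike_start, spike_len, spike_errors):
    """Per-minute (errors, requests): 1000 req/min, 0.1% baseline 503s,
    plus a spike of `spike_errors` 503s/min for `spike_len` minutes."""
    requests = [1000] * minutes
    errors = [1] * minutes
    for minute in range(spike_start, spike_start + spike_len):
        errors[minute] = spike_errors
    return errors, requests


def window_ratio(errors, requests, end, window):
    """Error ratio over the `window` minutes ending at minute `end` (inclusive)."""
    start = max(0, end - window + 1)
    req = sum(requests[start:end + 1])
    return sum(errors[start:end + 1]) / req if req else 0.0


def alerting_minutes(errors, requests, window, threshold):
    """Minutes at which the windowed error ratio exceeds `threshold`."""
    return [
        minute
        for minute in range(len(requests))
        if window_ratio(errors, requests, minute, window) > threshold
    ]


THRESHOLD = 0.05  # hypothetical: alert when more than 5% of requests are 503s

# Brief blip: 20% of requests fail for 5 minutes, then recover on their own.
errors, requests = synthetic_traffic(180, spike_start=60, spike_len=5, spike_errors=200)
print("blip, 5m window:", alerting_minutes(errors, requests, 5, THRESHOLD))
# prints "blip, 5m window: [61, 62, 63, 64, 65, 66, 67]"
print("blip, 60m window:", alerting_minutes(errors, requests, 60, THRESHOLD))
# prints "blip, 60m window: []"

# Sustained incident: 20% of requests fail for a full hour.
errors, requests = synthetic_traffic(180, spike_start=60, spike_len=60, spike_errors=200)
sustained = alerting_minutes(errors, requests, 60, THRESHOLD)
print("sustained, 60m window fires from", sustained[0], "to", sustained[-1])
# prints "sustained, 60m window fires from 74 to 164"
```

The short window pages on the 5-minute blip, while the longer window averages it away and still fires on the hour-long incident. The trade-off visible in the output is that the longer window takes a while to fire and keeps firing after the errors stop.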

This alert paged around 27 times in the last 6 months. Each time it was either a blip, a real issue for which an SLO/SLI alert also fired, or something that resolved on its own within minutes. For example:
