Push error ratio SLA for pages up to 4 nines
Corrective action for gitlab-com/gl-infra/production#2379 (closed)
With the existing 99.95% SLA, it took about 10 minutes for us to use our 1h error budget during the regression introduced in todays deployment:
We can tune the error budget for this service to 99.99%:"
With a 99.99% error rate SLA, no alerts over the past two weeks:
With the 99.99% error budget, the alert would have fired 3 minutes after the errors started occurring, at 14h22:
Error burn rate dashboard link
Edited by Andrew Newdigate