fix(gitaly): reduce apdex SLO from 0.999 to 0.97
What
Lower the Gitaly apdex SLO threshold from 99.9% to 97%.
Why
Looking at last month's pages and incidents, Gitaly was one of the busiest services. Most of these incidents resolved on their own and were severity3 or severity4.
This adds a lot of burden to our on-call engineers and is burning them out. We have gitlab-com/gl-infra&991 (closed) to address some of the stability issues.
In https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23576#note_1379544308 we can see that setting the apdex to 0.97 wouldn't have paged the on-call for a severity4 incident.
- 2023-05-05: Apdex dip on file-38-stor-gprd (gitlab-com/gl-infra/production#11337 - closed): Would not alert
- 2023-05-04: GitalyServiceGoserverApdexSLOViolat... (gitlab-com/gl-infra/production#11147 - closed): Would not alert
- 2023-05-03: Gitaly apdex dip on node file-38 (gitlab-com/gl-infra/production#10861 - closed): Would not alert
- 2023-04-28: GitalyServiceGoserverApdexSLOViolat... (gitlab-com/gl-infra/production#9641 - closed): Would not alert
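
To make the size of this change concrete, here is a rough sketch of the arithmetic behind the two thresholds. It is not the actual alerting logic: the `violates` helper is a plain threshold comparison invented for illustration, the `sample_apdex` value is hypothetical and not taken from any of the incidents above, and burn rates and evaluation windows are not modeled.

```python
from fractions import Fraction

# Thresholds from this change, kept as exact fractions to avoid float noise.
OLD_SLO = Fraction("0.999")
NEW_SLO = Fraction("0.97")


def error_budget(slo: Fraction) -> Fraction:
    """Share of requests that may miss the latency target under this SLO."""
    return 1 - slo


def violates(apdex: Fraction, slo: Fraction) -> bool:
    """Simplified, hypothetical check: True when measured apdex is below the SLO."""
    return apdex < slo


# How much more room the new threshold leaves for unsatisfactory requests.
budget_ratio = error_budget(NEW_SLO) / error_budget(OLD_SLO)

print(f"old budget: {float(error_budget(OLD_SLO)):.1%}")
print(f"new budget: {float(error_budget(NEW_SLO)):.1%}")
print(f"budget grows by a factor of {budget_ratio}")

# Hypothetical apdex dip, for illustration only.
sample_apdex = Fraction("0.985")
print("violates old SLO:", violates(sample_apdex, OLD_SLO))
print("violates new SLO:", violates(sample_apdex, NEW_SLO))
```

With these numbers the share of requests allowed to miss the target goes from 0.1% to 3%, so a dip of the hypothetical size above would fall below the old threshold but stay above the new one.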