fix(gitaly): reduce apdex SLO from 0.999 to 0.97

Steve Xuereb requested to merge fix/gitaly-apdex into master

What

Lower the Gitaly apdex SLO from 99.9% (0.999) to 97% (0.97).

Why

Looking at last month's pages and incidents, Gitaly was one of the busiest services. Most of these incidents resolved on their own and were severity3 or severity4.

This adds a lot of burden to our on-call engineers and is burning them out. We have gitlab-com/gl-infra&991 (closed) to address some of the stability issues.

In https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23576#note_1379544308 we can see that an apdex SLO of 0.97 would not have paged the on-call for a severity4 incident.

  1. 2023-05-05: Apdex dip on file-38-stor-gprd (gitlab-com/gl-infra/production#11337 - closed): Would not alert

    Screenshot_2023-05-05_at_11.14.55

    source

  2. 2023-05-04: GitalyServiceGoserverApdexSLOViolat... (gitlab-com/gl-infra/production#11147 - closed): Would not alert

    Screenshot_2023-05-05_at_11.15.53

    source

  3. 2023-05-03: Gitaly apdex dip on node file-38 (gitlab-com/gl-infra/production#10861 - closed): Would not alert

    Screenshot_2023-05-05_at_11.17.03

    source

  4. 2023-04-28: GitalyServiceGoserverApdexSLOViolat... (gitlab-com/gl-infra/production#9641 - closed): Would not alert

    Screenshot_2023-05-05_at_11.18.08

    source
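To make the effect of the threshold change concrete, here is a minimal Python sketch that compares a measured apdex against the old and new SLO. It is an illustration only: `would_alert`, `budget_ratio`, and the sample apdex readings are hypothetical and not taken from the incidents above, and the real alerting rules may evaluate the SLO over burn-rate windows instead of using a single comparison.

```python
OLD_SLO = 0.999  # apdex SLO before this change
NEW_SLO = 0.97   # apdex SLO after this change


def would_alert(apdex: float, slo: float) -> bool:
    """Return True when the measured apdex falls below the SLO.

    Simplified assumption: a single point-in-time comparison.
    """
    return apdex < slo


def budget_ratio(old_slo: float, new_slo: float) -> float:
    """Return how many times larger the allowed unsatisfactory fraction becomes."""
    return (1.0 - new_slo) / (1.0 - old_slo)


if __name__ == "__main__":
    # Hypothetical apdex readings, chosen only to illustrate the comparison.
    for apdex in (0.995, 0.98, 0.95):
        print(apdex, would_alert(apdex, OLD_SLO), would_alert(apdex, NEW_SLO))
    print(round(budget_ratio(OLD_SLO, NEW_SLO), 2))
```

Under this simplified model, a dip to 0.98 alerts against the old SLO but not the new one, while a dip to 0.95 alerts against both. Moving from 0.999 to 0.97 raises the tolerated fraction of unsatisfactory requests from 0.1% to 3%.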

Edited by Steve Xuereb
