Rename SLOs we use in saturation points
In our saturation points we currently define 2 SLOs: https://gitlab.com/gitlab-com/runbooks/blob/049e8712d8a77c0e18a038d7d69d8b1e591b1002/libsonnet/servicemetrics/resource_saturation_point.libsonnet#L41-42
- soft: Since tamland#23 this is used for capacity planning. A capacity planning issue will be created when Tamland forecasts breaching this threshold.
- hard: When this threshold is breached, we alert the SRE on-call. Depending on the severity of the saturation point, the alert goes through PagerDuty.
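To make the two roles concrete, here is a minimal Python sketch of how the two thresholds lead to different actions. It is only an illustration: the function name, the threshold values, the action strings, and the choice of `>=` for "breaching" are all assumptions, not taken from the runbooks or Tamland.

```python
def actions_for(current, forecast_peak, slos):
    """Return the actions triggered for one saturation point.

    `current` and `forecast_peak` are utilization fractions (0 to 1).
    `slos` maps the SLO names to thresholds. Everything here is a
    hypothetical illustration, not the real runbooks or Tamland logic.
    """
    actions = []
    # The soft SLO is compared against the forecast (capacity planning).
    if forecast_peak >= slos["soft"]:
        actions.append("create capacity planning issue")
    # The hard SLO is compared against current utilization (alerting).
    if current >= slos["hard"]:
        actions.append("alert SRE on-call")
    return actions


example_slos = {"soft": 0.80, "hard": 0.90}  # hypothetical values
print(actions_for(current=0.70, forecast_peak=0.85, slos=example_slos))
```

With these example values only the forecast crosses a threshold, so the sketch reports a capacity planning issue and no alert.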
Proposal
Rename these to be more descriptive of how they are used:
- soft to capacity-planning
- hard to alerting
We'll need to change these in the runbooks in the definition linked above, as well as in all the saturation points.
In Tamland, we'll need to change the names of the thresholds we want to include: https://gitlab.com/gitlab-com/gl-infra/tamland/blob/f33c29c2cfae56d1416f3d909ba4f2d96fb15189/metadata.py#L78
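As a rough sketch of the rename itself, the helper below maps the current names to the proposed ones. It assumes the thresholds are available as a plain dictionary keyed by SLO name; the actual structure in `metadata.py` and in the runbooks may differ, and `RENAMES` and `rename_slos` are hypothetical names introduced only for this example.

```python
# Current SLO names mapped to the proposed names from this issue.
RENAMES = {"soft": "capacity-planning", "hard": "alerting"}


def rename_slos(slos):
    """Return a copy of `slos` with old SLO names replaced by new ones.

    Keys that are not in RENAMES (including already-renamed keys) pass
    through unchanged. Hypothetical helper; assumes a flat dict of
    name -> threshold.
    """
    renamed = {}
    for name, value in slos.items():
        new_name = RENAMES.get(name, name)
        if new_name in renamed:
            # Both the old and the new name were present: refuse to guess.
            raise ValueError(f"conflicting definitions for {new_name!r}")
        renamed[new_name] = value
    return renamed


print(rename_slos({"soft": 0.80, "hard": 0.90}))
```

Passing unknown and already-renamed keys through unchanged would let the same helper run safely while some saturation points have been migrated and others have not.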
(Suggestion by @andrewn, during the team call 2023-02-01)
- Before rolling out a change to the SLOs, communicate this through the SaaS weekly call and more widely.
Edited by Andreas Brandl