refactor(alerts): lower urgent-cpu-bound execution SLO
What
Lower the urgent-cpu-bound execution SLO from 0.995 to 0.99.
Why
In https://nonprod-log.gitlab.net/app/r/s/gC1II we see this alert firing a few times, and in https://dashboards.gitlab.net/d/sidekiq-main/sidekiq3a-overview?from=now-7d&to=now&var-environment=gprd&var-stage=main&var-shard=urgent-cpu-bound&orgId=1 we also see the apdex drop every hour. This is a known issue that we are working on fixing in gitlab-org/gitlab#430782, but in the meantime it is causing too many false alerts for the on-call.
Lowering the SLO to 0.99 makes the alert less sensitive, so it stops paging the on-call multiple times a day until the infradev issue is fixed.
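To make the effect of the change concrete, here is a rough sketch of the arithmetic, assuming the alert behaves like a simple comparison of apdex against the SLO (the real alerting rules may use burn rates over several windows). The helper names and the 0.992 apdex value are hypothetical and only illustrate an hourly dip; they are not measured data.

```python
from fractions import Fraction


def error_budget(slo: str) -> Fraction:
    """Share of job executions allowed to miss the apdex threshold."""
    return 1 - Fraction(slo)


def breaches(apdex: str, slo: str) -> bool:
    """True if the observed apdex falls below the SLO (simplified check)."""
    return Fraction(apdex) < Fraction(slo)


old_budget = error_budget("0.995")  # 0.5% of executions
new_budget = error_budget("0.99")   # 1% of executions

# The error budget doubles when the SLO goes from 0.995 to 0.99.
budget_ratio = new_budget / old_budget

# Hypothetical hourly dip (illustrative value, not taken from the dashboard).
dip = "0.992"
alerts_before = breaches(dip, "0.995")
alerts_after = breaches(dip, "0.99")

print(f"old budget: {float(old_budget):.3%}, new budget: {float(new_budget):.3%}")
print(f"dip to {dip}: alerts before={alerts_before}, after={alerts_after}")
```

Under these assumptions, a dip that stays above 0.99 no longer breaches the SLO, while a deeper drop would still alert.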
Edited by Steve Xuereb