Exclude some Sidekiq jobs with high client error rates from error ratio alerting
A "client error" in this context is an error that can be caused by client misconfiguration and that we cannot distinguish from a server error.
Examples are:
- Possibly misconfigured Kubernetes URLs: production#2438 (closed)
- Misconfigured GitHub URLs / credentials: gitlab-org/gitlab#30996 (closed)
We could exclude such jobs' errors from the service-level error ratio calculations. We'll still have to silence queue-level alerts one by one.
We should also file issues to fix the problems in the application itself.
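To illustrate the effect of the proposed exclusion, here is a minimal Python sketch. It is not how the alerting rules are actually implemented; the `JobStats` type, the `service_error_ratio` function, and the worker names are all hypothetical and exist only for this example. It shows how one noisy job class can dominate a service-level error ratio, and how the ratio changes once that class is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobStats:
    """Per-worker counts over one evaluation window (hypothetical shape)."""
    worker: str   # Sidekiq worker class name
    total: int    # jobs completed in the window
    failed: int   # jobs that ended in an error


def service_error_ratio(stats, excluded_workers=frozenset()):
    """Return failed/total across all workers not in excluded_workers.

    Excluded workers contribute to neither the numerator nor the
    denominator. Returns 0.0 when no jobs remain after exclusion.
    """
    included = [s for s in stats if s.worker not in excluded_workers]
    total = sum(s.total for s in included)
    failed = sum(s.failed for s in included)
    return failed / total if total else 0.0


# Hypothetical numbers: one worker fails often because of client
# misconfiguration, the other is healthy.
window = [
    JobStats(worker="HypotheticalClusterSyncWorker", total=100, failed=50),
    JobStats(worker="HypotheticalMailWorker", total=900, failed=9),
]

ratio_all = service_error_ratio(window)
ratio_filtered = service_error_ratio(
    window, excluded_workers={"HypotheticalClusterSyncWorker"}
)
```

With these made-up counts, the unfiltered ratio is 59 failures out of 1000 jobs, while the filtered ratio is 9 out of 900. Queue-level alerts are unaffected by this kind of filter, which is why they would still need to be silenced separately.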
Suggested by @andrewn on production#2438 (closed).
cc @gitlab-com/gl-infra/sre-observability
Edited by Craig Furman