On GitLab.com, the `Clusters::Cleanup::ServiceAccountWorker` sidekiq job has about a 12% failure rate
Worker Dashboard: https://dashboards.gitlab.net/d/sidekiq-worker-detail/sidekiq-worker-detail?var-worker=Clusters::Cleanup::ServiceAccountWorker
Kibana Search
Failure Sidekiq Logs
As measured over the past 3 days on GitLab.com, the `Clusters::Cleanup::ServiceAccountWorker` sidekiq job has about a 12% failure rate.
Common errors include:
Kubeclient::HttpError: Timed out connecting to server
Gitlab::UrlBlocker::BlockedUrlError: Host cannot be resolved or invalid
NoMethodError: undefined method `kubeclient' for nil:NilClass
Proposal
Reduce the failure rate to less than 10%.
Possibly validate the Kubernetes URL before persisting the job?
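The URL validation idea could be sketched roughly as below. This is a minimal illustration using only the Ruby standard library, not the GitLab implementation: the `ClusterUrlPrecheck` module, its `rejection_reason` method, and the reason strings are all hypothetical names. In the real codebase the check would presumably go through the existing `Gitlab::UrlBlocker` logic instead. The resolver is injectable so the check can be exercised without network access.

```ruby
require 'uri'
require 'resolv'

# Hypothetical pre-enqueue check; not part of the GitLab codebase.
module ClusterUrlPrecheck
  # Returns nil when the URL looks usable, otherwise a short reason string.
  # `resolver` maps a hostname to a list of addresses (assumption: DNS lookup
  # via Resolv by default; tests can pass a stub lambda instead).
  def self.rejection_reason(api_url, resolver: ->(host) { Resolv.getaddresses(host) })
    return 'missing url' if api_url.nil? || api_url.strip.empty?

    uri = URI.parse(api_url.strip)
    # URI::HTTPS is a subclass of URI::HTTP, so this accepts http and https only.
    return 'unsupported scheme' unless uri.is_a?(URI::HTTP)
    return 'missing host' if uri.host.nil? || uri.host.empty?
    return 'host cannot be resolved' if resolver.call(uri.hostname).empty?

    nil
  rescue URI::InvalidURIError
    'invalid url'
  end
end
```

A caller could skip enqueuing the cleanup job, or mark the cluster as errored, whenever `rejection_reason` returns a non-nil value. Such a check would only target the `BlockedUrlError` class of failures. A host that resolves at enqueue time can still time out or stop resolving by the time the job runs, and the `nil:NilClass` error would need a separate nil guard.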
Verification
Is the `SidekiqServiceWorkerExecutionErrorSLOViolation` alert still firing?
Edited by Andrew Newdigate OoO