2026-02-03: Sidekiq queueing SLI below SLO on low-urgency-cpu-bound shard (apdex 79.43%)
# Sidekiq queueing SLI below SLO on low-urgency-cpu-bound shard (apdex 79.43%) (Severity 2)

**Problem**: Sidekiq jobs on the low-urgency-cpu-bound shard were queueing for too long due to database connection saturation, causing slow or blocked CI job processing.

**Impact**: Customers experienced CI pipeline delays, high CI job backlogs, and issues retrieving CI variables. At one point, all pipelines on gitlab.com were blocked, preventing any CI job processing and deployments. Over 31 customer tickets reported stuck or delayed pipelines. After the fixes were deployed, CI jobs, hosted runners, and background processing for environments and review apps resumed normal operation.

**Causes**: A code change to the Sidekiq service model introduced a new `rescue` code path, which caused repeated or duplicate database updates and increased row locking, leading to database connection saturation. The `Deployments::UpdateEnvironmentWorker` was a major contributor to database pressure. Earlier, CPU and memory saturation on the Sidekiq shard and a Redis outage also contributed to degraded service.

**Response strategy**: We increased CPU and memory limits for the affected Sidekiq shard, paused the `Deployments::UpdateEnvironmentWorker`, and resolved the Redis resource issues. We reverted the problematic code change and deployed an additional fix to reduce database lock contention. We re-enabled the `Deployments::UpdateEnvironmentWorker` after deploying the fix, and Sidekiq queues are now processing with manageable queue depth.

_This ticket was created to track_ [_INC-7049_](https://app.incident.io/gitlab/incidents/7049)_, by_ [_incident.io_](https://app.incident.io) 🔥
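To illustrate the failure mode described under **Causes**, here is a minimal, hypothetical Ruby sketch (not GitLab's actual code; the class and method names are invented for illustration). It shows how a `rescue` path that retries the same update can double-write a row, so each job holds row locks longer and consumes extra database connections under load:

```ruby
# Hypothetical stand-in for an ActiveRecord row: counts how many times
# it is updated, and fails once to simulate a transient error.
class FakeRow
  attr_reader :update_count

  def initialize
    @update_count = 0
    @failed_once = false
  end

  def update!(_attrs)
    @update_count += 1
    unless @failed_once
      @failed_once = true
      raise StandardError, "transient error"
    end
  end
end

# Buggy pattern: the rescue blindly re-runs the same update, so one job
# execution writes the row twice, doubling lock acquisitions on hot rows.
def perform_with_rescue(row)
  row.update!(status: "running")
rescue StandardError
  row.update!(status: "running") # duplicate write -> extra row locking
end

row = FakeRow.new
perform_with_rescue(row)
row.update_count # => 2 (one logical update became two physical writes)
```

At Sidekiq scale, many workers repeating this pattern against the same hot rows amplifies lock contention until the connection pool saturates, which matches the queueing behavior reported above.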