2026-03-09: Sidekiq queueing SLO violation on catchall-b shard with low apdex
# Sidekiq queueing SLO violation on catchall-b shard with low apdex (Severity 2)
**Problem**: Sidekiq jobs on the catchall-b, elasticsearch, and low-urgency-cpu-bound shards built up sustained, high backlogs, degrading job processing performance and violating queueing SLOs.
**Impact**: Multiple customers experienced delayed or stuck CI/CD pipelines, with many Sidekiq jobs missing their expected queueing times on the catchall-b, elasticsearch, and low-urgency-cpu-bound shards. Apdex dropped as low as 25.36%, 48.59%, and 0.0009% on those shards respectively, slowing pipeline processing for users across several teams. By the time of the last update, customer reports of pipeline delays had become infrequent, with the most recent report about 40 minutes earlier.
**Causes**: Feature categories such as team planning and audit events were recently routed to the catchall-b shard, which has limited capacity; because catchall-b is capped at 500 pods, the added load produced a backlog. Attempts to add more capacity failed due to a deployment rollout issue.
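For context, Sidekiq work in GitLab is assigned to queues (and hence shards) via routing rules that map worker-matching queries to queue names. The snippet below is only an illustrative sketch of that mechanism: the queue names, the categories placed on catchall-b, and the matching queries are assumptions for this example, not the actual production configuration.

```ruby
# Illustrative sketch only: each routing rule maps a worker-matching query to a
# destination queue/shard. The categories and queue names below are assumed for
# this example and do not reflect the real production config.
sidekiq['routing_rules'] = [
  ['feature_category=team_planning', 'catchall_b'],          # recently moved category
  ['feature_category=audit_events',  'catchall_b'],          # recently moved category
  ['urgency=low&resource_boundary=cpu', 'low_urgency_cpu_bound'],
  ['*', 'default'],                                           # everything else
]
```

Moving categories between shards this way shifts their job volume onto the target shard's worker pods, which is why a capacity-limited shard like catchall-b can fall behind.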
**Response strategy**: We rolled back a recent chart upgrade, which unblocked deployments and allowed configuration changes to land. We also disabled the Audit Event Streaming Worker on catchall-b to free up capacity. These actions restored pipeline processing, reduced system saturation, and brought queue durations and customer reports back to normal levels.
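As a rough illustration of the second mitigation, and assuming the common GitLab pattern of per-worker ops feature flags checked by Sidekiq middleware (the flag name here is an assumption, not taken from the incident), disabling a single worker's jobs from a Rails console might look like:

```ruby
# Hypothetical sketch: a per-worker ops feature flag that Sidekiq consults
# before running a job. The flag name is assumed for illustration and is not
# necessarily the flag used during this incident.
Feature.disable(:"run_sidekiq_jobs_AuditEvents::AuditEventStreamingWorker")
```

Skipping one high-volume worker like this trades delayed audit event streaming for immediate headroom on the saturated shard, which is reversible once capacity is restored.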
_This ticket was created to track_ [_INC-8169_](https://app.incident.io/gitlab/incidents/8169)_, by_ [_incident.io_](https://app.incident.io) 🔥