2026-01-06: Sidekiq queueing SLO violation on urgent-cpu-bound shard (apdex 73.88%)
Sidekiq queueing SLO violation on urgent-cpu-bound shard (apdex 73.88%) (Severity 2)
Problem: Sidekiq queueing performance on the urgent-cpu-bound shard dropped sharply, with the apdex value falling to 73.88%.
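For readers unfamiliar with the metric, here is a minimal sketch of how an apdex score is commonly computed. It uses the standard Apdex formula. This summary does not state the exact formula or thresholds behind the shard's SLO, so the function name and sample counts below are illustrative assumptions only.

```python
def apdex(satisfied: int, tolerating: int, total: int) -> float:
    """Standard Apdex: satisfied samples count fully, tolerating samples count half.

    Assumption: the shard's real SLO may use different thresholds or a plain
    success ratio; this is only the textbook definition.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if satisfied < 0 or tolerating < 0 or satisfied + tolerating > total:
        raise ValueError("sample counts are inconsistent")
    return (satisfied + tolerating / 2) / total


# Hypothetical counts, not taken from the incident.
example_score = apdex(satisfied=7000, tolerating=2000, total=10000)
print(f"{example_score:.2%}")
```

Under this definition, a score in the low 70s means that roughly a quarter or more of the sampled jobs did not start within the satisfactory queueing threshold.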
Impact: Customers experienced slow merge requests and delayed pipeline triggers. Multiple customer emergencies were reported, and QA tests confirmed recurring timeouts during repository interactions and merge request preparation. The Sidekiq job backlog, which peaked at 600,000–700,000 jobs, is now nearly cleared. All key alerts for web and API services have resolved, and there is no remaining customer impact.
Causes: A required index on the ssh_signatures table was removed during a recent post-deployment migration, causing a surge of slow queries. This overwhelmed the primary database and saturated the connection pool, leading to widespread delays.
Response strategy: We manually recreated the missing database index, which restored performance. Deployments and feature flag changes remain blocked until final checks are completed. Related incidents for API and web service Apdex SLO violations were merged into this incident, and a follow-up was created to add post-deployment migration (PDM) annotations to Grafana for improved monitoring.
This ticket was created to track INC-6458, by incident.io