2026-03-12: Reports of CI Delays in picking up jobs and finishing jobs
# Reports of CI Delays in picking up jobs and finishing jobs (Severity 3)
**Problem**: CI jobs on the SaaS Linux small runner shard were slow to be picked up and slow to finish because of a job backlog and runner saturation. A capped queue size metric also prevented accurate reporting of the backlog.
**Impact**: Customers using the SaaS Linux small runner shard experienced delays of several minutes in starting and finishing CI jobs. Both shared and some self-hosted runners were affected while Sidekiq was impaired, and at least 10 customer tickets were reported. The single alert triggered by this issue has been resolved. Because the queue size metric is capped, visibility into the true backlog was limited for all affected runners.
**Causes**: A CI database performance issue led to slow or stalled queries to CI replicas, causing Sidekiq to accumulate a backlog of CI jobs. When the database recovered, Sidekiq released a surge of jobs, briefly overwhelming runner capacity on the Linux small shard. This created high job volume, runner saturation, and job queue delays.
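
The backlog-then-surge pattern described above can be illustrated with a minimal sketch. All numbers here are hypothetical and are not taken from the incident: the sketch assumes a steady arrival rate, a fixed number of jobs that runners can start per tick, and a stall during which no jobs reach the runners.

```python
def simulate_queue(arrivals, capacity):
    """Return the pending-job count after each tick.

    arrivals: jobs released to runners on each tick (hypothetical values).
    capacity: maximum jobs the runners can start per tick (hypothetical value).
    """
    pending = 0
    history = []
    for arriving in arrivals:
        pending += arriving
        started = min(pending, capacity)  # runners start at most `capacity` jobs
        pending -= started
        history.append(pending)
    return history


# Ticks 0-2: normal load of 50 jobs per tick.
# Ticks 3-5: upstream stall, nothing is released to runners.
# Tick 6: the 150 held-back jobs are released together with 50 new ones.
arrivals = [50, 50, 50, 0, 0, 0, 200, 50, 50, 50]
history = simulate_queue(arrivals, capacity=60)
print(history)  # [0, 0, 0, 0, 0, 0, 140, 130, 120, 110]
```

With only 10 jobs per tick of spare capacity in this sketch, the queue drains slowly after the surge, which is consistent with delays that persist for a while after the upstream problem is fixed.
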
**Response strategy**: We resolved the CI database issue, allowing Sidekiq to process the backlog and release queued jobs. Runner metrics and job start times are now returning to normal. We have opened a follow-up to fix the pending job queue size metric, which did not accurately reflect the real backlog.
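
As a rough illustration of why a capped metric hides the real backlog, the following sketch assumes the metric reports the smaller of the true pending count and a fixed cap. The cap value and function names are hypothetical and do not reflect the actual metric implementation.

```python
HYPOTHETICAL_CAP = 1000  # assumed cap, not the real value


def reported_queue_size(pending_jobs, cap=HYPOTHETICAL_CAP):
    """Value a capped gauge would report for a given true backlog."""
    return min(pending_jobs, cap)


def reading_is_lower_bound(reported, cap=HYPOTHETICAL_CAP):
    """True when the reading sits at the cap, so the real backlog may be larger."""
    return reported >= cap


for true_backlog in (200, 1000, 5000):
    reported = reported_queue_size(true_backlog)
    print(true_backlog, reported, reading_is_lower_bound(reported))
```

In this sketch a backlog of 1,000 and a backlog of 5,000 produce the same reading, so any value at the cap can only be treated as a lower bound.
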
_This ticket was created to track_ [_INC-8376_](https://app.incident.io/gitlab/incidents/8376)_, by_ [_incident.io_](https://app.incident.io) 🔥