Error raised in /jobs/request caused job to be stuck in running state (!193678) · Merge requests · GitLab.org / GitLab

What does this MR do and why?

This MR fixes an infradev issue that came out of an incident: errors raised during metrics tracking in the CI job registration process were causing jobs to get stuck in a running state. The problem occurred when an exception was raised during metrics collection in /jobs/request, which prevented the service from returning the job result to the runner.

Key changes:

Wraps metrics tracking calls in exception handling to prevent failures from blocking job assignment
Extracts metrics tracking logic into separate methods (track_success and track_conflict) with proper error handling
Uses track_and_raise_for_dev_exception to ensure we're aware of metrics issues in development while not affecting production job processing
Adds comprehensive test coverage for metrics error scenarios

Why this matters: When metrics tracking failed, the entire job registration request would fail, leaving jobs stuck in a running state indefinitely. This fix ensures that metrics failures don't prevent runners from receiving job assignments, maintaining CI/CD pipeline reliability.
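
The pattern behind the key changes can be sketched roughly as follows. This is a simplified, standalone illustration and not the actual diff: FailingMetrics, ErrorTrackingStub, and RegisterJobSketch are hypothetical stand-ins for the real metrics class, Gitlab::ErrorTracking, and the job registration service. The stub's track_and_raise_for_dev_exception only approximates the real helper's behavior (record the error, re-raise in development).

```ruby
# Hypothetical stand-in for the metrics class; always fails to simulate the incident.
class FailingMetrics
  def register_success(_job)
    raise StandardError, 'metrics failure'
  end
end

# Hypothetical stand-in for Gitlab::ErrorTracking. It records the error and
# re-raises only when dev_mode is set (assumed to mimic development behavior).
module ErrorTrackingStub
  @reported = []

  class << self
    attr_reader :reported
    attr_accessor :dev_mode

    def track_and_raise_for_dev_exception(error)
      @reported << error
      raise error if dev_mode
    end
  end
end

# Hypothetical, heavily simplified version of the registration flow.
class RegisterJobSketch
  Result = Struct.new(:build, :valid)

  def initialize(metrics)
    @metrics = metrics
  end

  def execute(build)
    # The job has already been assigned to the runner at this point,
    # so a metrics failure must not prevent the result from being returned.
    track_success(build)
    Result.new(build, true)
  end

  private

  # Metrics tracking is isolated in its own method with its own rescue.
  def track_success(build)
    @metrics.register_success(build)
  rescue StandardError => e
    ErrorTrackingStub.track_and_raise_for_dev_exception(e)
  end
end

result = RegisterJobSketch.new(FailingMetrics.new).execute(:build_42)
```

With dev_mode left unset (the production-like case in this sketch), execute still returns a valid result and the error is only recorded. Setting ErrorTrackingStub.dev_mode = true makes the same call raise, which is the "aware in development" half of the behavior.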

References

Closes #348673 (closed)

How to set up and validate locally

Set up a GitLab development environment with runners configured
Create a test pipeline with jobs

To simulate the error condition, you can monkey-patch the metrics service:

# In rails console (allow_any_instance_of is RSpec-only, so prepend a module instead)
::Gitlab::Ci::Queue::Metrics.prepend(Module.new do
  def register_success(*)
    raise StandardError, 'metrics failure'
  end
end)

Edited Jun 10, 2025 by Allison Browne
