Retry Jobs Stuck in Pending

Summary

When a failed job is retried via the retry keyword in .gitlab-ci.yml, the retry job gets stuck in a Pending state even though the runner has capacity.

Steps to reproduce

  1. Run a pipeline whose job targets a runner (in this case, a specific tagged runner) and sets the retry keyword to at least 1 in .gitlab-ci.yml
  2. Set up the job so that it fails, which causes a replacement retry job to be created
  3. Observe that the retry job is stuck as Pending
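
The steps above could be sketched as a minimal .gitlab-ci.yml along these lines. This is an illustrative configuration only, not the one from the affected project: the job name failing-job and the tag my-runner-tag are hypothetical placeholders, and the script simply forces a failure.

```yaml
# Minimal sketch for reproducing the stuck retry job.
# The job name and the tag value are hypothetical placeholders.
failing-job:
  tags:
    - my-runner-tag   # hypothetical: the tag of the specific runner
  retry: 1            # at least 1, so a failure creates a retry job
  script:
    - exit 1          # force the job to fail
```

With a configuration like this, the first attempt fails immediately and the retry job is the one expected to sit in Pending.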

What is the current bug behavior?

If left untouched, the retry job almost always remains in Pending for at least 45 minutes. This can be confirmed afterwards in the job summary under the Queued heading.

What is the expected correct behavior?

The retry job should begin straight away if a runner is available and has capacity.

Relevant logs and/or screenshots

Image: evidence of an excessively long queue time while the job was Pending

Results of GitLab environment info

  • Upgraded gitlab-runner on the runner instance to 15.10.0; the problem still persists.
  • The problem began occurring approximately a week ago.
  • It can lead to considerable increases in pipeline runtimes.

Possible fixes

  • In the GitLab project settings, going to CI/CD > Runners and pausing and then resuming the appropriate runner makes the Pending job begin immediately. This is a manual workaround, but it is neither ideal nor always possible.
  • Starting a new pipeline that is sent to the same runner also causes the Pending job to start immediately.

devopsverify
severity3 priority1 pipeline pipeline processing testingcode testing typebug

Edited by Shane Turley