Retry Jobs Stuck in Pending

Summary

When a failed job retries via the retry keyword available in .gitlab-ci.yml, the retry job gets stuck in a Pending state despite the runner having capacity.

Steps to reproduce

Run a pipeline on a specific tagged runner (as in this case), with the retry keyword in .gitlab-ci.yml set to at least 1
Set up the job so that it fails, which causes a replacement retry job to be created
The retry job should then appear stuck as Pending
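
A minimal .gitlab-ci.yml sketch of these steps might look like the following. The job name and the runner tag are hypothetical placeholders, not taken from the affected project; substitute the tag of the runner that shows the problem.

```yaml
# Hypothetical reproduction job; the job name and tag are illustrative assumptions.
retry-repro:
  tags:
    - my-tagged-runner   # assumption: replace with the tag of the affected runner
  retry: 1               # create one replacement job after the first failure
  script:
    - echo "Failing on purpose to trigger a retry"
    - exit 1             # a non-zero exit code fails the job
```

With this configuration the first attempt fails immediately, so the replacement job is created right away and its time in Pending can be read from the Queued value in the job summary.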

What is the current bug behavior?

If left untouched, the retry job almost always remains in Pending for at least 45 minutes. This is visible afterwards in the job summary under the Queued heading.

What is the expected correct behavior?

The retry job should start immediately if a runner is available and has capacity.

Relevant logs and/or screenshots

Evidence of an excessively long queue where the job was Pending

Results of GitLab environment info

Upgraded gitlab-runner on the runner instance to 15.10.0 and the problem persists.
The problem began approximately a week ago.
It can considerably increase pipeline runtimes.

Possible fixes

In the GitLab project, going to Settings > CI/CD > Runners and pausing and then resuming the appropriate runner causes the Pending job to start immediately. This is a manual workaround, but it is neither ideal nor always possible.
Starting a new pipeline on the same runner also causes the Pending job to start immediately.

Edited Mar 24, 2023 by Shane Turley