GitLab CI jobs retry not working when gitlab-runner-helper image pull fails for kubernetes executor
Summary
A 450-seat premium customer reports (Zendesk, internal use) that GitLab Runner does not retry jobs that fail during the system setup phase. Here's an example job trace:
Running with gitlab-runner 12.3.0 (a8a019e0)
on c5-gitlab-runner-58dd44f6f-zwrcx ydzm1RBi
Using Kubernetes namespace: gitlab
Using Kubernetes executor with image registry.toyotaconnected.net/tcna-labs/va-lexi/labs-builder:v2 ...
Waiting for pod gitlab/runner-ydzm1rbi-project-92-concurrent-62579tw to be running, status is Pending
Waiting for pod gitlab/runner-ydzm1rbi-project-92-concurrent-62579tw to be running, status is Pending
(...)
Waiting for pod gitlab/runner-ydzm1rbi-project-92-concurrent-62579tw to be running, status is Pending
ERROR: Job failed: image pull failed: Back-off pulling image "gitlab/gitlab-runner-helper:x86_64-latest"
And here's the representative configuration in .gitlab-ci.yml:
retry:
  max: 2
  when:
    - runner_system_failure
    - stuck_or_timeout_failure
    - unknown_failure
    - api_failure
They surmise that this particular failure happens when a new Kubernetes node is started and the pod is rescheduled onto the new node.
This looks like a failure mode that runner_system_failure should cover, but that does not appear to be the case.
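For context, here is a minimal sketch of how the retry block above would sit inside a complete job definition. Only the retry settings come from the report. The job name `build`, the image path, and the script line are placeholders assumed for illustration, not the customer's actual configuration.

```yaml
# Hypothetical job; only the retry block is taken from the report.
build:
  image: registry.example.com/group/builder:v2   # placeholder image, not the customer's
  script:
    - echo "build steps go here"                 # placeholder script
  retry:
    max: 2                        # retry a failed job up to two more times
    when:                         # retry only for these failure reasons
      - runner_system_failure
      - stuck_or_timeout_failure
      - unknown_failure
      - api_failure
```

With this configuration, the expectation described in the report is that a helper image pull failure would be classified under one of the listed reasons and the job would be retried up to `max` times.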
Steps to reproduce
(How one can reproduce the issue - this is very important)
Example Project
What is the current bug behavior?
Job is not retried when gitlab-runner-helper image pull fails for Kubernetes executor
What is the expected correct behavior?
Job should retry when gitlab-runner-helper image pull fails for Kubernetes executor
Relevant logs and/or screenshots
(Paste any relevant logs - please use code blocks (```) to format console output, logs, and code as it's tough to read otherwise.)
Output of checks
(If you are reporting a bug on GitLab.com, write: This bug happens on GitLab.com)
Results of GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:env:info)
(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production)
Results of GitLab application Check
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:check SANITIZE=true)
(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true)
(we will only investigate if the tests are passing)
Possible fixes
(If you can, link to the line of code that might be responsible for the problem)