GitLab CI job retry does not work when the gitlab-runner-helper image pull fails with the Kubernetes executor

Summary

A 450-seat Premium customer reports (Zendesk, internal use) that GitLab Runner does not retry jobs that fail during the system setup phase. Here's an example job trace:

Running with gitlab-runner 12.3.0 (a8a019e0)
on c5-gitlab-runner-58dd44f6f-zwrcx ydzm1RBi
Using Kubernetes namespace: gitlab
Using Kubernetes executor with image registry.toyotaconnected.net/tcna-labs/va-lexi/labs-builder:v2 ...
Waiting for pod gitlab/runner-ydzm1rbi-project-92-concurrent-62579tw to be running, status is Pending
Waiting for pod gitlab/runner-ydzm1rbi-project-92-concurrent-62579tw to be running, status is Pending

(...)

Waiting for pod gitlab/runner-ydzm1rbi-project-92-concurrent-62579tw to be running, status is Pending
ERROR: Job failed: image pull failed: Back-off pulling image "gitlab/gitlab-runner-helper:x86_64-latest"

And here's the representative configuration in .gitlab-ci.yml:

  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
      - unknown_failure
      - api_failure
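For context, the retry block above lives inside a job definition. A minimal sketch of such a job follows; the job name `build` and the `script` command are hypothetical placeholders, and only the `image` (taken from the job trace) and the `retry` settings come from the report.

```yaml
# Hypothetical job name and script; image and retry settings are from the report.
build:
  # Job image as shown in the trace. The gitlab-runner-helper container is
  # added to the pod by the runner itself and is not declared in this file.
  image: registry.toyotaconnected.net/tcna-labs/va-lexi/labs-builder:v2
  script:
    - echo "build step placeholder"
  retry:
    max: 2                       # highest value retry:max accepts
    when:
      - runner_system_failure    # expected to cover the helper image pull failure
      - stuck_or_timeout_failure
      - unknown_failure
      - api_failure
```

With `max: 2`, GitLab retries a failed job at most twice, and only when the job's failure reason matches one of the listed `when` values.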

The customer suspects that this failure occurs when a new Kubernetes node is started and the pod is rescheduled onto that new node.

This looks like a failure mode that runner_system_failure should cover, but the job is not retried.

Steps to reproduce

(How one can reproduce the issue - this is very important)

Example Project

What is the current bug behavior?

The job is not retried when the gitlab-runner-helper image pull fails with the Kubernetes executor.

What is the expected correct behavior?

The job should be retried when the gitlab-runner-helper image pull fails with the Kubernetes executor.

Relevant logs and/or screenshots

(Paste any relevant logs - please use code blocks (```) to format console output, logs, and code as it's tough to read otherwise.)

Output of checks

(If you are reporting a bug on GitLab.com, write: This bug happens on GitLab.com)

Results of GitLab environment info

(For installations with omnibus-gitlab package run and paste the output of: sudo gitlab-rake gitlab:env:info)

(For installations from source run and paste the output of: sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production)

Results of GitLab application Check

(For installations with omnibus-gitlab package run and paste the output of: sudo gitlab-rake gitlab:check SANITIZE=true)

(For installations from source run and paste the output of: sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true)

(we will only investigate if the tests are passing)

Possible fixes

(If you can, link to the line of code that might be responsible for the problem)

Edited Sep 29, 2025 by 🤖 GitLab Bot 🤖