Classify "Created fresh repository" step failures and allow retrying them

Overview

Our setup: we are a premium GitLab customer and use self-hosted GitLab runners with the Kubernetes executor, deployed via Helm with a mostly default configuration:

    repoURL: https://charts.gitlab.io
    chart: gitlab-runner
    targetRevision: 0.50.1

From time to time, we get Git fetch errors when a GitLab Runner job starts.

Error log:

...
Getting source from Git repository
00:40
Fetching changes with git depth set to 10...
Initialized empty Git repository in /builds/emergetech/<replaced_for_issue>/<replaced_for_issue>/.git/
Created fresh repository.
fatal: unable to access 'https://gitlab.com/emergetech/<replaced_for_issue>/<replaced_for_issue>/': The requested URL returned error: 502
...

The job then fails:

...
Cleaning up project directory and file based variables
ERROR: Job failed: command terminated with exit code 1

This job has retries configured in .gitlab-ci.yml:

  retry:
    max: 2
    when:
    - api_failure
    - unknown_failure
    - job_execution_timeout
    - runner_system_failure
    - stuck_or_timeout_failure

However, the job is not retried in this case, and the pipeline fails.
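A simplified model of the retry:when check may help explain why the configuration above does not help. It assumes (this is not confirmed by the log) that GitLab records this Git fetch failure with the failure reason script_failure, which is not in the `when` list. `RETRY_WHEN`, `MAX_RETRIES`, and `should_retry` are hypothetical names for illustration, not GitLab APIs, and special values such as `always` are left out.

```python
# Simplified, hypothetical model of the retry:when decision.
# Assumption: the failed Git fetch is reported as "script_failure".

# Failure reasons listed under retry:when in the job config above.
RETRY_WHEN = {
    "api_failure",
    "unknown_failure",
    "job_execution_timeout",
    "runner_system_failure",
    "stuck_or_timeout_failure",
}

# retry:max from the job config above.
MAX_RETRIES = 2


def should_retry(failure_reason: str, retries_so_far: int) -> bool:
    """Return True if a job failing with this reason would be retried."""
    if retries_so_far >= MAX_RETRIES:
        return False
    return failure_reason in RETRY_WHEN


if __name__ == "__main__":
    # Assumed classification of the Git fetch error: not retried.
    print(should_retry("script_failure", 0))  # False
    # The classification this issue asks for: retried.
    print(should_retry("runner_system_failure", 0))  # True
```

Under this assumption, the only way for the existing `retry` block to cover the 502 error is for the failure to be reported with one of the listed reasons.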

A real example can be found here (I believe GitLab Support has access to it).

In any case, the automatic retry does not trigger here, even though this looks like a system failure rather than a script failure.

Proposal

  • Could you please check whether it is possible to detect such cases and classify them as unknown_failure or runner_system_failure, so that the configured retry handles them and we no longer have to restart jobs manually when this happens?
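The kind of detection proposed above could look roughly like this minimal sketch. It assumes the runner can inspect the Git error output of the "Getting source from Git repository" step. The regular expression, the function name `classify_fetch_failure`, and the choice to treat only HTTP 5xx responses as transient are illustrative assumptions, not GitLab Runner's actual behavior.

```python
import re

# Hypothetical pattern matching the Git error line from the job log above.
GIT_HTTP_ERROR = re.compile(
    r"fatal: unable to access '[^']*': The requested URL returned error: (\d{3})"
)


def classify_fetch_failure(log_text: str) -> str:
    """Map a failed Git fetch to a failure reason (hypothetical helper).

    HTTP 5xx responses from the Git remote are treated as transient
    infrastructure problems; anything else keeps the assumed default
    reason, script_failure.
    """
    match = GIT_HTTP_ERROR.search(log_text)
    if match and 500 <= int(match.group(1)) <= 599:
        return "runner_system_failure"
    return "script_failure"


# Log excerpt taken from this issue (paths redacted as in the original).
example_log = (
    "Created fresh repository.\n"
    "fatal: unable to access 'https://gitlab.com/emergetech/<replaced_for_issue>/"
    "<replaced_for_issue>/': The requested URL returned error: 502\n"
)

if __name__ == "__main__":
    print(classify_fetch_failure(example_log))  # runner_system_failure
```

Limiting the reclassification to 5xx responses keeps genuine client-side problems, such as a 403 or 404 caused by wrong credentials or a missing project, from being retried pointlessly.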