Expand runner_system_failure and api_failure for CI jobs

We have a long-standing issue, gitlab-runner#3816 (closed), whose root cause we still don't know. Retrying the job manually does work, but the failure is not counted as runner_system_failure, so the following setting will not retry it automatically:

retry:
  max: 1 # Confusingly, this means at most 2 runs in total (1 retry).
  when:
    - unknown_failure
    - api_failure
    - runner_system_failure

We are now also seeing https://gitlab.com/gitlab-org/release/framework/issues/106#note_160991719, where we got this error when cloning the repository:

error: RPC failed; HTTP 502 curl 22 The requested URL returned error: 502 Bad Gateway
fatal: the remote end hung up unexpectedly

It seems to me this should also count as api_failure and be retried automatically, but that's not the case.

@rymai has also compiled a list of errors in https://gitlab.com/gitlab-org/release/framework/issues/106#note_160760213. It shows many different kinds of errors that the application itself cannot do anything about.

I also can't find the definitions of api_failure and runner_system_failure. Can we expand them to include the errors we're seeing today? We really don't want to retry on flaky tests, but at the moment we can't distinguish flaky tests from the errors above.
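
To illustrate the distinction I'm after, here is a minimal sketch in Python. It is not how GitLab or the runner classifies failures today; the function name, the category labels, and the pattern list are all hypothetical, and the patterns are taken only from the clone error quoted above.

```python
import re

# Hypothetical patterns, derived only from the clone error quoted in this issue.
# A real list would also need to cover the other errors we are seeing.
INFRA_PATTERNS = [
    re.compile(r"RPC failed; HTTP 5\d\d"),
    re.compile(r"The requested URL returned error: 5\d\d"),
    re.compile(r"the remote end hung up unexpectedly"),
]


def classify_failure(log_text: str) -> str:
    """Return "infrastructure" if the log matches a known transient error,
    otherwise "job" (for example, a genuinely failing or flaky test)."""
    for pattern in INFRA_PATTERNS:
        if pattern.search(log_text):
            return "infrastructure"
    return "job"


if __name__ == "__main__":
    clone_error = (
        "error: RPC failed; HTTP 502 curl 22 "
        "The requested URL returned error: 502 Bad Gateway\n"
        "fatal: the remote end hung up unexpectedly"
    )
    print(classify_failure(clone_error))
    print(classify_failure("1 example, 1 failure"))
```

Only failures in the "infrastructure" bucket would be worth retrying automatically; anything else, including flaky tests, would stay a plain job failure.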

Edited Sep 24, 2025 by 🤖 GitLab Bot 🤖