Skip to content

Allow to configure when to retry builds

Markus Doits requested to merge doits/gitlab-ce:max_retries_when into master

What does this MR do?

Sometimes our build fails because of a system runner failure. Even when docker connection is much more stable now (#2408 (closed)), sometimes the setup still fails and we get a "system runner failure".

It would be nice to have an automatic way to retry such failures, but only those failures.

The benefit to only retry system failures is: We have a policy that specs should always work reliably. If a spec fails just some of the times, then it is a failure and automatically retrying it until it passes is no solution.

But if it is a system runner failure, in 99.9 % of the time I hit the "retry" button and it works, because it is something temporary like ...

 Job failed (system failure): Error response from daemon: error creating overlay mount to /var/lib/docker/overlay2

I want to remove the noise from our developers so if they see a "job failed" message they can (mostly) be sure it is because of a spec, not because of a system failure.

By making it possible to define retry as a hash, max is the new way to define the maximum number of retries:

retry: 2

# becomes

retry:
  max: 2

retry: 2 still works to stay backwards compatible.

A new key when can define when to retry:

retry:
  max: 2
  when: runner_system_failure
retry:
  max: 2
  when:
    - runner_system_failure
    - api_failure

Possible whens:

  • always: same as before and default, retry in every case
  • an array of failure_reasons when to retry: only retry in case of one of those failure reason

failure_reaons are the keys of CI::Build.failure_reasons, currently:

  • unknown_failure
  • script_failure
  • api_failure
  • stuck_or_timeout_failure
  • runner_system_failure
  • missing_dependency_failure
  • runner_unsupported

Examples

Retry always, maximum 5 times

retry: 5

# or

retry:
  max: 5

# or

retry:
  max: 5
  when: always

Retry once in case of api failure or system runner failure, but not on any other failure

retry:
  max: 1
  when:
    - api_failure
    - runner_system_failure

Only retry in case of runner system failure

retry:
  max: 1
  when:
    - runner_system_failure

# or

retry:
  max: 1
  when: runner_system_failure

Current State & Questions (last updated 12.10.2018 16:45 CEST / 14:45 GMT)

State

Tested in production at my own GitLab 11.3.4 instance, feature seems to work correctly as it should.

Questions

What are the relevant issue numbers?

Does this MR meet the acceptance criteria?

Edited by Grzegorz Bizon

Merge request reports