Increase maximum job retries to 20

What does this MR do and why?

Increases the maximum job retry count to 20. This is useful for jobs that are expected to fail more than 3 times before succeeding. It also makes it practical to run jobs on the "spot nodes" of cloud providers such as AWS and Azure. Running a job on a spot node can be cost effective, but only if the job can retry enough times to get past failures caused by the cloud provider reclaiming the instance.
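
For illustration, a spot-node job might combine the higher limit with `retry:when` so that only infrastructure failures are retried. This is a sketch, not part of this MR: the job name, the `spot` runner tag, and the script are hypothetical, and which failure reason a reclaimed instance reports depends on the executor, so the two `when` values are assumptions.

```yaml
spot-job:                    # hypothetical job name
  tags:
    - spot                   # hypothetical tag for runners backed by spot nodes
  script:
    - ./run-tests.sh         # hypothetical script
  retry:
    max: 20                  # assumes the limit raised by this MR is in effect
    when:
      - runner_system_failure      # assumed reason for a reclaimed instance
      - stuck_or_timeout_failure   # assumed reason for a job that stops responding
```

Restricting `when` this way keeps genuine script failures from being retried 20 times.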

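As a rough cost example: AWS spot nodes are 10% the cost of regular nodes. At that price, a job can run 10 times on a spot node (the first attempt plus 9 retries) before it costs as much as a single run on a regular node, assuming each attempt is billed for the full job duration.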
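
The break-even point can be worked out as follows. This is a simplified sketch: the symbols are introduced here for illustration, and it assumes every attempt, including interrupted ones, is billed as a full run.

```latex
% c_r : cost of one full run on a regular node
% c_s : cost of one full attempt on a spot node, c_s = 0.1 c_r (the 10% figure above)
% n   : total attempts on the spot node (first run plus retries)
\[
  n \, c_s \le c_r
  \quad\Longleftrightarrow\quad
  n \le \frac{c_r}{c_s} = \frac{c_r}{0.1\, c_r} = 10
\]
```

Under these assumptions the spot node is no more expensive for up to 10 attempts. An attempt that is cut short by reclamation runs for less time than a full one, so with duration-based billing the real break-even is likely higher.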

References

Closes #367916

How to set up and validate locally

  1. Create and register a GitLab Runner.
  2. Modify the .gitlab-ci.yml of a repo (GitLab Test in the Web IDE will work) to create a job that retries:

stages:
  - first
job:
  stage: first
  script:
    - exit 1
  retry:
    max: 20
  3. Commit the changes. The commit creates a pipeline whose job is designed to fail on every run, so it is retried 20 times (21 runs in total).
  4. Update the job to retry a maximum of 21 times and commit. Pipeline creation fails because 21 exceeds the new maximum of 20.
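
For the last step, the modified configuration would look like this. Only the `max` value differs from the job above; the exact validation error shown is not reproduced here.

```yaml
stages:
  - first

job:
  stage: first
  script:
    - exit 1      # fail on every run
  retry:
    max: 21       # one above the new limit of 20, so pipeline creation should fail
```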

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by 🤖 GitLab Bot 🤖
