gitlab-runner
Closed
Open
Opened Aug 16, 2018 by Markus Doits@doits

Make it configurable when to retry a job (e.g. only on system failure)

Description

Sometimes our builds fail because of a system runner failure. Even though the Docker connection is much more stable now (#2408 (closed)), the setup still fails occasionally and we get a "system runner failure".

[Screenshot: Bildschirmfoto_2018-08-16_um_11.17.28]

It would be nice to have an automatic way to retry such failures, but only those failures.

I know I can set up a general retry, but I explicitly only want system failures to be retried.

The benefit of retrying only system failures is this: we have a policy that specs should always work reliably. If a spec fails only some of the time, that is a real failure, and automatically retrying it until it passes is no solution.

But if it is a system runner failure, 99.9% of the time I hit the "retry" button and it works, because the cause is something temporary like ...

 Job failed (system failure): Error response from daemon: error creating overlay mount to /var/lib/docker/overlay2

I want to reduce the noise for our developers, so that when they see a "job failed" message they can (mostly) be sure it is because of a spec, not because of a system failure.

Proposal

Make it possible to configure retry. If there are only two different failure types (system and script), one could add an optional when key to retry:

retry:
  count: 2
  when:
    - system_failure
    - script_failure

It should still be allowed to pass only a number to retry, so the old behaviour stays (always retry). Because the example above lists both failure types, it behaves the same as:

retry: 2
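
For the use case described above (retry on system failures only), a job definition under this proposal might look like the following sketch. The job name `rspec` and its script line are illustrative assumptions, and `count` and `when` are the keys proposed in this issue, not existing GitLab CI syntax.

```yaml
# Hypothetical job; `count` and `when` are the keys proposed in this issue.
rspec:
  script:
    - bundle exec rspec   # illustrative command, not taken from the issue
  retry:
    count: 2              # retry at most twice
    when:
      - system_failure    # script failures are not retried
```

With this configuration a failing spec would fail the job immediately, while a temporary runner problem would be retried without anyone having to press "retry".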
Edited Aug 16, 2018 by Markus Doits

Milestone: 11.5
Labels: Community contribution, Deliverable, devops::verify
Reference: gitlab-org/gitlab-runner#3515