Consider all docker pull image system error as runner script failure

Romuald Atchadé requested to merge docker-pull-image-error-management into main

What does this MR do?

This MR treats every docker image pull system error as a runner script failure.

Why was this MR needed?

In previous iterations (!2995 (merged) and !3060 (merged)), we tried to improve how gitlab-runner manages docker image pull failures, in order to minimize the impact those failures have on the SLO.

Those attempts were not successful.

With this MR, we start treating every SystemError as a runner-script-failure until a better way to manage them is found.
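
The classification described above can be sketched roughly as follows. This is an illustration only, written in Python rather than the runner's own Go code. The names `FailureReason`, `PullSystemError`, and `classify_pull_failure` are hypothetical and do not come from gitlab-runner, and the fallback for errors that are not pull system errors is an assumption, since this MR does not describe it.

```python
from enum import Enum


class FailureReason(Enum):
    """Hypothetical failure reasons; the names only echo the wording of this MR."""
    SCRIPT_FAILURE = "script_failure"
    RUNNER_SYSTEM_FAILURE = "runner_system_failure"


class PullSystemError(Exception):
    """Hypothetical stand-in for a system error raised while pulling an image."""


def classify_pull_failure(err: Exception) -> FailureReason:
    """Map an error raised during the image pull to a job failure reason."""
    if isinstance(err, PullSystemError):
        # Behaviour described in this MR: every pull system error is
        # reported as a script failure.
        return FailureReason.SCRIPT_FAILURE
    # Assumption (not stated in the MR): anything else is still
    # reported as a runner system failure.
    return FailureReason.RUNNER_SYSTEM_FAILURE
```

Under these assumptions, an unpullable image such as the one in the test configuration would surface as a script failure rather than a system failure.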

What's the best way to test this MR?

config.toml
concurrent = 1

[[runners]]
  url = "https://gitlab.com/"
  token = "__REDACTED__"
  executor = "docker"
  [runners.docker]
    image = "gcr.io/trellisconnect/actorbase:latest"
    #image = "164073796161.dkr.ecr.ap-northeast-1.amazonaws.com/ianwalter/pnpm:v1.4.0"

gitlab-ci.yml
variables:
 DURATION: 25

job:
 script:
 - 'for i in $(seq 1 $DURATION); do echo $(date); sleep 1; done'
 - echo "done"

Run a job using the config.toml and gitlab-ci.yml provided. It will fail with a script-failure error, as seen here 👉🏿 link

Note: The job can be rerun with the commented-out image in the TOML file.

What are the relevant issue numbers?

Closes #28075 (closed)

Edited by Romuald Atchadé
