Add attempts to Docker executor for container not found
What does this MR do?
Retry the stage when the container is not found inside of the Docker executor.
Why was this MR needed?
When using the Docker executor and one of the stages fails because of a No Such Container
error, retry that specific stage up to 2 more times (3
tries in total). This makes the executor a lot more resilient to issues
where we are running a stage and the container gets removed by some other
system.
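The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this MR: `errNoSuchContainer` and `runWithAttempts` are hypothetical names, and the sketch assumes the attempts value counts total tries and that only the "no such container" failure is retried.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSuchContainer is a hypothetical sentinel standing in for Docker's
// "No such container" error. The real executor may detect it differently.
var errNoSuchContainer = errors.New("no such container")

// runWithAttempts runs stage up to attempts times in total. It retries only
// when the stage fails with errNoSuchContainer; success or any other error
// is returned immediately.
func runWithAttempts(attempts int, stage func() error) error {
	if attempts < 1 {
		attempts = 1 // always run the stage at least once
	}

	var err error
	for i := 1; i <= attempts; i++ {
		err = stage()
		if err == nil || !errors.Is(err, errNoSuchContainer) {
			return err
		}
		fmt.Printf("attempt %d of %d failed: %v\n", i, attempts, err)
	}

	return err
}

func main() {
	calls := 0
	// Simulate a stage whose container disappears on the first two tries.
	err := runWithAttempts(3, func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("running build script: %w", errNoSuchContainer)
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Limiting the retry to one specific error keeps genuine script failures from being re-run, which is the reason the sketch checks the error with `errors.Is` instead of retrying on every failure.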
The safeBuffer is necessary to prevent more data races inside of
our code base when we run with go test -race. This is because we are
writing to the job log and reading from the job log to trigger specific
parts of the integration test. The integration test turned out to be
quite big and not a simple one. There is no way we can easily mock the
client from the docker_test package, since the main goal of this
package is to be an E2E/integration test.
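For readers unfamiliar with the pattern, a buffer that is safe to write and read concurrently can be sketched as below. This is an assumption about the general shape, not the safeBuffer added in this MR: the field names and the choice of `sync.Mutex` are illustrative only.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// safeBuffer is an illustrative mutex-guarded buffer. A test can write the
// job log from one goroutine and read it from another without the race
// detector reporting a data race.
type safeBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

// Write appends p to the buffer while holding the lock.
func (b *safeBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.Write(p)
}

// String returns a copy of the buffered contents while holding the lock.
func (b *safeBuffer) String() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.String()
}

func main() {
	var log safeBuffer
	var wg sync.WaitGroup

	// Several goroutines write to the same buffer concurrently.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			fmt.Fprintf(&log, "line %d\n", n)
		}(i)
	}

	wg.Wait()
	fmt.Print(log.String())
}
```

A plain `bytes.Buffer` is not safe for concurrent use, so a test that polls the log while the job is still writing to it needs some form of locking like this.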
Testing
EXECUTOR_JOB_SECTION_ATTEMPTS set to 2
Linux/Windows
config.toml
[[runners]]
  name = "docker"
  url = "http://192.168.144.160:3000"
  token = "xxxx"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "alpine:3.11"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
.gitlab-ci.yml
variables:
  SLEEP: 3600
job:
  script:
    - sleep ${SLEEP}
Steps:
- Start a job with the Runner configured as above and using the .gitlab-ci.yml above.
- Wait for the job to get to the sleep command.
- Inside of a terminal window, run docker ps.
- From the docker ps output, find the build container and do a docker rm -f $CONTAINER_ID, for example:
  $ docker ps
  CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS          PORTS   NAMES
  6e341b6eb96d   a187dde48cd2   "sh -c 'if [ -x /usr…"   20 seconds ago   Up 20 seconds           runner-fl5ihr7-project-19-concurrent-0-build-4
  $ docker rm -f 6e341b6eb96d
  6e341b6eb96d
- You should see the build script stage being retried: Linux/Windows
- If you want, you can remove the build container 1 more time and it should fail the job, for example
EXECUTOR_JOB_SECTION_ATTEMPTS not set (current behavior on master)
Linux/Windows
- Start a job with the Runner configured as above and using the .gitlab-ci.yml above.
- Wait for the job to get to the sleep command.
- Inside of a terminal window, run docker ps.
- From the docker ps output, find the build container and do a docker rm -f $CONTAINER_ID, for example:
  $ docker ps
  CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS          PORTS   NAMES
  6e341b6eb96d   a187dde48cd2   "sh -c 'if [ -x /usr…"   20 seconds ago   Up 20 seconds           runner-fl5ihr7-project-19-concurrent-0-build-4
  $ docker rm -f 6e341b6eb96d
  6e341b6eb96d
Does this MR meet the acceptance criteria?
- Documentation created/updated
- Added tests for this feature/bug
- In case of conflicts with master - branch was rebased
What are the relevant issue numbers?
Reference #4450 (closed)