Concurrent jobs from the same project on a Docker executor runner sometimes mount the same build volume
Summary
When multiple jobs from one or more pipelines of the same repository start on a single runner that is allowed to execute jobs concurrently, the same host directory is sometimes mounted as the build folder in several containers. The affected jobs then fail because they modify the same files at the same time.
Steps to reproduce
- Install gitlab-runner 17.0.0 on a Linux machine (in our case Ubuntu on WSL).
- Create a GitLab runner on that machine that uses the Docker executor and allows 8 concurrent jobs.
- Start 4 (or more) concurrent jobs. For this we used a CI/CD pipeline job like the following:
build:linux_clean_4:
  stage: build_4W_4L
  tags:
    - linux
    - xxxxx
    - docker
  image: an_image_with_gcc_cmake_and_git
  before_script:
    - eval $(ssh-agent -s)
    - chmod !$XXXX_KEY"
    - ssh-add "$XXXX_KEY"
  extends:
    - .build_linux_base
    - .clean_rules
  rules: [ !reference [.linux_rules,rules] , !reference [.clean_rules,rules] ]
  parallel:
    matrix:
      - INSTANCE: 1
      - INSTANCE: 2
      - INSTANCE: 3
      - INSTANCE: 4
This job starts 4 parallel builds of a C++ project.
- Run the pipeline.
What is the current bug behavior?
As expected, the runner creates one container per instance. The error does not occur on every pipeline run, but it occurs very often; for us, almost every time. Immediately after starting the pipeline, I ran docker ps on the host machine and got the following result:
xxxx@xxx:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
169c4c03ab94 e61a3b9d341b "sh -c 'if [ -x /usr…" 48 seconds ago Up 47 seconds runner-vpnttjiou-project-526-concurrent-1-99e1031df1485bab-build
d9bbadf508e8 e61a3b9d341b "sh -c 'if [ -x /usr…" 49 seconds ago Up 47 seconds runner-vpnttjiou-project-526-concurrent-0-8bc473160bc63531-build
56b2787f723f e61a3b9d341b "sh -c 'if [ -x /usr…" 49 seconds ago Up 47 seconds runner-vpnttjiou-project-526-concurrent-0-92e8c6e77285d255-build
71d11d797b18 e61a3b9d341b "sh -c 'if [ -x /usr…" 50 seconds ago Up 48 seconds runner-vpnttjiou-project-526-concurrent-0-7f5e43ff2fd0125e-build
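The duplication in the NAMES column can also be checked with a script. The following is a minimal sketch, not part of the runner: the name pattern runner-<token>-project-<id>-concurrent-<index>-<hash>-build is an assumption inferred only from the four names in this listing, and the function name duplicated_indices is made up for illustration.

```python
import re
from collections import Counter

# Assumed pattern, inferred only from the container names in the listing above:
# runner-<token>-project-<id>-concurrent-<index>-<hash>-build
NAME_RE = re.compile(r"^runner-.+-project-(\d+)-concurrent-(\d+)-[0-9a-f]+-build$")


def duplicated_indices(names):
    """Return {(project_id, concurrent_index): count} for every index
    that is used by more than one build container."""
    counts = Counter()
    for name in names:
        match = NAME_RE.match(name)
        if match:  # names that do not follow the assumed pattern are skipped
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Container names copied from the docker ps output above.
names = [
    "runner-vpnttjiou-project-526-concurrent-1-99e1031df1485bab-build",
    "runner-vpnttjiou-project-526-concurrent-0-8bc473160bc63531-build",
    "runner-vpnttjiou-project-526-concurrent-0-92e8c6e77285d255-build",
    "runner-vpnttjiou-project-526-concurrent-0-7f5e43ff2fd0125e-build",
]

print(duplicated_indices(names))  # {(526, 0): 3}
```

The names could be collected with docker ps --format '{{.Names}}' and passed in as a list; an empty result means no index is shared.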
The docker ps output shows that the runner generated concurrent-0 three times. When the pipeline is run several times, concurrent-1 is sometimes duplicated as well, and there is almost always at least one duplicated index. Despite the different hashes at the end of their names, all three concurrent-0 containers mount the same host volume as their build folder. I checked this for two of the containers with docker inspect:
xxxx@xxx:~$ docker inspect -f '{{ json .Mounts }}' d9bbadf508e8 | jq
[
  ...
  {
    "Type": "volume",
    "Name": "runner-vpnttjiou-project-526-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8",
    "Source": "/var/lib/docker/volumes/runner-vpnttjiou-project-526-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8/_data",
    "Destination": "/builds",
    "Driver": "local",
    "Mode": "z",
    "RW": true,
    "Propagation": ""
  }
]
xxxx@xxx:~$ docker inspect -f '{{ json .Mounts }}' 56b2787f723f | jq
[
  ...
  {
    "Type": "volume",
    "Name": "runner-vpnttjiou-project-526-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8",
    "Source": "/var/lib/docker/volumes/runner-vpnttjiou-project-526-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8/_data",
    "Destination": "/builds",
    "Driver": "local",
    "Mode": "z",
    "RW": true,
    "Propagation": ""
  }
]
The identical Name and Source fields show that the same volume is mounted at /builds in both containers.
As a result, the builds in both containers work in the same directory, which almost always causes the builds to fail.
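To run the same check across all running containers instead of inspecting them one by one, the JSON from docker inspect can be grouped by volume name. This is a minimal sketch under stated assumptions: it expects the unfiltered output of docker inspect (a JSON array with one object per container, each with Id and Mounts fields), the function name shared_build_volumes is made up for illustration, and the sample input is trimmed to the fields shown above, with short IDs in place of the full ones.

```python
import json
from collections import defaultdict


def shared_build_volumes(inspect_output, destination="/builds"):
    """Map each volume mounted at `destination` to the short IDs of the
    containers using it, keeping only volumes used by more than one container."""
    users = defaultdict(list)
    for container in json.loads(inspect_output):
        for mount in container.get("Mounts", []):
            if mount.get("Type") == "volume" and mount.get("Destination") == destination:
                users[mount["Name"]].append(container["Id"][:12])
    return {volume: ids for volume, ids in users.items() if len(ids) > 1}


# Sample input trimmed to the fields from the report; a real "Id" is longer.
VOLUME = "runner-vpnttjiou-project-526-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8"
sample = json.dumps([
    {"Id": "d9bbadf508e8", "Mounts": [{"Type": "volume", "Name": VOLUME, "Destination": "/builds"}]},
    {"Id": "56b2787f723f", "Mounts": [{"Type": "volume", "Name": VOLUME, "Destination": "/builds"}]},
])

for volume, ids in shared_build_volumes(sample).items():
    print(volume, "is mounted by", ", ".join(ids))
```

On a live host the input could come from docker inspect $(docker ps -q) redirected to a file; an empty result means every container has its own build volume.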
What is the expected correct behavior?
Each concurrent job should get its own index (here concurrent-0, concurrent-1, concurrent-2 and concurrent-3) and its own volume for the build mount, so that jobs running at the same time do not interfere with each other.
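The expected property can be illustrated with a small sketch. This is not gitlab-runner code and makes no claim about how the runner assigns indices or what causes the bug; the IndexPool class is hypothetical and only shows what "one index per running job" means when several jobs start at the same moment.

```python
import threading


class IndexPool:
    """Hypothetical allocator: hands out the lowest free index, and the
    lock guarantees that no two callers receive the same one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_use = set()

    def acquire(self):
        with self._lock:  # lookup and reservation happen as one atomic step
            index = 0
            while index in self._in_use:
                index += 1
            self._in_use.add(index)
            return index

    def release(self, index):
        with self._lock:
            self._in_use.discard(index)


pool = IndexPool()
indices = []

# Four jobs starting at the same time, as in the parallel matrix above.
threads = [
    threading.Thread(target=lambda: indices.append(pool.acquire()))
    for _ in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted(indices))  # [0, 1, 2, 3]
```

With unique indices, volume names derived from the index would differ per job, which matches the expectation described above.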