Docker conflict, "already in use by container"
## Overview

When a runner runs the same job multiple times, it sometimes hits container name conflicts, which produce errors like the one below:

```
ERROR: Preparation failed: Error response from daemon: Conflict. The container name "/runner-465639a0-project-44-concurrent-3-build" is already in use by container "109a38a9e6ef215f4518992c652e04572e5977b5bfbef72d038a0cf1bb662946". You have to remove (or rename) that container to be able to reuse that name.
```

This sometimes happens for service containers as well, as reported in https://gitlab.com/gitlab-org/gitlab-runner/issues/4327#note_178140765.

This seems to affect Runners that have `concurrent` set to anything from 1 to 4.

## Things we need to investigate

- We need more debug info: when this fails, what state is the container with that name in? Is it running, or in a failed state?
- Is it possible that multiple GitLab Runner processes are talking to the same Docker daemon?

## Proposal

### First Iteration

#### Rename containers

All service, build, and predefined containers should change their naming structure to include the `CI_JOB_ID`. For example, assuming [`$CI_JOB_ID`](https://docs.gitlab.com/ee/ci/variables/predefined_variables.html#variables-reference) is `1234`:

- Build container:
  - Before: `runner-x-SdEBvt-project-20-concurrent-0-build-4`
  - After: `runner-x-SdEBvt-project-20-concurrent-0-build-4-1234`
- Service container:
  - Before: `runner-x-SdEBvt-project-20-concurrent-0-postgres-0`
  - After: `runner-x-SdEBvt-project-20-concurrent-0-postgres-0-1234`
- Helper image container:
  - Before: `runner-x-SdEBvt-project-20-concurrent-0-predefined`
  - After: `runner-x-SdEBvt-project-20-concurrent-0-predefined-1234`

Notice that we deliberately did not rename the volume containers, so that we keep a persistent volume between runs.
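The renaming scheme above could be sketched in Go roughly as follows. This is only an illustration under assumptions: `containerName` and its parameters are hypothetical names, not functions from the Runner code base, and the name layout is inferred from the before/after examples in this issue.

```go
package main

import "fmt"

// containerName builds a job-scoped container name by appending the CI job ID
// to the existing naming structure.
//
// Hypothetical helper: the name and parameters are assumptions made for this
// sketch and are not taken from the GitLab Runner source.
//
//   - runnerID is the short runner identifier, e.g. "x-SdEBvt"
//   - kind is the container-type suffix, e.g. "build-4" or "predefined"
//   - jobID is the value of CI_JOB_ID
func containerName(runnerID string, projectID, concurrentID int, kind string, jobID int64) string {
	return fmt.Sprintf(
		"runner-%s-project-%d-concurrent-%d-%s-%d",
		runnerID, projectID, concurrentID, kind, jobID,
	)
}
```

For example, `containerName("x-SdEBvt", 20, 0, "build-4", 1234)` yields `runner-x-SdEBvt-project-20-concurrent-0-build-4-1234`. Because the job ID is part of the name, a leftover container from an earlier job can no longer collide with the container of a new job in the same concurrency slot.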
#### Remove helper images on context cancellation

Investigate whether we call [`removeContainer`](https://gitlab.com/gitlab-org/gitlab-runner/blob/a44dee28bf6dafdd40c94698483b57ac54af6e76/executors/docker/executor_docker.go#L1253) on the helper image when the build is canceled/terminated, meaning that the context is cancelled.

### Future Iterations

When terminating containers we sometimes just call [`removeContainer`](https://gitlab.com/gitlab-org/gitlab-runner/blob/a44dee28bf6dafdd40c94698483b57ac54af6e76/executors/docker/executor_docker.go#L879-888) and [`killContainer`](https://gitlab.com/gitlab-org/gitlab-runner/blob/a44dee28bf6dafdd40c94698483b57ac54af6e76/executors/docker/executor_docker.go#L759-774), which does not allow for a graceful shutdown. We should therefore call [`ContainerStop`](https://gitlab.com/gitlab-org/gitlab-runner/blob/a44dee28bf6dafdd40c94698483b57ac54af6e76/vendor%2Fgithub.com%2Fdocker%2Fdocker%2Fclient%2Fcontainer_stop.go#L18-26) instead of `killContainer`, and also call `ContainerStop` before we remove the container. If we still fail to remove/kill the container, we will simply ignore the failure and log it as an error in the Runner logs.

More detail in https://gitlab.com/gitlab-org/gitlab-runner/issues/6359

---

<details>
<summary> Original Report </summary>

Hello !

Since I upgrade to the latest Gitlab CE, my pipelines are failing with those kinds of errors :

```
ERROR: Preparation failed: Error response from daemon: Conflict. The container name "/runner-465639a0-project-44-concurrent-3-build" is already in use by container "109a38a9e6ef215f4518992c652e04572e5977b5bfbef72d038a0cf1bb662946". You have to remove (or rename) that container to be able to reuse that name.
```

It happens when I push to a branch with an already running pipeline.

To resolve that, I have to cancel my already running build and relaunch my latest commit pipeline that failed.

I am using Gitlab CE 9.4.3 and Gitlab-multi-runner 9.4.

Anyone have seen this ?

Thank you very much !

</details>