Concurrent jobs on a single Runner sometimes run in the same CI_PROJECT_DIR
### Summary
When the global setting is `concurrent=4` (probably anything > 1), concurrent jobs on a single Runner _sometimes_ clobber each other by running concurrently in the same build directory (same `CI_PROJECT_DIR`). While the exact nature of the clobbering differs based on timing, it is typically that Job1 in `test` stage is extracting the cache from `build` stage while Job2 in `test` stage is running `git clean` -- which promptly deletes the Job1 cache directory.
This problem does not occur with every build; it is entirely timing-related and quite random (not seen for days, then seen in multiple builds in a row).
The job that does the clobbering and the job that gets clobbered can be associated with the same pipeline or a different pipeline. We've seen both.
The real-life example shown in the "logs" section occurred when two pipelines from two different branches were concurrently active. However, the job that was clobbered was clobbered by another job in the same pipeline. In that particular case, two concurrently active pipelines do not appear to have been required, but the problem does seem more likely when pipelines from multiple different branches are concurrently active.
### Steps to reproduce
The associated [.gitlab-ci.yml](/uploads/6b5c74b6a9e47371de45157c003cbca3/.gitlab-ci.yml) is attached to this ticket, but here is an overview:
1. Set the global `concurrent` setting to `4`
1. Enable a single Runner
1. Configure a pipeline with three stages: build, test, deploy
1. Configure keyed caches
1. Make concurrent commits to the repo on multiple (e.g. 2 or 3) different branches. This isn't strictly necessary, but for some reason the problem is more likely to occur when pipelines are concurrently active for multiple different branches.
### What is the current *bug* behavior?
Two jobs occasionally (not always) run concurrently in the same directory (`CI_PROJECT_DIR`) leading to one job interfering with the other, causing the other to fail.
### What is the expected *correct* behavior?
We expect **concurrent** jobs to run in separate directories (separate `CI_PROJECT_DIR`) or, put another way, only one job runs in a given project directory at any one time.
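The expected behavior can be stated as an invariant: no two jobs overlap in time within the same project directory. Below is a minimal Python sketch of a checker for that invariant. It is not part of the Runner; the `JobRun` type, the `find_directory_collisions` function, and the timestamps used with it are hypothetical (we don't have exact timestamps for the real jobs).

```python
from collections import defaultdict
from typing import NamedTuple


class JobRun(NamedTuple):
    """One job execution; all fields are illustrative, not a Runner data structure."""
    job_id: int
    project_dir: str
    start: float  # hypothetical start time, in seconds
    end: float    # hypothetical end time, in seconds


def find_directory_collisions(runs):
    """Return (earlier_job_id, later_job_id) pairs that overlap in the same directory."""
    by_dir = defaultdict(list)
    for run in runs:
        by_dir[run.project_dir].append(run)

    collisions = []
    for dir_runs in by_dir.values():
        dir_runs.sort(key=lambda r: r.start)
        for i, first in enumerate(dir_runs):
            for second in dir_runs[i + 1:]:
                if second.start >= first.end:
                    break  # sorted by start, so no later run can overlap `first`
                collisions.append((first.job_id, second.job_id))
    return collisions
```

With the expected behavior, this function would return an empty list for any set of jobs executed by a single Runner.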
### Relevant logs and/or screenshots
Initial console output for ABC project running `unit_test_debug` job (job 5384 in pipeline 773 for commit 77c48776 from feature/15621_BlahBlah):
```
Running with gitlab-ci-multi-runner 9.2.0 (adfc387)
on KILIMANJARO Test Runner (d450762c)
Using Shell executor...
Running on kilimanjaro...
Fetching changes...
Removing Coverage-feature-15621-blahblah/
HEAD is now at 77c4877 cleaning up RoundRobinTest and unit test.
Checking out 77c48776 as feature/15621_blahblah...
Updating/initializing submodules...
Checking cache for feature-15621-blahblah/DEBUG...
Successfully extracted cache
$ cd $ABC_BUILD_DIR_REL/
$ ctest
Test project /home/gitlab-runner/builds/d450762c/0/abc/abc/Debug-feature-15621-blahblah
Start 1: ABC_IosConfiguration
1/37 Test #1: ABC_IosConfiguration .............. Passed 0.00 sec
```
Initial console output for ABC project running `unit_test_coverage` job (job 5386 in pipeline 773 for commit 77c48776 from feature/15621_BlahBlah):
```
Running with gitlab-ci-multi-runner 9.2.0 (adfc387)
on KILIMANJARO Test Runner (d450762c)
Using Shell executor...
Running on kilimanjaro...
Fetching changes...
Removing Release-feature-15621-blahblah/
HEAD is now at 77c4877 cleaning up RoundRobinTest and unit test.
Checking out 77c48776 as feature/15621_BlahBlah...
Updating/initializing submodules...
Checking cache for feature-15621-blahblah/COVERAGE...
Successfully extracted cache
$ cd $ABC_BUILD_DIR_REL
$ make coverage
CMake Error: Target DependInfo.cmake file not found
Scanning dependencies of target coverage
CMake Error: Directory Information file not found
[100%] Resetting code coverage counters to zero.
Processing code coverage counters and generating report.
Deleting all .da files in . and subdirectories
Done.
Test project /home/gitlab-runner/builds/d450762c/0/abc/abc/Coverage-feature-15621-blahblah
No tests were found!!!
.
<snip other project-specific error messages>
.
Uploading artifacts...
WARNING: ./Coverage-feature-15621-blahblah/Testing/Temporary/LastTest.log: no matching files
Uploading artifacts to coordinator... ok id=5386 responseStatus=201 Created token=jHjxZzxa
ERROR: Job failed: exit status 1
```
Both of the above jobs were started at nearly the same time (we don't have exact timestamps). The sequence of events is something like this:
1. Coverage job 5386 starts first, running in directory `/home/gitlab-runner/builds/d450762c/0/abc/abc/`
1. Coverage job 5386 extracts the cache from previous build stage. This creates the directory `/home/gitlab-runner/builds/d450762c/0/abc/abc/Coverage-feature-15621-blahblah`
1. Debug job 5384 starts up sometime _after_ job 5386 extracts its cache but _before_ job 5386 starts running tests
1. Debug job 5384 is also running in directory `/home/gitlab-runner/builds/d450762c/0/abc/abc/` (same directory as job 5386)
1. Debug job 5384 runs the standard `GIT_STRATEGY=fetch`, which includes `git clean`, so job 5384 removes job 5386's `Coverage-feature-15621-blahblah/` directory
1. Coverage job 5386 resumes and attempts to use the cache files it has just extracted to the `Coverage-feature-15621-blahblah/` directory, but the contents are gone, so it fails badly
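The sequence above can be replayed deterministically outside of GitLab. The following Python sketch is only a simulation under stated assumptions: a temporary directory stands in for the shared `CI_PROJECT_DIR`, `shutil.rmtree` stands in for `git clean`, and the function name and the placeholder `Makefile` are hypothetical. Events force the interleaving that in real life depends on timing.

```python
import shutil
import tempfile
import threading
from pathlib import Path


def simulate_clobber():
    """Replay the interleaving; return whether the coverage job still finds its cache."""
    project_dir = Path(tempfile.mkdtemp())  # stand-in for the shared CI_PROJECT_DIR
    cache_dir = project_dir / "Coverage-feature-15621-blahblah"
    cache_extracted = threading.Event()
    clean_done = threading.Event()
    found = []

    def coverage_job():
        # Steps 1-2: extract the cache from the build stage.
        cache_dir.mkdir()
        (cache_dir / "Makefile").write_text("all:\n")  # placeholder cache content
        cache_extracted.set()
        # Step 6: resume only after the other job has cleaned the directory.
        clean_done.wait()
        found.append(cache_dir.exists())

    def debug_job():
        # Steps 3-5: start after the cache is extracted, then clean untracked dirs.
        cache_extracted.wait()
        for entry in project_dir.iterdir():
            shutil.rmtree(entry)  # stand-in for `git clean` removing the directory
        clean_done.set()

    threads = [
        threading.Thread(target=coverage_job),
        threading.Thread(target=debug_job),
    ]
    try:
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    finally:
        shutil.rmtree(project_dir, ignore_errors=True)
    return found[0]
```

In this simulation the coverage job never finds its cache, which matches the failure in the `unit_test_coverage` log.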
Please don't suggest changing to `GIT_STRATEGY=none` for the test stage. Even though that git strategy might be more appropriate for our 'test' and 'deploy' stages -- and it does at least help with the problem -- it doesn't *fix* the problem, because we also see interference *between* stages. For example, the 'build' stage clobbers the 'test' stage because we use `GIT_STRATEGY=fetch` for the build stage. We aren't willing to change the build stage git strategy because we want a clean slate when starting a build.
### Output of checks
See the output below from our self-hosted GitLab deployment:
#### Results of GitLab environment info
<details>
<summary>Expand for output related to GitLab environment info</summary>
<pre>
root@gitlab:~# gitlab-rake gitlab:env:info
System information
System: Ubuntu 16.04
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.3.3p222
Gem Version: 2.6.6
Bundler Version:1.13.7
Rake Version: 10.5.0
Redis Version: 3.2.5
Git Version: 2.11.1
Sidekiq Version:5.0.0
GitLab information
Version: 9.2.2-ee
Revision: b004167
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
DB Version: 9.6.1
URL: https://gitlab.<company>.com
HTTP Clone URL: https://gitlab.<company>.com/some-group/some-project.git
SSH Clone URL: git@gitlab.<company>.com:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: no
GitLab Shell
Version: 5.0.4
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks
Git: /opt/gitlab/embedded/bin/git
</pre>
</details>
#### Results of GitLab application Check
<details>
<summary>Expand for output related to the GitLab application check</summary>
<pre>
root@gitlab:~# gitlab-rake gitlab:check SANITIZE=true
Checking GitLab Shell ...
GitLab Shell version >= 5.0.4 ? ... OK (5.0.4)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:root, or git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ...
8/1 ... ok
10/2 ... ok
6/3 ... ok
9/4 ... ok
9/5 ... ok
9/6 ... ok
9/7 ... ok
9/8 ... ok
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Access to /var/opt/gitlab/.ssh/authorized_keys: OK
Send ping to redis server: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Sidekiq ...
Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Checking Reply by email ...
Reply by email is disabled in config/gitlab.yml
Checking Reply by email ... Finished
Checking LDAP ...
LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab ...
Git configured with autocrlf=input? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config outdated? ... no
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory setup correctly? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
projects have namespace: ...
8/1 ... yes
10/2 ... yes
6/3 ... yes
9/4 ... yes
9/5 ... yes
9/6 ... yes
9/7 ... yes
9/8 ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.1.0 ? ... yes (2.3.3)
Your git bin path is "/opt/gitlab/embedded/bin/git"
Git version >= 2.7.3 ? ... yes (2.11.1)
Active users: 14
Checking GitLab ... Finished
</pre>
</details>
### Possible fixes
Don't allow multiple concurrent jobs to run in the same directory. [.gitlab-ci.yml](/uploads/8b1688d603939e079a933868291a4d59/.gitlab-ci.yml)
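One way to picture such a fix is an allocator that hands each running job an exclusive slot and derives the build directory from it. The Python sketch below is an illustration only, not the Runner's actual implementation. The `BuildDirAllocator` class is hypothetical, and while the path layout mirrors the one visible in the logs (`builds/d450762c/0/abc/abc`), treating the numeric segment as a per-slot index is an assumption.

```python
import heapq
import threading
from contextlib import contextmanager


class BuildDirAllocator:
    """Hypothetical allocator: at most one running job per build directory."""

    def __init__(self, builds_root, runner_id, project_path, concurrent):
        self._builds_root = builds_root
        self._runner_id = runner_id
        self._project_path = project_path
        self._lock = threading.Lock()
        self._free_slots = list(range(concurrent))  # already a valid min-heap

    @contextmanager
    def acquire(self):
        """Reserve the lowest free slot for the duration of one job."""
        with self._lock:
            if not self._free_slots:
                raise RuntimeError("all concurrent slots are in use")
            slot = heapq.heappop(self._free_slots)
        try:
            yield f"{self._builds_root}/{self._runner_id}/{slot}/{self._project_path}"
        finally:
            # Release the slot only when the job has finished with the directory.
            with self._lock:
                heapq.heappush(self._free_slots, slot)
```

The key design point is that a slot is returned to the pool only when the job that holds it has finished, so a second job can never be handed a directory that is still in use.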