
ArchiveTraceWorker fails to remove '<ARTIFACTS_PATH>/tmp/cache/<CACHE_ID>' directory when 'gitlab_rails['artifacts_path']' is on an NFS mount in 14.0


Summary

When gitlab_rails['artifacts_path'] is located on an NFS mount, the <ARTIFACTS_PATH>/tmp/cache/<CACHE_ID> directory created by ArchiveTraceWorker is not removed after the worker finishes.

The chain of events is visible in the strace output under "Relevant logs and/or screenshots".

Steps to reproduce

  1. Enable object storage for job artifacts
  2. Enable direct upload for artifacts
  3. Use an NFS mount for the path specified by gitlab_rails['artifacts_path']
  4. Run a CI job. After ArchiveTraceWorker finishes successfully, the <ARTIFACTS_PATH>/tmp/cache/<CACHE_ID> directory is still present

What is the current bug behavior?

<ARTIFACTS_PATH>/tmp/cache/<CACHE_ID> is still present after ArchiveTraceWorker finishes.
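
To check whether an installation is affected, the leftover directories can be listed with a short script. The following is a minimal sketch, not a GitLab tool: the function name find_stale_cache_dirs and the min_age_seconds threshold are hypothetical, and it assumes the leftover directories are empty once the worker has closed its file descriptor (the NFS client removes the .nfsXXXX placeholder on close). It only reports paths and deletes nothing.

```python
import os
import time


def find_stale_cache_dirs(cache_root, min_age_seconds=3600):
    """Return empty subdirectories of cache_root older than min_age_seconds.

    cache_root is expected to be <ARTIFACTS_PATH>/tmp/cache. The age check
    is a guard (an assumption, tune as needed) against reporting a
    directory that a running job created moments ago.
    """
    now = time.time()
    stale = []
    with os.scandir(cache_root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            if os.listdir(entry.path):
                continue  # still holds files, possibly in use
            mtime = entry.stat(follow_symlinks=False).st_mtime
            if now - mtime >= min_age_seconds:
                stale.append(entry.path)
    return sorted(stale)
```

Calling find_stale_cache_dirs("/nfs/shared/artifacts/tmp/cache") with the path from the trace in this report would return the candidate directories for manual review.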

What is the expected correct behavior?

<ARTIFACTS_PATH>/tmp/cache/<CACHE_ID> is removed successfully.

Relevant logs and/or screenshots

No errors logged.

In strace we can see the following:

# job.log opened as read-only as fd 357
2150  19:15:19.218372 open("/gitlab-common/shared/artifacts/tmp/uploads/tmp-trace-27548320200417-1670-1ptmb1q/job.log", O_RDONLY|O_CLOEXEC) = 357</gitlab-common/shared/artifacts/tmp/uploads/tmp-trace-27548320200417-1670-1ptmb1q/job.log> <0.009719>

# job.log is moved to the work directory
2150  19:15:19.664076 rename("/nfs/shared/artifacts/tmp/uploads/tmp-trace-27548320200417-1670-1ptmb1q/job.log", "/nfs/shared/artifacts/tmp/work/1587150919-1670-0024-8708/job.log") = 0 <0.043111>

# job.log is moved to the cache directory
2150  19:15:20.010145 rename("/nfs/shared/artifacts/tmp/work/1587150919-1670-0024-8708/job.log", "/nfs/shared/artifacts/tmp/cache/1587150919-1670-0024-8708/job.log") = 0 <0.041765>

# delete job.log, note that fd 357 has **NOT** been closed at this point
2150  19:15:20.499149 unlink("/nfs/shared/artifacts/tmp/cache/1587150919-1670-0024-8708/job.log" ) = 0 <0.021495>

# attempt to remove cache directory
2150  19:15:20.522118 rmdir("/nfs/shared/artifacts/tmp/cache/1587150919-1670-0024-8708" ) = -1 ENOTEMPTY (Directory not empty) <0.022194>

# fd 357 is closed; its target is now a .nfsXXXX file because the NFS client renamed job.log instead of deleting it while it was still open
2150  19:15:20.560965 close(357</nfs/shared/artifacts/tmp/cache/1587150919-1670-0024-8708/.nfsaa3dff08e57ae7ad00000a1e>) = 0 <0.000681>
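
The same order of system calls can be replayed outside GitLab to confirm the behavior on a given mount. This is a minimal sketch under stated assumptions: the function name unlink_then_rmdir_while_open is hypothetical, the file content is a placeholder, and a POSIX system is assumed. On a local filesystem the rmdir is expected to succeed. On an NFS mount the client keeps the unlinked file as a .nfsXXXX entry while the descriptor is open, so rmdir is expected to fail with ENOTEMPTY as in the trace.

```python
import errno
import os
import sys
import tempfile


def unlink_then_rmdir_while_open(parent_dir):
    """Replay the open -> unlink -> rmdir -> close order seen in the trace.

    Returns (removed, leftovers): whether rmdir succeeded, and the entries
    present in the directory just before rmdir was attempted.
    """
    cache_dir = tempfile.mkdtemp(prefix="cache-", dir=parent_dir)
    log_path = os.path.join(cache_dir, "job.log")
    with open(log_path, "w") as f:
        f.write("trace output\n")  # placeholder content

    fd = os.open(log_path, os.O_RDONLY)  # stays open, like fd 357
    try:
        os.unlink(log_path)  # an NFS client renames to .nfsXXXX here
        leftovers = os.listdir(cache_dir)
        try:
            os.rmdir(cache_dir)
            removed = True
        except OSError as exc:
            if exc.errno not in (errno.ENOTEMPTY, errno.EEXIST):
                raise
            removed = False
    finally:
        os.close(fd)  # the .nfsXXXX entry disappears only after this
    return removed, leftovers


if __name__ == "__main__":
    # Pass a directory on the mount to test; defaults to the system tmp dir.
    parent = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    removed, leftovers = unlink_then_rmdir_while_open(parent)
    print("rmdir succeeded:", removed)
    print("entries at rmdir time:", leftovers)
```

When rmdir fails, the sketch leaves the directory behind on purpose, which matches what the worker leaves in tmp/cache.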

Output of checks

Results of GitLab environment info


System information
System:         Ubuntu 18.04
Proxy:          no
Current User:   git
Using RVM:      no
Ruby Version:   2.6.5p114
Gem Version:    2.7.10
Bundler Version:1.17.3
Rake Version:   12.3.3
Redis Version:  5.0.7
Git Version:    2.24.2
Sidekiq Version:5.2.7
Go Version:     unknown

GitLab information
Version:        12.9.4-ee
Revision:       6a1a8e88568
Directory:      /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:     PostgreSQL
DB Version:     10.12
Elasticsearch:  no
Geo:            no
Using LDAP:     no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version:        12.0.0
Repository storage paths:
- default:      /var/opt/gitlab/git-data/repositories
GitLab Shell path:              /opt/gitlab/embedded/service/gitlab-shell
Git:            /opt/gitlab/embedded/bin/git


Results of GitLab application Check


Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ...
GitLab Shell version >= 12.0.0 ? ... OK (12.0.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ...
Running? ... yes
Number of Sidekiq processes ... 1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab App ...

Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ... can't check, you have no projects
Redis version >= 2.8.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.5)
Git version >= 2.22.0 ? ... yes (2.24.2)
Git user has default SSH configuration? ... yes
Active users: ... 1
Is authorized keys file accessible? ... yes
Elasticsearch version 5.6 - 6.x? ... skipped (elasticsearch is disabled)

Checking GitLab App ... Finished

Checking GitLab subtasks ... Finished

Possible fixes

CarrierWave's delete_tmp_file_after_storage flag triggers the removal of the cache directory while we are still holding the file open. The flag can be disabled, but replacing the cleanup it performs would probably require a fair amount of work.

It's not entirely clear to me why we are holding the file open. The previous implementation did not hold the file open, but perhaps this was causing problems.
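
For comparison, closing the descriptor before the cleanup avoids the leftover entry. The sketch below only illustrates that ordering and is not CarrierWave's or GitLab's implementation: the function name close_then_remove is hypothetical, and the read stands in for whatever consumes the file (for example an upload).

```python
import os
import tempfile


def close_then_remove(parent_dir):
    """Read the file, close it, then unlink it and remove its directory.

    Returns (bytes_read, dir_still_exists).
    """
    cache_dir = tempfile.mkdtemp(prefix="cache-", dir=parent_dir)
    log_path = os.path.join(cache_dir, "job.log")
    with open(log_path, "w") as f:
        f.write("trace output\n")  # placeholder content

    with open(log_path, "rb") as stream:
        data = stream.read()  # stand-in for consuming the trace
    # The descriptor is closed here, before any cleanup happens.

    os.unlink(log_path)  # no open handle, so no .nfsXXXX placeholder
    os.rmdir(cache_dir)  # the directory is empty and can be removed
    return len(data), os.path.exists(cache_dir)
```

With no open handle at unlink time, the NFS client has no reason to keep a placeholder, so the rmdir that fails in the trace would be expected to succeed.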
