coverage_report.json of open MRs inaccessible after running cleanup:orphan_job_artifact_files
Summary
A large GitLab Premium customer reported internally that running gitlab-rake gitlab:cleanup:orphan_job_artifact_files
causes test coverage indicators to disappear too early.
Steps to reproduce
- On a test Omnibus instance, import a project that uses Cobertura for test coverage visualisation (example)
- Create an MR with a new method but no test, and let the CI job complete => correct line highlighting appears in the diff.
- Run
gitlab-rake gitlab:cleanup:orphan_job_artifact_files
- Reload the MR diff.
Example Project
See the first step of "Steps to reproduce" above.
What is the current bug behavior?
Rake task reports:
Found orphan job artifact file @ /var/opt/gitlab/gitlab-rails/shared/artifacts/…/pipelines/…pipeline_id…/artifacts
despite the MR still being open and the artifact file not being outdated. Also, the coverage_report.json
is no longer accessible (error 400 in the browser dev console) upon reloading the MR diff page.
What is the expected correct behavior?
The Rake task doesn't classify the most recent coverage report as an orphan.
Relevant logs and/or screenshots
See possible fixes section below.
Output of checks
Reproduced on v14.4.2 so far. Not yet tested on a newer version, but the relevant code hasn't changed.
Results of GitLab environment info
Results of GitLab application Check
Possible fixes
My investigation so far has resulted in the understanding that:
- an instance of the class `ArtifactFile` has a path structure like `/var/opt/gitlab/gitlab-rails/shared/artifacts/ab/cd/abcd…/YYYY_MM_DD/…job_id…/…ci_job_artifacts.id…`.
- in the `clean_one!` method, the `artifact_file.path` has a different structure: `…/abcd…/pipelines/…pipeline_id…/artifacts`
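The two path structures suggest a hypothesis. If the cleanup task derives the artifact ID from the last path segment (an assumption on my part; I haven't fully traced `ArtifactFile`), then job artifact paths end in a numeric `ci_job_artifacts.id`, while pipeline artifact paths end in the literal directory name `artifacts`, which Ruby's `String#to_i` converts to `0`. A minimal sketch with made-up paths:

```ruby
# Hypothetical illustration, NOT GitLab's actual code: derive an artifact ID
# from the last segment of an on-disk artifact path.
def artifact_id_from(path)
  path.split('/').last.to_i
end

# Made-up example paths mirroring the two structures described above.
job_artifact_path      = '/shared/artifacts/ab/cd/abcd/2021_11_04/42/57'
pipeline_artifact_path = '/shared/artifacts/d4/73/d473/pipelines/4/artifacts'

artifact_id_from(job_artifact_path)      # => 57 (a plausible ci_job_artifacts.id)
artifact_id_from(pipeline_artifact_path) # => 0  ('artifacts'.to_i is 0)
```

An ID of `0` would never appear among the existing job artifact IDs, so such a directory would always end up classified as an orphan.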
My preliminary understanding, before having studied the `lost_and_found` method in detail, was that different IDs are being used to reject the files that should be kept; or that the path structure changed at some point without the ID extraction code being updated. That turned out not to be it exactly: the new report artifacts are being included in the orphans (instead of being rejected, as they logically should be). Adding debug output in `def lost_and_found` around the `artifact_files.reject` call showed:
pp existing_artifact_ids.inspect
#=> "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]"
orphans = artifact_files.reject { |artifact| existing_artifact_ids.include?(artifact.artifact_id) }
pp orphans.inspect
#=> "[#<Gitlab::Cleanup::OrphanJobArtifactFilesBatch::ArtifactFile:0x… @path=\"/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d473…/pipelines/4/artifacts\">]"
orphans
After running `gitlab-rake gitlab:cleanup:orphan_job_artifact_files DRY_RUN=false`
and updating the MR again, the artifact folder looks like this:
root@ … /d473…/pipelines# tree
…
├── 4 # emptied in previous cleanup:orphan_job_artifact_files
└── 5
└── artifacts
└── 4
└── code_coverage.json
The `5/artifacts` directory gets removed by the next `cleanup:orphan_job_artifact_files` run. So either the `reject`
doesn't work as intended, or the `::Ci::JobArtifact.id_in(artifact_file_ids).pluck_primary_key`
call on the line above it doesn't?
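If the problem is indeed that pipeline artifact directories are being matched against job artifact IDs, one possible direction for a fix, sketched here with plain hashes rather than real `ArtifactFile` instances (an illustration, not a tested patch), would be to skip pipeline artifact directories before the orphan check:

```ruby
# Stand-ins for ArtifactFile instances; paths and IDs are made up.
artifact_files = [
  { path: '/shared/artifacts/ab/cd/abcd/2021_11_04/42/57',      artifact_id: 57 },
  { path: '/shared/artifacts/d4/73/d473/pipelines/4/artifacts', artifact_id: 0  }
]
existing_artifact_ids = [57] # stand-in for IDs plucked from ci_job_artifacts

orphans = artifact_files
  .reject { |f| f[:path].include?('/pipelines/') }                # pipeline artifacts are not job artifacts
  .reject { |f| existing_artifact_ids.include?(f[:artifact_id]) } # drop files that still have a DB row

orphans # => [] -- the pipeline artifact directory is no longer flagged
```

Whether matching on `/pipelines/` in the path is robust enough is exactly what would need to be confirmed against the actual storage layout.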