coverage_report.json of open MRs inaccessible after running cleanup:orphan_job_artifact_files
Summary
A large GitLab Premium customer reported internally that running gitlab-rake gitlab:cleanup:orphan_job_artifact_files causes test coverage indicators to disappear too early.
Steps to reproduce
- On a test Omnibus instance, import a project that uses Cobertura for test coverage visualisation (example)
- Create an MR that adds a new method but no test for it, and let the CI job complete => correct line highlighting appears in the diff.
- Run gitlab-rake gitlab:cleanup:orphan_job_artifact_files
- Reload the MR diff.
Example Project
See the example project linked in the first reproduction step above.
What is the current bug behavior?
The Rake task reports:
Found orphan job artifact file @ /var/opt/gitlab/gitlab-rails/shared/artifacts/…/pipelines/…pipeline_id…/artifacts
despite the MR still being open and the artifact file not being outdated. Also, the coverage_report.json is no longer accessible (error 400 in the browser dev console) upon reloading the MR diff page.
What is the expected correct behavior?
The Rake task doesn't classify the most recent coverage report as an orphan.
Relevant logs and/or screenshots
See the Possible fixes section below.
Output of checks
Reproduced on v14.4.2 so far. Not yet tested on a newer version, but the code hasn't changed.
Possible fixes
My investigation so far suggests the following:
- an instance of the class ArtifactFile has a path structure like: /var/opt/gitlab/gitlab-rails/shared/artifacts/ab/cd/abcd…/YYYY_MM_DD/…job_id…/…ci_job_artifacts.id…
- in the clean_one! method, the artifact_file.path has a different structure: …/abcd…/pipelines/…pipeline_id…/artifacts
My preliminary understanding, without having studied the lost_and_found method yet, is that different IDs are used when rejecting the artifacts that should be kept. Or perhaps the path structure was changed at some point, but the ID extraction code wasn't updated along with it?
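To illustrate the second hypothesis concretely, here is a minimal sketch. It is my own illustration, not the actual GitLab code; it assumes (unverified) that the artifact ID is derived from the last path segment, and the paths and IDs are made up:

# Hypothetical illustration only, NOT the actual implementation.
# Assumption: the cleanup task derives the artifact ID from the last path segment.
def artifact_id_from(path)
  path.split('/').last.to_i
end

# Made-up examples of the two path structures described above:
job_artifact_path      = "/var/opt/gitlab/gitlab-rails/shared/artifacts/ab/cd/abcd/2021_11_20/7/35"
pipeline_artifact_path = "/var/opt/gitlab/gitlab-rails/shared/artifacts/ab/cd/abcd/pipelines/4/artifacts"

artifact_id_from(job_artifact_path)      #=> 35 ("35".to_i), a plausible ci_job_artifacts.id
artifact_id_from(pipeline_artifact_path) #=> 0 ("artifacts".to_i), which can never match an existing ID

Under that assumption, a pipeline-level coverage report directory would always be classified as an orphan, no matter whether the MR is still open.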
It turned out not to be exactly either of those, but the new report artifacts are indeed being included in the orphans (rather than rejected from them, as they logically should be). Adding debug output in def lost_and_found around the artifact_files.reject call showed:
pp existing_artifact_ids.inspect
#=> "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]"
orphans = artifact_files.reject { |artifact| existing_artifact_ids.include?(artifact.artifact_id) }
pp orphans.inspect
#=> "[#<Gitlab::Cleanup::OrphanJobArtifactFilesBatch::ArtifactFile:0x… @path=\"/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d473…/pipelines/4/artifacts\">]"
orphans
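If the assumption about the ID extraction holds, it can be checked in a Rails console (gitlab-rails console). The class and method names below appear in the debug output above; the constructor argument is an assumption on my part, and the truncated path would need to be replaced with the full one on the instance:

# Assumption: ArtifactFile can be constructed from a path (the @path ivar in the
# inspect output above suggests it stores one).
file = Gitlab::Cleanup::OrphanJobArtifactFilesBatch::ArtifactFile.new(
  "/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d473…/pipelines/4/artifacts"
)
file.artifact_id
# If this is 0 (or anything that is not a ci_job_artifacts.id), the reject above can
# never keep the pipeline coverage artifact, and it will always be reported as an orphan.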
After running gitlab-rake gitlab:cleanup:orphan_job_artifact_files DRY_RUN=false and updating the MR again, the artifact folder looks like this:
root@ … /d473…/pipelines# tree
…
├── 4 # emptied in previous cleanup:orphan_job_artifact_files
└── 5
└── artifacts
└── 4
└── code_coverage.json
The 5/artifacts directory gets removed by the next cleanup:orphan_job_artifact_files run. So either the reject doesn't work as intended, or the ::Ci::JobArtifact.id_in(artifact_file_ids).pluck_primary_key call in the line above it doesn't?
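One possible direction for a fix, assuming the task is only meant to scan job artifacts: exclude anything stored under a pipelines/ directory from the orphan candidates (or, alternatively, check such paths against pipeline artifact records rather than ci_job_artifacts). The sketch below is simplified, only the reject condition changes, and the pipeline_artifact? helper is hypothetical:

# Simplified sketch of lost_and_found; the pluck and reject lines are the ones
# quoted above, extended with a guard for pipeline-level artifacts.
def lost_and_found
  existing_artifact_ids = ::Ci::JobArtifact.id_in(artifact_file_ids).pluck_primary_key
  artifact_files.reject do |artifact|
    pipeline_artifact?(artifact) || existing_artifact_ids.include?(artifact.artifact_id)
  end
end

# Hypothetical helper: coverage reports live under …/pipelines/…pipeline_id…/artifacts,
# so their last path segment is not a ci_job_artifacts.id and they should not be
# treated as orphan job artifacts at all.
def pipeline_artifact?(artifact)
  artifact.path.include?('/pipelines/')
end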