
coverage_report.json of open MRs inaccessible after running cleanup:orphan_job_artifact_files

Summary

A large GitLab Premium customer reported internally that running `gitlab-rake gitlab:cleanup:orphan_job_artifact_files` causes test coverage indicators to disappear too early.

Steps to reproduce

  1. On a test Omnibus instance, import a project that uses Cobertura for test coverage visualisation (example)
  2. Create an MR with a new method but no test, and let the CI job complete => correct line highlighting appears in the diff.
  3. Run `gitlab-rake gitlab:cleanup:orphan_job_artifact_files`
  4. Reload the MR diff.

Example Project

See step 1 above.

What is the current bug behavior?

Rake task reports:

Found orphan job artifact file @ /var/opt/gitlab/gitlab-rails/shared/artifacts/…/pipelines/…pipeline_id…/artifacts

despite the MR still being open and the artifact file not being outdated. Also, the coverage_report.json is no longer accessible (error 400 in the browser dev console) upon reloading the MR diff page.

What is the expected correct behavior?

The Rake task doesn't classify the most recent coverage report as an orphan.

Relevant logs and/or screenshots

See possible fixes section below.

Output of checks

Reproduced on v14.4.2 so far. Not yet tested on a newer version, but the code hasn't changed.

Results of GitLab environment info

Expand for output related to GitLab environment info

(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:env:info`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)

Results of GitLab application Check

Expand for output related to the GitLab application check

(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:check SANITIZE=true`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`)

(we will only investigate if the tests are passing)

Possible fixes

My investigation so far suggests that:

  1. an instance of the class ArtifactFile has a path structure like: /var/opt/gitlab/gitlab-rails/shared/artifacts/ab/cd/abcd…/YYYY_MM_DD/…job_id…/…ci_job_artifacts.id….
  2. in the clean_one! method, the artifact_file.path has a different structure (see the sketch after this list):
    …/abcd…/pipelines/…pipeline_id…/artifacts
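
If the artifact ID is extracted from that path simply by parsing its last segment as an integer (an assumption on my part; I have not yet checked the actual ArtifactFile implementation), then the second layout could never match an existing ci_job_artifacts.id. A minimal sketch with made-up example paths:

  # Assumption: artifact_id is the basename of the directory parsed as an integer.
  # The paths below are made-up examples following the two layouts described above.
  def artifact_id_from(path)
    File.basename(path).to_i
  end

  job_artifact_dir    = "/var/opt/gitlab/gitlab-rails/shared/artifacts/ab/cd/abcd/2021_11_08/42/57"
  pipeline_report_dir = "/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d473/pipelines/4/artifacts"

  artifact_id_from(job_artifact_dir)    #=> 57, a plausible ci_job_artifacts.id
  artifact_id_from(pipeline_report_dir) #=> 0, because "artifacts".to_i is 0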

My preliminary understanding, without having studied the lost_and_found method in detail yet, is that different IDs are being compared when rejecting the artifacts that should be kept. Or the path structure was changed at some point, but the ID extraction code wasn't updated along with it?

No, that's not it exactly; rather, the new report artifacts are being included (instead of rejected, as they logically should be). Adding debug output in def lost_and_found around artifact_files.reject showed:

  # IDs of the job artifacts that still exist in the database
  pp existing_artifact_ids.inspect
  #=> "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]"

  # keep only the files whose extracted artifact_id is not among those IDs
  orphans = artifact_files.reject { |artifact| existing_artifact_ids.include?(artifact.artifact_id) }

  # the pipeline coverage report ends up in this list
  pp orphans.inspect
  #=> "[#<Gitlab::Cleanup::OrphanJobArtifactFilesBatch::ArtifactFile:0x… @path=\"/var/opt/gitlab/gitlab-rails/shared/artifacts/d4/73/d473…/pipelines/4/artifacts\">]"

  orphans

After running `gitlab-rake gitlab:cleanup:orphan_job_artifact_files DRY_RUN=false` and another MR update, the artifact folder looks like this:

root@ … /d473…/pipelines# tree

├── 4  # emptied in previous cleanup:orphan_job_artifact_files
└── 5
    └── artifacts
        └── 4
            └── code_coverage.json

The 5/artifacts directory gets removed by the next cleanup:orphan_job_artifact_files run. So either the reject doesn't work as intended, or the ::Ci::JobArtifact.id_in(artifact_file_ids).pluck_primary_key in the line above it doesn't?
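
If the extraction assumption from the sketch above holds, one possible direction for a fix would be to stop comparing pipeline-report directories against job artifact IDs and to handle them separately. This is only an untested sketch; pipeline_artifact_dir? is a made-up helper name:

  # Untested sketch: don't treat directories following the
  # .../pipelines/<pipeline_id>/artifacts layout as orphan *job* artifacts.
  # `pipeline_artifact_dir?` is a hypothetical helper, not existing GitLab code.
  def pipeline_artifact_dir?(path)
    path.match?(%r{/pipelines/\d+/artifacts\z})
  end

  orphans = artifact_files.reject do |artifact|
    pipeline_artifact_dir?(artifact.path) ||
      existing_artifact_ids.include?(artifact.artifact_id)
  end

A proper fix would presumably still check whether the corresponding pipeline (or its Ci::PipelineArtifact record, which seems to be where code_coverage.json comes from) still exists, rather than skipping such paths unconditionally, but I haven't dug into that part yet.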
