Improve aggregation of code coverage across jobs
Problem to Solve
GitLab currently calculates pipeline code coverage as a simple average of the coverage values printed by all jobs that report coverage: https://gitlab.com/gitlab-org/gitlab/blob/master/app/models/ci/pipeline.rb#L669-674
This produces misleading and incorrect coverage percentages in certain scenarios.
This simple average may not represent the actual code coverage when there are multiple jobs or multiple code bases in the project.
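To make the current behavior concrete, here is a minimal Python sketch of the averaging described above. It is not the code in `pipeline.rb`; the function name `average_coverage`, the use of `None` for jobs that report no coverage, and the rounding to two decimals are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def average_coverage(job_coverages: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the coverage percentages that jobs reported.

    Jobs that reported nothing (None) are skipped. Illustrative only;
    this is not GitLab's actual implementation.
    """
    reported = [c for c in job_coverages if c is not None]
    if not reported:
        return None
    # Rounding to two decimals is an assumption made for readability.
    return round(sum(reported) / len(reported), 2)


print(average_coverage([70.0, 30.0]))        # 50.0
print(average_coverage([40.0, 40.0, 60.0]))  # 46.67
```

Because the only input is one percentage per job, the function has no way of knowing which lines each job covered, which is why overlapping or complementary jobs skew the result.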
Consider the following scenarios:
Scenario 1: Overlapping Test Coverage
GitLab underreports coverage by 20 percentage points.
A code base has 10 lines, numbered 1-10. The pipeline has 2 jobs:
1. Job 1 has tests covering lines 1-7 (70%)
1. Job 2 has tests covering lines 1-3 (30%)
Average: 50%
Actual coverage: 70%
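The "actual coverage" figure can be reproduced by taking the union of the lines each job covers. The sketch below assumes each job can report the set of covered line numbers, which a single printed percentage does not provide; `union_coverage` is a hypothetical helper, not an existing GitLab API.

```python
from typing import Iterable, Set


def union_coverage(total_lines: int, jobs: Iterable[Set[int]]) -> float:
    """Percentage of lines covered by at least one job."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    covered: Set[int] = set()
    for lines in jobs:
        covered |= lines  # a line counts once, no matter how many jobs hit it
    return round(100 * len(covered) / total_lines, 2)


job_1 = set(range(1, 8))  # lines 1-7
job_2 = set(range(1, 4))  # lines 1-3
print(union_coverage(10, [job_1, job_2]))  # 70.0
```

The union counts each line once, so a job whose coverage is fully contained in another job's coverage no longer drags the result down.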
Scenario 2: Complementary Test Coverage
GitLab underreports coverage by 40 percentage points.
A code base has 10 lines, numbered 1-10. The pipeline has 2 jobs:
1. Job 1 has tests covering lines 1-3 (30%)
1. Job 2 has tests covering lines 6-10 (50%)
Average: 40%
Actual coverage: 80%
Scenario 3: Multi-Language Monorepo
Impact: Loses language-specific context; can't track coverage per language.
A code base has 5 lines of Ruby and 5 lines of JS, with 2 jobs:
1. `rspec` has tests covering Ruby lines 1-5 (100% of Ruby)
1. `jest` has tests covering JS lines 1-3 (60% of JS)
Average: 80%
Actual coverage: 100% Ruby, 60% JS
Scenario 4: Parallelized Tests
When tests are parallelized, the reported average falls 33+ percentage points below the actual Ruby coverage.
A code base has 5 lines of Ruby and 5 lines of JS, with 3 jobs:
1. `rspec A` has tests covering Ruby lines 1-2 (40% of Ruby)
1. `rspec B` has tests covering Ruby lines 4-5 (40% of Ruby)
1. `jest` has tests covering JS lines 1-3 (60% of JS)
Average: 46.67%
Actual coverage: 80% Ruby, 60% JS.
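One way to arrive at the per-language figures is to merge the covered lines of all jobs that belong to the same language before computing a percentage. The sketch below is only an illustration under stated assumptions: the `JobReport` tuple shape, the language labels, and `coverage_by_language` are hypothetical, and it presumes jobs expose covered line numbers rather than a single percentage.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (job name, language, covered line numbers): a hypothetical report shape
JobReport = Tuple[str, str, Set[int]]


def coverage_by_language(
    total_lines: Dict[str, int], reports: Iterable[JobReport]
) -> Dict[str, float]:
    """Merge parallel jobs per language, then compute one percentage each."""
    covered: Dict[str, Set[int]] = defaultdict(set)
    for _job, language, lines in reports:
        covered[language] |= lines  # parallel jobs of one suite are unioned
    return {
        language: round(100 * len(covered[language]) / total, 2)
        for language, total in total_lines.items()
    }


reports = [
    ("rspec A", "ruby", {1, 2}),
    ("rspec B", "ruby", {4, 5}),
    ("jest", "js", {1, 2, 3}),
]
print(coverage_by_language({"ruby": 5, "js": 5}, reports))
# {'ruby': 80.0, 'js': 60.0}
```

Grouping before merging keeps the language-specific context, and splitting a suite into more parallel jobs does not change its result.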
Business Impact:
- Inaccurate metrics → Poor decision-making about code quality
- Wasted engineering time → Building custom workarounds
- Adoption blocker → Teams can't use GitLab's parallelization features without breaking coverage
Proposal
#367317 (comment 2864287313) - Please help add any suggestions/concerns