Move DailyBuildGroupReportResultService execution to BuildCoverageWorker
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
Proposal
By moving the process of writing DailyBuildGroupReportResult data to build-finished rather than pipeline-finished, where it currently resides, we can show more up-to-date coverage data with significantly lower overhead.
More up-to-date
We recently modified the coverage badge to search pipeline of several statuses for successful builds to report coverage.
The specific use-case reporting the problem was a customer with pipelines that we often very long running (days) because of blocking manual jobs that were configured to run after the coverage was already known. Though searching all builds proved unscalable for gitlab.com, we now check the latest of failed, running, and success for successful coverage builds.
More Performant
The current solution adds additional overhead to every request, but not unacceptably so. However we do have a separate model, DailyBuildGroupReportResult, specifically for performant coverage data reporting, and I believe updating the current ReportResult right after the build finished would let us avoid querying ci_builds or ci_pipelines at all.
The immediate use of this would be to improve the performance of the coverage badge endpoint, but if the data is current enough, we could probably stop using the builds table for coverage data across most, if not all, of GitLab.
Suggested Implementation
Execution hook
Remove the after_transition event from Pipeline:
module Ci
class Pipeline < ApplicationRecord
...
- after_transition [:created, :waiting_for_resource, :preparing, :pending, :running] => :success do |pipeline|
- # We wait a little bit to ensure that all BuildFinishedWorkers finish first
- # because this is where some metrics like code coverage is parsed and stored
- # in CI build records which the daily build metrics worker relies on.
- pipeline.run_after_commit { Ci::DailyBuildGroupReportResultsWorker.perform_in(10.minutes, pipeline.id) }
- end
and add the Service execution directly to BuildCoverageWorker:
class BuildCoverageWorker # rubocop:disable Scalability/IdempotentWorker
..
def perform(build_id)
- Ci::Build.find_by(id: build_id)&.update_coverage
+ return unless build = Ci::Build.find_by(id: build_id)
+
+ build.update_coverage
+ Ci::DailyBuildGroupReportResultService.new.execute(build.pipeline)
end
end
And refactor DailyBuildGroupReportResultService to execute when the whole build group (will often be just one build) has finished successfully.
module Ci
class DailyBuildGroupReportResultService
def execute(pipeline)
DailyBuildGroupReportResult.upsert_reports(coverage_reports(pipeline))
end
private
def coverage_reports(pipeline)
base_attrs = {
project_id: pipeline.project_id,
ref_path: pipeline.source_ref_path,
date: pipeline.created_at.to_date,
last_pipeline_id: pipeline.id,
default_branch: pipeline.default_branch?
}
aggregate(pipeline.builds.with_coverage).map do |group_name, group|
base_attrs.merge(
group_name: group_name,
data: {
'coverage' => average_coverage(group)
}
)
end
end
def aggregate(builds)
- builds.group_by(&:group_name)
+ builds.group_by(&:group_name).select do |group_name, builds|
+ builds.all?(&:success)
+ end
end
def average_coverage(group)
total_coverage = group.reduce(0.0) { |sum, build| sum + build.coverage }
(total_coverage / group.size).round(2)
end
end
end
This diff is more of a proof of concept, and could probably be made more efficient if and when we actually implement it. But the narrow changes here should clearly demonstrate the logical difference.