# Ensure consistent behavior across CI reports

## Problem
Today the experience of dealing with CI reports is inconsistent:

- Some reports (like Code Coverage) combine the data from multiple artifacts and store the result in a pipeline artifact. This is the correct way!
- Some reports are not persisted as pipeline artifacts. The report is regenerated on every request (using `ReactiveCache` to reduce the performance impact).
- Some reports have the built-in assumption that the report must come from only one job within the pipeline. This assumption is wrong because each artifact is a data point that contributes to the report, and multiple artifacts (from multiple jobs) can contribute to it. If a pipeline has 2 SAST jobs running, this wrong assumption will show the report with data from 1 job only instead of combining the data into a single report.
  - https://gitlab.com/gitlab-org/gitlab/-/blob/a13e153b7aef7e655948457f8a599e5e6658c6cc/ee/app/services/security/report_fetch_service.rb#L21-23
  - https://gitlab.com/gitlab-org/gitlab/-/blob/67bae0efde93c103916989355fbfc8e7ecf5efcc/app/models/ci/pipeline.rb#L693
  - Because of that, `browser_performance` and `load_performance` only pick the first artifact of the given type in a pipeline. Any other artifacts of the same type are completely ignored.
- For some report types we do server-side comparison between reports.
- For other report types we do client-side comparison between reports.

There are also other problems related to the lack of abstractions: other domains leveraging the report feature deal directly with implementation details, making it difficult to refactor code or make changes.
- Artifacts are a storage feature for data. They have a `type` associated with them as "metadata" so that we can have dedicated strategies for interpreting the data.
- A build can generate multiple artifacts, but only one artifact of a given type per build.
- A pipeline, by consequence, can have multiple artifacts of a given type, as it can have multiple builds.
- A report is parsed data combined from multiple artifacts of a given type.
  - Because it's data, it could be a persisted pipeline artifact, automatically generated when the pipeline completes.
  - The parser must consider that data may come from multiple artifacts across multiple builds.
- The report is produced for a given pipeline. Trying to generate reports from a single build or a collection of builds may produce incomplete data.
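As a minimal illustration of the model above, the sketch below merges all artifacts of one type across every build in a pipeline into a single report. This is plain Ruby with made-up names (`Artifact`, `Build`, `report_for`), not GitLab code:

```ruby
# Illustrative only: one artifact of a given type per build, many per
# pipeline, and a report that merges the data from all of them.
Artifact = Struct.new(:type, :data)
Build = Struct.new(:name, :artifacts)

def report_for(builds, type)
  builds
    .flat_map(&:artifacts)
    .select { |artifact| artifact.type == type }
    .reduce({}) { |report, artifact| report.merge(artifact.data) }
end

builds = [
  Build.new('job1', [Artifact.new(:code_quality, { 'file_a' => 2 })]),
  Build.new('job2', [Artifact.new(:code_quality, { 'file_b' => 1 })]),
  Build.new('job3', [Artifact.new(:junit, { 'tests' => 10 })])
]

report_for(builds, :code_quality) # => { "file_a" => 2, "file_b" => 1 }
```

Reading only the first matching artifact (the wrong assumption described in the Problem section) would drop `job2`'s contribution entirely.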
Below is a scenario of a pipeline with multiple artifacts:

```mermaid
flowchart TD
  Pipeline --> Job1 --> cqart1[CodeQuality 1/2]
  Pipeline --> Job2 --> cqart2[CodeQuality 2/2]
  Pipeline --> Job3 --> tr1[TestReport]
  Pipeline --> child[Child pipeline]
  child --> Job4
  child --> Job5 --> ds[CodeCoverage]
```
- The Code Quality report should contain data from `Job1` and `Job2`.
- The test report contains data from `Job3`.
- The Code Coverage report, generated in `Job5`, should be available as an MR report through the parent pipeline (this is currently being done).

Among these 3 cases there should be a common abstraction.
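The child-pipeline part of the scenario can be sketched in a few lines of plain Ruby (illustrative names only, not GitLab code): artifacts from descendant pipelines bubble up into the parent pipeline's report.

```ruby
# Illustrative only: a pipeline's reports should include artifacts from
# its descendant pipelines, as in the diagram above.
Pipeline = Struct.new(:artifacts_by_type, :children) do
  def all_artifacts(type)
    artifacts_by_type.fetch(type, []) +
      children.flat_map { |child| child.all_artifacts(type) }
  end
end

child = Pipeline.new({ code_coverage: ['Job5 data'] }, [])
parent = Pipeline.new(
  { code_quality: ['Job1 data', 'Job2 data'], test_report: ['Job3 data'] },
  [child]
)

parent.all_artifacts(:code_quality)  # => ["Job1 data", "Job2 data"]
parent.all_artifacts(:code_coverage) # => ["Job5 data"]
```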
## Proposal

Introduce abstractions that ensure consistent behavior and expectations across all reports. The abstractions should hide implementation details, providing a consistent framework for reports:
```ruby
class Ci::Reports::Factory
  REPORTS = {
    code_quality: ::Ci::Reports::CodeQuality,
    junit: ::Ci::Reports::Junit,
    # ...
  }.freeze

  COMPARERS = {
    code_quality: ::Gitlab::Ci::Reports::CodequalityReportsComparer,
    # ...
  }.freeze

  def self.report(type:, pipeline:)
    return unless pipeline
    return unless pipeline.pipeline_artifacts&.report_exists?(type)

    REPORTS.fetch(type).new(pipeline)
  end

  def self.compare(type:, base_pipeline:, head_pipeline:)
    base_report = report(type: type, pipeline: base_pipeline)
    head_report = report(type: type, pipeline: head_pipeline)

    COMPARERS.fetch(type).new.compare(base_report, head_report)
  end
end
```
```ruby
class Ci::Reports::BaseReport
  def initialize(pipeline)
    @pipeline = pipeline
  end

  def report_type
    raise NotImplementedError
  end

  def parser
    raise NotImplementedError
  end

  def empty?
    data.empty?
  end

  private

  attr_reader :pipeline

  def data
    @data ||= collect_data || {}
  end

  def collect_data
    return unless pipeline
    return unless pipeline.pipeline_artifacts&.report_exists?(report_type)

    # Read the blob from the pipeline artifact and parse it with `parser`.
  end
end
```
```ruby
class Ci::Reports::CodeQuality < Ci::Reports::BaseReport
  def report_type
    :code_quality # or use a method/constant from Ci::JobArtifact
  end

  def parser
    ::Gitlab::Ci::Parsers::Codequality::CodeClimate
  end

  def degradations_count
    data.count
  end

  def all_degradations
    data.values
  end
end
```
Use as:

```ruby
report = Ci::Reports::Factory.report(type: :code_quality, pipeline: pipeline)

# instead of using `pipeline.has_code_quality_reports?`
if report
  report.all_degradations
else
  # none of the builds had any code quality artifacts
end
```
Reports comparison for merge requests could be extracted out of the `MergeRequest` model and into a generic comparison engine:

```ruby
comparison_report = Ci::Reports::Factory.compare(
  type: :code_quality,
  base_pipeline: base_pipeline,
  head_pipeline: head_pipeline
)
```
Always ensure that when a pipeline completes we generate a report by merging all the artifacts:

```ruby
after_transition any => ::Ci::Pipeline.completed_statuses do |pipeline|
  pipeline.run_after_commit do
    # For every artifact report type in the pipeline, generate a pipeline
    # artifact with the merged data.
    ::Ci::Reports::GenerateReportsWorker.perform_async(pipeline.id)
  end
end
```
```ruby
class Ci::Reports::GenerateReportsService
  def initialize(pipeline)
    @pipeline = pipeline
  end

  def execute
    all_types.each do |type|
      report = Ci::Reports::Factory.report(type: type, pipeline: pipeline)
      create_pipeline_artifact(report)
    end
  end

  private

  attr_reader :pipeline

  def all_types
    # select all unique artifact report types in self and descendants
  end
end
```
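What `all_types` could compute can be sketched framework-free (the hash shapes below are hypothetical; the real implementation would query job artifact metadata for the pipeline and its descendants):

```ruby
# Illustrative only: collect the unique artifact report types across a
# pipeline's own builds and all descendant pipelines.
def all_types(pipeline)
  own = pipeline[:builds].flat_map { |build| build[:artifact_types] }
  nested = pipeline.fetch(:descendants, []).flat_map { |child| all_types(child) }
  (own + nested).uniq
end

pipeline = {
  builds: [
    { artifact_types: [:code_quality] },
    { artifact_types: [:code_quality] },
    { artifact_types: [:junit] }
  ],
  descendants: [
    { builds: [{ artifact_types: [:code_coverage] }] }
  ]
}

all_types(pipeline) # => [:code_quality, :junit, :code_coverage]
```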