WIP: POC Coverage Report using ObjectStorage
Ref: #211410 (closed)
What does this MR do?
This follows the architecture proposed in !27744 (closed).
In this POC, I'm exploring how we can improve the ~performance of our coverage report feature so that we can remove its feature flag (coverage_report_view).
This is similar to the problem we identified with our JUnit feature: parsing a report in memory does not scale.
Cobertura coverage report
This is what our coverage report feature looks like today:
| With coverage | Without coverage |
|---|---|
| ![]() | ![]() |
Current Architecture
sequenceDiagram
participant MergeRequest
participant Pipeline
participant JobArtifacts
MergeRequest->>Pipeline: has_coverage_reports?
Note right of Pipeline: The data is persisted in the cache leveraging ReactiveCaching
loop find_coverage_reports
Pipeline->>JobArtifacts: Parse in memory a json file generated by simplecov
end
Pipeline->>MergeRequest: Renders report to frontend
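To make the scaling concern concrete, here is a minimal plain-Ruby sketch of the flow above. It is not the actual GitLab code: `InlineCoverageReport`, `expire_cache!`, and `parse_count` are hypothetical names, and a simple instance variable stands in for ReactiveCaching. The point it illustrates is that every cold cache triggers a full in-memory parse of every job artifact.

```ruby
require "json"

# Hypothetical, simplified model of the current flow: whenever the cache is
# cold, every job artifact is parsed in memory again.
class InlineCoverageReport
  attr_reader :parse_count

  # raw_artifacts: array of JSON strings, one per job artifact (assumed shape)
  def initialize(raw_artifacts)
    @raw_artifacts = raw_artifacts
    @cache = nil
    @parse_count = 0
  end

  # Returns the merged report, parsing all artifacts if nothing is cached.
  def coverage_reports
    @cache ||= parse_all
  end

  # Stand-in for cache expiry.
  def expire_cache!
    @cache = nil
  end

  private

  def parse_all
    @raw_artifacts.each_with_object({}) do |raw, files|
      @parse_count += 1
      files.merge!(JSON.parse(raw))
    end
  end
end
```

With two artifacts, reading the report, expiring the cache, and reading again parses four files in total, and the whole parsed structure sits in memory each time.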
Proposed Architecture
sequenceDiagram
participant MergeRequest
participant Pipeline
participant PipelineArtifact
MergeRequest->>Pipeline: has_coverage_reports?
Note right of Pipeline: When pipeline is completed
loop PipelineArtifactService
Pipeline->>PipelineArtifact: Persist coverage report with object storage
end
Note right of MergeRequest: No more ReactiveCaching
Pipeline->>MergeRequest: Read file from object storage
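The proposed flow can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the implementation in this MR: `FakeObjectStore` is a hypothetical in-memory stand-in for object storage, the key format and the per-job report shape (`file path => { line => hits }`) are invented for the example, and only the `PipelineArtifactService` name comes from the diagram.

```ruby
require "json"

# Hypothetical in-memory stand-in for an object storage bucket.
class FakeObjectStore
  def initialize
    @blobs = {}
  end

  def write(key, data)
    @blobs[key] = data
  end

  def read(key)
    @blobs.fetch(key)
  end

  def exist?(key)
    @blobs.key?(key)
  end
end

# Sketch of a service that runs once, when the pipeline is completed.
class PipelineArtifactService
  def initialize(store)
    @store = store
  end

  # job_reports: array of hashes { file_path => { line_number => hits } },
  # one per job (assumed shape). Merges them and persists a single file.
  def execute(pipeline_id, job_reports)
    merged = Hash.new { |hash, path| hash[path] = Hash.new(0) }
    job_reports.each do |report|
      report.each do |path, lines|
        lines.each { |line, hits| merged[path][line.to_s] += hits }
      end
    end
    @store.write(key_for(pipeline_id), JSON.generate(merged))
  end

  # Read path used by the merge request: no parsing of job artifacts,
  # only a read of the already persisted file.
  def coverage_report(pipeline_id)
    key = key_for(pipeline_id)
    return nil unless @store.exist?(key)

    JSON.parse(@store.read(key))
  end

  private

  # Hypothetical key layout.
  def key_for(pipeline_id)
    "pipelines/#{pipeline_id}/code_coverage.json"
  end
end
```

The merge cost is paid once per pipeline, and the request path only reads one prepared file, which is why the cache layer is no longer needed in this sketch.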
Here is a simple class diagram of this proposed architecture:
classDiagram
Pipeline --> PipelineArtifact
PipelineArtifact -- PipelineArtifactUploader
PipelineArtifactUploader <|-- JobArtifactUploader
class Pipeline {
+has_many :pipeline_artifacts
+has_coverage_reports?()
}
class PipelineArtifact {
+belongs_to :project
+belongs_to :pipeline
+timestamp created_at
+timestamp updated_at
+integer file_type
+integer file_format
+integer file_store
+integer size
+text file
}
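As a rough illustration of the `PipelineArtifact` fields in the diagram, here is a plain-Ruby sketch (no Rails). The enum names and integer values (`code_coverage`, `raw`, `gzip`, `local`, `remote`) are assumptions made for the example, as are the helper methods `file_type_name` and `remote?`; the real model would presumably be an ActiveRecord class with a mounted uploader.

```ruby
# Plain-Ruby sketch of the PipelineArtifact record from the class diagram.
# Enum names and integer values are assumptions for illustration only.
class PipelineArtifact
  FILE_TYPES   = { code_coverage: 1 }.freeze
  FILE_FORMATS = { raw: 1, gzip: 2 }.freeze
  FILE_STORES  = { local: 1, remote: 2 }.freeze

  attr_reader :project_id, :pipeline_id, :file, :size,
              :file_type, :file_format, :file_store,
              :created_at, :updated_at

  def initialize(project_id:, pipeline_id:, file:, size:,
                 file_type: :code_coverage, file_format: :raw, file_store: :remote)
    @project_id  = project_id
    @pipeline_id = pipeline_id
    @file        = file # name or key of the stored file, not its content
    @size        = size # assumed to be the file size in bytes
    # Keep the integer codes, mirroring the integer columns in the diagram.
    # fetch raises KeyError for names that are not defined above.
    @file_type   = FILE_TYPES.fetch(file_type)
    @file_format = FILE_FORMATS.fetch(file_format)
    @file_store  = FILE_STORES.fetch(file_store)
    @created_at  = @updated_at = Time.now
  end

  # Translate the stored integer back to its symbolic name.
  def file_type_name
    FILE_TYPES.key(@file_type)
  end

  # True when the file lives in object storage rather than on local disk.
  def remote?
    @file_store == FILE_STORES.fetch(:remote)
  end
end
```

Keeping `file_type` generic is what would let the same table hold other report kinds later, by adding entries to the type mapping rather than new tables.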
Quoting our object storage documentation:
GitLab supports using an object storage service for holding numerous types of data. It’s recommended over NFS and in general it’s better in larger setups as object storage is typically much more performant, reliable, and scalable.
Simplification after this refactor
- No more inline parsing
- The parsing will happen in the background once a pipeline is completed
Decisions we need to make
- What data retention policy should we set?
- Does this generic proposal make sense if we want to persist more reports in the future (for example, JUnit)?
- Should we completely get rid of ReactiveCaching with this new architecture?
Why are we doing this?
We discovered early on with this feature that we were hitting some limitations with our current implementation.