WIP: POC Coverage Report using ObjectStorage (!36760) · Merge requests · GitLab.org / GitLab

Max Orefice requested to merge mo-coverage-report-poc into master Jul 13, 2020

What does this MR do?

This MR follows the architecture proposed in !27744 (closed).

In this POC, I'm exploring how we can improve the ~performance of our coverage report feature so that we can remove its feature flag (coverage_report_view).

This is similar to a problem we identified with our JUnit feature: parsing a report in memory does not scale.

Cobertura coverage report

This is what our coverage report feature looks like today 👇

Screenshots: With coverage | Without coverage

Current Architecture

sequenceDiagram
    participant MergeRequest
    participant Pipeline
    participant JobArtifacts
    MergeRequest->>Pipeline: has_coverage_reports?
    Note right of Pipeline: The data is persisted in the cache leveraging ReactiveCaching
    loop find_coverage_reports
        Pipeline->>JobArtifacts: Parse a JSON file generated by SimpleCov in memory
    end
    Pipeline->>MergeRequest: Renders report to frontend
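
To make the cost of the current flow concrete, here is a minimal plain-Ruby sketch of it, not the actual GitLab code. `JobArtifactSketch`, `CurrentPipelineSketch`, `parse_count`, and `expire_cache!` are hypothetical names introduced for illustration, and the instance variable `@cache` only stands in for the ReactiveCaching entry. The JSON layout (file name mapped to line hits) is also an assumption.

```ruby
require "json"

# Hypothetical stand-in for a job artifact holding a raw coverage report.
JobArtifactSketch = Struct.new(:raw_json)

# Hypothetical stand-in for a pipeline; this is NOT GitLab's pipeline model.
class CurrentPipelineSketch
  attr_reader :parse_count

  def initialize(job_artifacts)
    @job_artifacts = job_artifacts
    @cache = nil      # plays the role of the ReactiveCaching entry
    @parse_count = 0  # counts how many artifacts were parsed in memory
  end

  def has_coverage_reports?
    !@job_artifacts.empty?
  end

  # Parses every artifact in memory and merges the per-file line coverage.
  def find_coverage_reports
    @job_artifacts.each_with_object({}) do |artifact, merged|
      @parse_count += 1
      JSON.parse(artifact.raw_json).each do |file, lines|
        merged[file] = (merged[file] || {}).merge(lines)
      end
    end
  end

  # Cached read: the full parse cost is paid whenever the cache is cold.
  def coverage_reports
    @cache ||= find_coverage_reports
  end

  def expire_cache!
    @cache = nil
  end
end
```

In this sketch every cold read re-parses all job artifacts, so the work grows with both the number of jobs and the size of each report.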

Tomorrow's Architecture

sequenceDiagram
    participant MergeRequest
    participant Pipeline
    participant PipelineArtifact
    MergeRequest->>Pipeline: has_coverage_reports?
    Note right of Pipeline: When pipeline is completed
    loop PipelineArtifactService
        Pipeline->>PipelineArtifact: Persist coverage report with object storage
    end
    Note right of MergeRequest: No more ReactiveCaching
    Pipeline->>MergeRequest: Read file from object storage
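
The flow in the diagram above could look roughly like the following plain-Ruby sketch. It is an illustration under assumptions, not the implementation in this MR: `FakeObjectStorage` is an in-memory stand-in for a real object storage bucket, and `PipelineArtifactServiceSketch`, `CoverageReportReaderSketch`, and the key layout `pipelines/<id>/code_coverage.json` are hypothetical.

```ruby
require "json"

# In-memory stand-in for an object storage bucket (assumption for this sketch).
class FakeObjectStorage
  def initialize
    @objects = {}
  end

  def put(key, body)
    @objects[key] = body
  end

  def get(key)
    @objects[key]
  end
end

# Hypothetical service, meant to run once when the pipeline completes.
class PipelineArtifactServiceSketch
  def self.key_for(pipeline_id)
    "pipelines/#{pipeline_id}/code_coverage.json"
  end

  def initialize(storage)
    @storage = storage
  end

  # Merges the raw per-job reports and persists a single JSON file.
  def execute(pipeline_id, raw_reports)
    merged = raw_reports.each_with_object({}) do |raw, acc|
      JSON.parse(raw).each do |file, lines|
        acc[file] = (acc[file] || {}).merge(lines)
      end
    end
    @storage.put(self.class.key_for(pipeline_id), JSON.generate(merged))
  end
end

# Read path for the merge request: no job artifact parsing and no cache,
# only a fetch of the already-built file (nil when nothing was persisted).
class CoverageReportReaderSketch
  def self.read(storage, pipeline_id)
    storage.get(PipelineArtifactServiceSketch.key_for(pipeline_id))
  end
end
```

The parsing cost is paid once, in the background, and the merge request only ever reads the stored file.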

Here is a simple class diagram of the proposed architecture:

classDiagram
    Pipeline --> PipelineArtifact
    PipelineArtifact -- PipelineArtifactUploader
    PipelineArtifactUploader <|-- JobArtifactUploader
    class Pipeline {
        +has_many :pipeline_artifacts
        +has_coverage_reports?()
    }
    class PipelineArtifact {
        +belongs_to :project
        +belongs_to :pipeline
        +created_at: timestamp
        +updated_at: timestamp
        +file_type: integer
        +file_format: integer
        +file_store: integer
        +size: integer
        +file: text
    }
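
As a rough illustration of the data model in the class diagram, the following plain-Ruby sketch mirrors its fields without depending on Rails. The enum values in `FILE_TYPES`, `FILE_FORMATS`, and `FILE_STORES` are assumptions (the diagram only says these columns are integers), and `PipelineSketch`, `PipelineArtifactSketch`, and `add_coverage_artifact` are hypothetical names.

```ruby
# Hypothetical enum values; the diagram only states that these are integers.
FILE_TYPES   = { code_coverage: 1 }.freeze
FILE_FORMATS = { raw: 1 }.freeze
FILE_STORES  = { local: 1, remote: 2 }.freeze

# Mirrors the PipelineArtifact fields from the class diagram.
PipelineArtifactSketch = Struct.new(
  :project_id, :pipeline_id, :file_type, :file_format,
  :file_store, :size, :file, :created_at, :updated_at,
  keyword_init: true
) do
  # True when the file lives in object storage rather than on local disk.
  def remote?
    file_store == FILE_STORES[:remote]
  end
end

# Mirrors Pipeline: has many pipeline artifacts, answers has_coverage_reports?.
class PipelineSketch
  attr_reader :id, :project_id, :pipeline_artifacts

  def initialize(id:, project_id:)
    @id = id
    @project_id = project_id
    @pipeline_artifacts = []
  end

  # Records metadata for a coverage file already written to object storage.
  def add_coverage_artifact(path:, content:)
    now = Time.now
    artifact = PipelineArtifactSketch.new(
      project_id: project_id, pipeline_id: id,
      file_type: FILE_TYPES[:code_coverage], file_format: FILE_FORMATS[:raw],
      file_store: FILE_STORES[:remote], size: content.bytesize, file: path,
      created_at: now, updated_at: now
    )
    @pipeline_artifacts << artifact
    artifact
  end

  # Answered from stored metadata; no report is parsed here.
  def has_coverage_reports?
    pipeline_artifacts.any? { |a| a.file_type == FILE_TYPES[:code_coverage] }
  end
end
```

With this shape, `has_coverage_reports?` becomes a metadata lookup on the pipeline's artifacts instead of a parse of job artifacts.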

Quoting our object storage documentation:

GitLab supports using an object storage service for holding numerous types of data. It’s recommended over NFS and in general it’s better in larger setups as object storage is typically much more performant, reliable, and scalable.

Simplifications after this refactor

No more inline parsing
The parsing will happen in the background once a pipeline is completed

Decisions we need to make

What data retention policy should we set?
Does this generic proposal make sense if we want to persist more reports in the future (for example, JUnit)?
Should we completely get rid of ReactiveCache with this new architecture?

Why are we doing this?

We discovered early on that the current implementation of this feature has limitations: reports are parsed in memory, which does not scale.

Edited Jul 23, 2020 by Max Orefice
