Only ingest component properties from reachability SBOM reports
Problem to solve
In GitLab 17.5 we've released dependency static reachability as an experiment for Java and Python: https://about.gitlab.com/releases/2024/10/17/gitlab-17-5-released/#static-reachability-for-java-and-python
Related epic: Implement Static Reachability for Java and Pyth... (&14177 - closed)
This feature is implemented by a CI job that generates an enriched SBOM report, merging the results of the SCA job (which generates the SBOM report) with those of the GitLab Advanced SAST jobs (which generate a custom reachability artifact). However, the enrichment job does not modify the SBOM report generated by the SCA job: because it declares its own CDX report artifact, it creates a new SBOM report. Even when the artifacts have the same name, one artifact is uploaded per CI job. When it ingests components, the backend logic that collects artifacts loops through all artifacts of the SBOM type for the pipeline.
This means the ingestion logic will actually process the SBOM report content twice:
- once for the artifact from the SCA job, without reachability data filled
- once for the artifact from the static reachability "merge results" job, with the reachability data filled.
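A minimal Ruby sketch of the double processing described above. The Artifact struct, the job names, and the package URLs are hypothetical stand-ins for illustration, not GitLab's actual models or job names.

```ruby
# Hypothetical stand-in for a pipeline job artifact (not a GitLab model).
Artifact = Struct.new(:job_name, :file_type, :components, keyword_init: true)

# The same component list appears in both SBOM artifacts.
sca_components = [
  "pkg:maven/org.example/lib@1.0.0",
  "pkg:pypi/requests@2.31.0"
]

pipeline_artifacts = [
  Artifact.new(job_name: "sca", file_type: :cyclonedx, components: sca_components),
  Artifact.new(job_name: "reachability-merge", file_type: :cyclonedx, components: sca_components),
  Artifact.new(job_name: "sast", file_type: :sast, components: [])
]

# Naive collection: every artifact of the SBOM type is ingested,
# regardless of which job produced it.
processed = pipeline_artifacts
  .select { |artifact| artifact.file_type == :cyclonedx }
  .flat_map(&:components)

# Count how many times each component was handed to ingestion.
counts = processed.tally
counts.each { |purl, count| puts "#{purl}: processed #{count} time(s)" }
```

In this sketch every component is counted twice, once per SBOM artifact, which is the duplication the requirements below aim to remove.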
Requirements
- Only scan a unique component for vulnerabilities once
- Only try to create an SBOM occurrence for each unique component once
- Only update the reachability property for components that carry reachability data
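The last requirement could look roughly like the following Ruby sketch. It assumes occurrences are keyed by package URL and that reachability is a plain value on each component; the hash shapes, the apply_reachability helper, and the "in_use" / "unknown" values are all hypothetical and not taken from the GitLab schema.

```ruby
# Hypothetical in-memory stand-in for stored SBOM occurrences, keyed by purl.
occurrences = {
  "pkg:pypi/requests@2.31.0" => { reachability: "unknown" },
  "pkg:pypi/flask@3.0.0" => { reachability: "unknown" }
}

# Components from the enriched SBOM; only some carry reachability data.
enriched_components = [
  { purl: "pkg:pypi/requests@2.31.0", reachability: "in_use" },
  { purl: "pkg:pypi/flask@3.0.0" } # no reachability data: left untouched
]

# Updates only occurrences whose component carries a reachability value
# that differs from the stored one. Returns the purls that were changed.
def apply_reachability(occurrences, components)
  components.each_with_object([]) do |component, updated|
    value = component[:reachability]
    next if value.nil?

    occurrence = occurrences[component[:purl]]
    next if occurrence.nil? || occurrence[:reachability] == value

    occurrence[:reachability] = value
    updated << component[:purl]
  end
end

updated = apply_reachability(occurrences, enriched_components)
puts "updated: #{updated.inspect}"
```

Skipping components without reachability data, and those whose value is unchanged, keeps the number of writes proportional to the actual changes rather than to the size of the SBOM.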
Affected components
The following components will loop through duplicates if we don't filter out one of the SBOM sets.
- Dependencies::ExportSerializers::Sbom
- GitLab::LicenseScanning::PipelineComponents
- Sbom::CreateVulnerabilitiesService
- Sbom::Ingestion::IngestReportsService
- Security::StoreScansService
There is also a concern about wasted resources and additional traffic on the SBOM-related tables, which already suffer from too many updates and recently went through optimizations: Reduce the number of tuples updated for tables ... (&13616 - closed). We should avoid putting additional pressure on the database whenever possible.
Implementation plan
See #500746 (comment 2366288202) for proposal discussion.
- Update the enrichment analyzer so that it updates the analyzer metadata to reflect sca-to-sarif-matcher.
- Create a concern for SBOM processors - SbomProcessor.
  - Move the valid_sbom_reports logic used by Sbom::CreateVulnerabilitiesService and Sbom::Ingestion::IngestReportsService to this concern. Have it skip if it matches the analyzer metadata for the enrichment analyzer.
- Update Security::StoreScansService#sbom_report_artifacts so that it skips the reports with the enrichment analyzer metadata.
- Update Sbom::Ingestion::IngestReportsService#valid_sbom_reports so that it uses the new concern.
- Update Sbom::CreateVulnerabilitiesService#valid_sbom_reports so that it uses the new concern.
- Update Dependencies::ExportSerializers::Sbom#merged_report to use the new concern.
- Update GitLab::LicenseScanning::PipelineComponents#fetch to use the new concern.
- Create a new event to signal that SBOM ingestion and scanning have completed.
- Create a new worker that updates the reachability status from SBOMs produced by the enrichment analyzer. It should subscribe to the new event.
- Set a concurrency limit on this worker to avoid consuming all resources.
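The filtering part of this plan could be sketched in plain Ruby as follows. This is an illustration under assumptions, not the actual implementation: the Report struct, its analyzer_name attribute, the reachability_reports helper, and ExampleIngestService are hypothetical, and a real concern in the Rails codebase would likely be built differently. Only the SbomProcessor and valid_sbom_reports names and the sca-to-sarif-matcher analyzer name come from the plan above.

```ruby
# Hypothetical report shape; GitLab's real SBOM report objects differ.
Report = Struct.new(:analyzer_name, :valid, keyword_init: true) do
  def valid?
    valid
  end
end

module SbomProcessor
  # Analyzer name from the plan; how it is exposed on a report is an assumption.
  ENRICHMENT_ANALYZER = "sca-to-sarif-matcher"

  # Reports used for ingestion and scanning: valid, and not produced
  # by the enrichment analyzer.
  def valid_sbom_reports
    sbom_reports.select { |report| report.valid? && !enrichment_report?(report) }
  end

  # Hypothetical helper: reports a reachability worker would read from.
  def reachability_reports
    sbom_reports.select { |report| report.valid? && enrichment_report?(report) }
  end

  private

  def enrichment_report?(report)
    report.analyzer_name == ENRICHMENT_ANALYZER
  end
end

# Hypothetical including class standing in for the services listed above.
class ExampleIngestService
  include SbomProcessor

  def initialize(sbom_reports)
    @sbom_reports = sbom_reports
  end

  private

  attr_reader :sbom_reports
end

reports = [
  Report.new(analyzer_name: "example-sca-analyzer", valid: true),
  Report.new(analyzer_name: "sca-to-sarif-matcher", valid: true),
  Report.new(analyzer_name: "example-sca-analyzer", valid: false)
]

service = ExampleIngestService.new(reports)
puts service.valid_sbom_reports.map(&:analyzer_name).inspect
puts service.reachability_reports.map(&:analyzer_name).inspect
```

Keeping the predicate in one module means each of the affected services applies the same rule, so a report skipped by ingestion is also skipped by vulnerability scanning, license scanning, and export.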
Testing
Validate the following functionality works as expected with the reachability feature enabled.
Merge request
- License scanning works as expected.
- Security tab works as expected.
- The SBOM export works as expected.
Default branch
- Dependency list displays dependencies correctly.
- Vulnerability report shows correct vulnerabilities for CycloneDX SBOM.
- Licenses for the project are correct.