Only ingest component properties from reachability SBOM reports

Problem to solve

In GitLab 17.5, we released dependency static reachability as an experiment for Java and Python: https://about.gitlab.com/releases/2024/10/17/gitlab-17-5-released/#static-reachability-for-java-and-python

Related epic: Implement Static Reachability for Java and Pyth... (&14177 - closed)

The implementation of this feature relies on an enriched SBOM report, generated by a CI job that merges the results of the SCA job (which generates the SBOM report) and the GitLab Advanced SAST jobs (which generate a custom reachability artifact). However, the CI job that enriches the SBOM report does not replace the report from the SCA job: because it declares its own CDX report artifact, it creates a second SBOM report. Even if the artifacts have the same name, one artifact is uploaded per CI job. When it ingests components, the backend logic that collects artifacts loops through all artifacts of the SBOM type for the pipeline.

This means the ingestion logic will actually process the SBOM report content twice:

  • once for the artifact from the SCA job, without the reachability data filled in
  • once for the artifact from the static reachability "merge results" job, with the reachability data filled in
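
To make the duplication concrete, here is a minimal Python sketch of a naive loop over every SBOM artifact in a pipeline. It is an illustration only, not the GitLab backend code: the report dictionaries, the job names, the `reachability` property key and its values, and the `count_visits` helper are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical, simplified stand-ins for the two CycloneDX artifacts
# uploaded by one pipeline. Keys and values are illustrative only.
sca_report = {
    "job": "sca",
    "components": [
        {"purl": "pkg:pypi/requests@2.31.0", "properties": {}},
        {"purl": "pkg:pypi/urllib3@2.0.7", "properties": {}},
    ],
}
reachability_report = {
    "job": "static-reachability-merge",
    "components": [
        {"purl": "pkg:pypi/requests@2.31.0", "properties": {"reachability": "in_use"}},
        {"purl": "pkg:pypi/urllib3@2.0.7", "properties": {"reachability": "not_found"}},
    ],
}


def count_visits(reports):
    """Count how often each component is visited when every SBOM artifact is looped over."""
    visits = Counter()
    for report in reports:
        for component in report["components"]:
            visits[component["purl"]] += 1
    return visits


visits = count_visits([sca_report, reachability_report])
# Every component appears in both artifacts, so each one is visited twice.
duplicated = [purl for purl, count in visits.items() if count > 1]
```

In this sketch, `duplicated` contains every component of the project, which is the redundant work the requirements below aim to remove.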

Requirements

  • Only scan a unique component for vulnerabilities once
  • Only try to create an SBOM occurrence for each unique component once
  • Only update the reachability property for components that have it set
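
One way to read these requirements is as a two-pass ingestion, sketched below in Python. This is a hedged illustration under stated assumptions, not the agreed implementation plan: the `source` marker used to recognize a reachability report, the `reachability` property key, and the `ingest` function are hypothetical names invented for this example.

```python
def ingest(reports):
    """Two-pass ingestion sketch.

    Pass 1 creates one occurrence per unique component from regular reports.
    Pass 2 reads reachability reports only to copy the reachability property.
    """
    occurrences = {}  # purl -> occurrence record
    scanned = []      # purls handed to the vulnerability scan, once each

    # Pass 1: regular SBOM reports drive occurrence creation and scanning.
    for report in reports:
        if report.get("source") == "reachability":  # hypothetical marker
            continue
        for component in report["components"]:
            purl = component["purl"]
            if purl in occurrences:
                continue  # already created and scanned
            occurrences[purl] = {"purl": purl, "reachability": None}
            scanned.append(purl)

    # Pass 2: reachability reports only contribute the property.
    for report in reports:
        if report.get("source") != "reachability":
            continue
        for component in report["components"]:
            value = component.get("properties", {}).get("reachability")
            if value is not None and component["purl"] in occurrences:
                occurrences[component["purl"]]["reachability"] = value

    return occurrences, scanned


# Hypothetical input: one regular report and one enriched report.
reports = [
    {"source": "sca", "components": [
        {"purl": "pkg:pypi/requests@2.31.0", "properties": {}},
        {"purl": "pkg:pypi/urllib3@2.0.7", "properties": {}},
    ]},
    {"source": "reachability", "components": [
        {"purl": "pkg:pypi/requests@2.31.0", "properties": {"reachability": "in_use"}},
        {"purl": "pkg:pypi/urllib3@2.0.7", "properties": {}},
    ]},
]
occurrences, scanned = ingest(reports)
```

With this input, each component is scanned and created once, and only `requests` receives a reachability value, because it is the only component whose enriched entry carries the property.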

Affected components

The following components will iterate over duplicated components unless we filter out one of the two SBOM reports.

  • Dependencies::ExportSerializers::Sbom
  • Gitlab::LicenseScanning::PipelineComponents
  • Sbom::CreateVulnerabilitiesService
  • Sbom::Ingestion::IngestReportsService
  • Security::StoreScansService

There is also a concern about the wasted resources and the potential additional traffic on the SBOM-related tables, which already suffer from too many updates and recently went through optimizations: Reduce the number of tuples updated for tables ... (&13616 - closed). We should avoid putting additional pressure on the database whenever possible.

Implementation plan

See #500746 (comment 2366288202) for proposal discussion.

Testing

Validate that the following functionality works as expected with the reachability feature enabled.

Merge request

  • License scanning works as expected.
  • Security tab works as expected.
  • The SBOM export works as expected.

Default branch

  • Dependency list displays dependencies correctly.
  • Vulnerability report shows correct vulnerabilities for CycloneDX SBOM.
  • Licenses for project are correct.
Edited by Oscar Tovar