Improve DB schema to better support CVS and the Dependency List
Problem to solve
The sbom_* tables and related Sbom::* models were introduced so that SBOM ingestion persists all the data needed to support two distinct features:
- Continuous Vulnerability Scanning
- Dependency List (visualization and export)
(As of today License Scanning uses SBOM report artifacts directly.)
However, over time SBOM ingestion and the sbom_* tables were optimized for one feature or the other.
- Optimization for CVS
  - Only track library components with a PURL type supported by CVS. See https://gitlab.com/gitlab-org/gitlab/-/blob/c75baaf98ccaea7f04b24d886c01620186584864/lib/gitlab/ci/reports/sbom/component.rb#L26 and https://gitlab.com/gitlab-org/gitlab/-/blob/5ea198a046119af933f1e608cd51b39b59b02004/app/models/concerns/enums/sbom.rb
  - Store a component name that can be directly compared by CVS in sbom_components.name (now repeated in sbom_occurrences.component_name), instead of the raw name coming from the CycloneDX SBOM. #388780 (closed)
- Optimization for the Dependency List
  - De-normalization: Repeat input_file_path and package_manager of the sbom_sources table in sbom_occurrences.
  - De-normalization: Repeat sbom_sources.name in sbom_occurrences.component_name.
  - Persist License Scanning results in sbom_occurrences.
  - Persist Vulnerability Scanning results in sbom_occurrences_vulnerabilities.
This negatively impacts both features.
- Dependency List
  - Components with a type or PURL type not supported by CVS are simply ignored. The SBOM might be incomplete.
  - Component names don't reflect what's in the SBOM. The SBOM might be inaccurate.
- CVS
  - The schema can't be easily optimized for CVS queries. TODO: provide examples.
  - Versions accurately reflect what's in the SBOM, but they should be sanitized to be efficiently compared to the advisory DB by CVS. Also, having raw version strings isn't consistent with having normalized and sanitized component names.
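To illustrate why a name stored for CVS comparison can diverge from the raw SBOM name, here is a minimal sketch. It assumes PyPI-style normalization (PEP 503) purely as an example; the rules actually applied during SBOM ingestion may differ, and normalize_pypi_name is a hypothetical helper, not an existing GitLab function.

```python
import re


def normalize_pypi_name(raw_name: str) -> str:
    """Hypothetical helper: PEP 503 normalization.

    Lowercases the name and collapses runs of '-', '_' and '.' into a single '-'.
    """
    return re.sub(r"[-_.]+", "-", raw_name).lower()


# Raw name as it could appear in a CycloneDX SBOM (made-up example value).
raw_name = "Flask_SQLAlchemy"

# Name that can be compared directly against advisory data.
comparable_name = normalize_pypi_name(raw_name)

# Storing only the comparable name loses what the SBOM actually said.
print(f"raw: {raw_name!r}, comparable: {comparable_name!r}")
```

If only the comparable name is persisted, the Dependency List can no longer show the name exactly as it appears in the SBOM, which is the inaccuracy described above.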
Proposal
Change the DB schema to achieve the following:
- The Dependency List and the CycloneDX SBOM are accurate and complete.
- DB tables and indexes can be optimized for CVS without impacting the Dependency List negatively, and the other way around.
- Sbom::Occurrence can accurately track a binary package and its source package w/o introducing any extra cost. #427095 (closed)
Proposal A
During SBOM ingestion, normalize and persist properties needed for CVS in a dedicated table, and link them to the corresponding Sbom::Occurrence model using a foreign key.
SBOM components not supported by CVS would have records in the existing sbom_occurrences table, but not in the new table dedicated to CVS queries.
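Proposal A could be sketched as follows, using an in-memory SQLite database as a stand-in for the real PostgreSQL schema. The table name sbom_cvs_components, its columns, the index, the set of supported PURL types, and the lowercase-only normalization are all assumptions made for illustration; they are not part of the existing schema.

```python
import sqlite3

# Assumption: an illustrative subset of PURL types supported by CVS.
SUPPORTED_PURL_TYPES = {"pypi", "npm"}

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sbom_occurrences (
    id INTEGER PRIMARY KEY,
    component_name TEXT NOT NULL,     -- raw name, as found in the SBOM
    component_version TEXT,           -- raw version, as found in the SBOM
    purl_type TEXT                    -- NULL when the component has no PURL
);
-- Hypothetical table dedicated to CVS queries.
CREATE TABLE sbom_cvs_components (
    id INTEGER PRIMARY KEY,
    sbom_occurrence_id INTEGER NOT NULL UNIQUE
        REFERENCES sbom_occurrences (id) ON DELETE CASCADE,
    purl_type TEXT NOT NULL,
    normalized_name TEXT NOT NULL,
    normalized_version TEXT NOT NULL
);
-- Index shaped for advisory lookups only; the Dependency List never reads it.
CREATE INDEX index_sbom_cvs_components_on_type_and_name
    ON sbom_cvs_components (purl_type, normalized_name);
""")


def ingest(conn, name, version, purl_type):
    """Persist every component; add a CVS row only for supported PURL types."""
    occurrence_id = conn.execute(
        "INSERT INTO sbom_occurrences (component_name, component_version, purl_type)"
        " VALUES (?, ?, ?)",
        (name, version, purl_type),
    ).lastrowid
    if purl_type in SUPPORTED_PURL_TYPES:
        # Placeholder normalization (lowercase only); real rules are out of scope.
        conn.execute(
            "INSERT INTO sbom_cvs_components"
            " (sbom_occurrence_id, purl_type, normalized_name, normalized_version)"
            " VALUES (?, ?, ?, ?)",
            (occurrence_id, purl_type, name.lower(), version),
        )
    return occurrence_id


def affected_occurrence_ids(conn, purl_type, normalized_name):
    """CVS-style lookup: occurrences matching an advisory's package."""
    rows = conn.execute(
        "SELECT o.id FROM sbom_cvs_components c"
        " JOIN sbom_occurrences o ON o.id = c.sbom_occurrence_id"
        " WHERE c.purl_type = ? AND c.normalized_name = ?"
        " ORDER BY o.id",
        (purl_type, normalized_name),
    )
    return [row[0] for row in rows]


flask_id = ingest(conn, "Flask", "2.0.1", "pypi")
action_id = ingest(conn, "actions/checkout", "v4", "github")  # not supported by CVS here
```

In this sketch both components remain visible to the Dependency List through sbom_occurrences with their raw names, while only the supported one gets a row in the CVS table.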
Proposal B
- Share PackageMetadata::Package b/w license data and advisory data. We assume that license data cover packages that might get advisories in the future.
- Add a relation table to link Sbom::Occurrence models to PackageMetadata::Package, and create relations during SBOM ingestion.
Pros compared to proposal A
- Save on storage.
Cons compared to proposal A
- It introduces some coupling b/w license data and advisory data.
- During advisory ingestion, CVS can't create vulnerabilities for Sbom::Occurrence unless the corresponding PackageMetadata::Package already exists.
- This seems against the isolation b/w package metadata and SBOM data discussed in gitlab_schema for package metadata used by Vuln... (#378261 - closed).
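Proposal B, and the con about missing packages, could be sketched like this with an in-memory SQLite database. The relation table sbom_occurrence_packages is hypothetical, and the pm_packages table and all column names are simplified assumptions for illustration rather than the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Assumed, simplified stand-in for the table behind PackageMetadata::Package.
CREATE TABLE pm_packages (
    id INTEGER PRIMARY KEY,
    purl_type TEXT NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (purl_type, name)
);
CREATE TABLE sbom_occurrences (
    id INTEGER PRIMARY KEY,
    purl_type TEXT,
    component_name TEXT NOT NULL
);
-- Hypothetical relation table created during SBOM ingestion.
CREATE TABLE sbom_occurrence_packages (
    sbom_occurrence_id INTEGER NOT NULL REFERENCES sbom_occurrences (id),
    pm_package_id INTEGER NOT NULL REFERENCES pm_packages (id),
    PRIMARY KEY (sbom_occurrence_id, pm_package_id)
);
""")


def ingest_occurrence(conn, purl_type, name):
    """Persist the occurrence and link it to a package only if one already exists."""
    occurrence_id = conn.execute(
        "INSERT INTO sbom_occurrences (purl_type, component_name) VALUES (?, ?)",
        (purl_type, name),
    ).lastrowid
    package = conn.execute(
        "SELECT id FROM pm_packages WHERE purl_type = ? AND name = ?",
        (purl_type, name),
    ).fetchone()
    if package is not None:
        conn.execute(
            "INSERT INTO sbom_occurrence_packages (sbom_occurrence_id, pm_package_id)"
            " VALUES (?, ?)",
            (occurrence_id, package[0]),
        )
    return occurrence_id


def occurrences_for_package(conn, purl_type, name):
    """Lookup used at advisory ingestion: occurrences linked to the advisory's package."""
    rows = conn.execute(
        "SELECT r.sbom_occurrence_id FROM sbom_occurrence_packages r"
        " JOIN pm_packages p ON p.id = r.pm_package_id"
        " WHERE p.purl_type = ? AND p.name = ?"
        " ORDER BY r.sbom_occurrence_id",
        (purl_type, name),
    )
    return [row[0] for row in rows]


# Only 'lodash' is known to package metadata (made-up example data).
conn.execute("INSERT INTO pm_packages (purl_type, name) VALUES ('npm', 'lodash')")
known_id = ingest_occurrence(conn, "npm", "lodash")
unknown_id = ingest_occurrence(conn, "npm", "left-pad")
```

In this sketch an advisory for left-pad finds no occurrences, because no package row existed when the SBOM was ingested; the occurrence itself is still stored.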