Remove distro prefix from OS component names
Summary
The SBOM ingestion pipeline builds the component name from the PURL namespace and the name. This behavior was added to fix an issue where Maven components did not match the intended licenses and vulnerabilities, as described in Store PURL namespace of SBOM components (#388780). Unfortunately, the opposite is desired for OS-type packages, which include the following PURL types:
- `apk`
- `deb`
- `rpm`
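The desired naming rule can be sketched as follows. This is only an illustration: the `Component` class, its constructor, and the `OS_PURL_TYPES` constant are hypothetical stand-ins, not the actual ingestion code.

```ruby
# Hypothetical, simplified stand-in for the component object used during SBOM ingestion.
class Component
  # PURL types for which the distro namespace must not be part of the name.
  OS_PURL_TYPES = %w[apk deb rpm].freeze

  def initialize(name:, namespace: nil, purl_type: nil)
    @name = name
    @namespace = namespace
    @purl_type = purl_type
  end

  # OS packages keep the bare name; every other type keeps "namespace/name"
  # (or just the name when no namespace is present).
  def name
    return @name if OS_PURL_TYPES.include?(@purl_type)

    [@namespace, @name].compact.join('/')
  end
end

deb   = Component.new(name: 'curl', namespace: 'debian', purl_type: 'deb')
maven = Component.new(name: 'log4j-core', namespace: 'org.apache.logging.log4j', purl_type: 'maven')

puts deb.name   # curl
puts maven.name # org.apache.logging.log4j/log4j-core
```

Keeping the namespace for non-OS types preserves the Maven fix from #388780 while dropping the distro prefix only where it is unwanted.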
Steps to reproduce
- Ingest an SBOM sourced from a Trivy analyzer or container-scanning.
- Make sure you have the `project_level_sbom_occurrences` flag enabled if you are on a version before %17.0.
- Go to the dependency list, and observe that the component names show the prefix, e.g. `debian/curl` or `alpine/curl`.
Example Project
What is the current bug behavior?
You see OS components like `debian/libext2fs2`.
What is the expected correct behavior?
The `debian/` prefix should not appear in the component name shown in the dependency list, nor in the `sbom_occurrences.name` column.
Relevant logs and/or screenshots
Implementation plan
- MR: Update the `name` method so that it returns only `@name` when it has a purl type `apk`, `deb`, or `rpm`.
  - Remove the `container_scanning_component?` exclusive logic. The `name` method will now function as intended.
- MR: Create a background migration that fixes the SBOM component names, vulnerabilities, vulnerability occurrences and findings. The SBOM component names have a unique index constraint, `index_sbom_components_on_component_type_name_and_purl_type`, so we'll have to handle the `ActiveRecord::RecordNotUnique` exceptions that are raised during the migration. For all these records, we'll need to migrate the foreign keys to use the ID that belongs to the component with the correct name.
  - We'll need to ensure that the `security_findings.finding_data` and `vulnerability_occurrences.details` contain the correct component name in the JSONB data. The transformation should look like this:

        // Before
        {"vulnerable_package": {"name": "Vulnerable Package", "type": "text", "value": "debian/squid:5.7-2"}}
        // After
        {"vulnerable_package": {"name": "Vulnerable Package", "type": "text", "value": "squid:5.7-2"}}
  - The `security_findings.raw_metadata` also has a copy of the name, so we'll need to fix it as well. Example:

        // Before
        {"message":"[SQUID-2023:7 Denial of Service in HTTP Message processing]","description":"","solution":"","location":{"image":"registry.gitlab.com/hacks4oats/426817-debian-base-project/main:21da623a43584925ab6ec384ba75605afd3dbd16","operating_system":"debian 12.4","dependency":{"package":{"name":"debian/squid"},"version":"5.7-2"}}}
        // After
        {"message":"[SQUID-2023:7 Denial of Service in HTTP Message processing]","description":"","solution":"","location":{"image":"registry.gitlab.com/hacks4oats/426817-debian-base-project/main:21da623a43584925ab6ec384ba75605afd3dbd16","operating_system":"debian 12.4","dependency":{"package":{"name":"squid"},"version":"5.7-2"}}}
  - Lastly, the `location_fingerprint` will need to be updated. The UUID of the vulnerability occurrence (finding) will also need to be updated since it uses the location fingerprint. Fingerprints are a SHA1 digest constructed from the docker image without tag and the package name. For example:

        # Pseudocode of how to perform migration
        # Before
        old_fingerprint = Digest::SHA1.hexdigest("debian/squid")
        # => "2996cf7a148b978e651506aaea3f2d60bb578c97"

        # After
        new_fingerprint = Digest::SHA1.hexdigest("squid")
        # => "c46257baa1b7db7868b14d029d37c20bf070d7b2"
        finding.location_fingerprint = new_fingerprint

        # Update the UUID, which is derived from the location fingerprint
        uuid = ::Security::VulnerabilityUUID.generate(
          report_type: 'container_scanning',
          primary_identifier_fingerprint: finding.primary_identifier.fingerprint,
          location_fingerprint: new_fingerprint,
          project_id: finding.project_id
        )
        finding.uuid = uuid
        finding.save!
- MR: Only needed if we log instances where the previous migration ran into a `RecordNotUnique` error. Migrate the components that violated the unique constraint index. The process for fixing this will look like the following:
  - Find the ID of the component with the correct name. For example, let's say that we have the following data in the `sbom_components` table.

    | id | name        | purl_type |
    | -- | ----------- | --------- |
    | 1  | debian/curl | 11        |
    | 2  | curl        | 11        |

    We can find the components that still carry a prefix with a query like the following:

        scope_to ->(relation) { relation.where(purl_type: [9, 10, 11]).where("name LIKE ?", "%/%") }
  - The correct ID in this case is `2`, thus we'll need to do an atomic update where we set the `component_id` column of the SBOM occurrences and SBOM component versions to `2` when it's `1`.
- MR: Only needed if we log instances where the previous migration ran into a `RecordNotUnique` error. Create a migration to delete the components that we migrated the SBOM occurrences and component versions off of.
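The rename-or-merge flow from the plan above can be sketched in plain Ruby, without a database. Everything here is an assumption made for illustration: the helper names (`strip_prefix`, `fix_component_names`, `fix_raw_metadata`) are hypothetical, rows are modelled as hashes, the uniqueness check ignores `component_type`, and the `purl_type` values are copied from the `scope_to` query above. A real background migration would use batched SQL updates instead.

```ruby
require 'json'

OS_PURL_TYPES = [9, 10, 11].freeze # values copied from the scope_to query above

# Drop everything up to and including the first slash: "debian/curl" -> "curl".
def strip_prefix(name)
  name.sub(%r{\A[^/]+/}, '')
end

# components:  array of hashes { id:, name:, purl_type: }
# occurrences: array of hashes { id:, component_id: }
# Returns the ids of components that became redundant and can be deleted later.
def fix_component_names(components, occurrences)
  by_key = components.to_h { |c| [[c[:purl_type], c[:name]], c] }
  redundant_ids = []

  components.each do |component|
    next unless OS_PURL_TYPES.include?(component[:purl_type]) && component[:name].include?('/')

    new_name = strip_prefix(component[:name])
    existing = by_key[[component[:purl_type], new_name]]

    if existing
      # Renaming would violate the unique index, so re-point the foreign keys instead.
      occurrences.each { |o| o[:component_id] = existing[:id] if o[:component_id] == component[:id] }
      redundant_ids << component[:id]
    else
      # No collision: rename in place and keep the lookup table in sync.
      by_key.delete([component[:purl_type], component[:name]])
      component[:name] = new_name
      by_key[[component[:purl_type], new_name]] = component
    end
  end

  redundant_ids
end

# Rewrite the package name inside a raw_metadata-style JSON string.
def fix_raw_metadata(raw)
  data = JSON.parse(raw)
  package = data.dig('location', 'dependency', 'package')
  package['name'] = strip_prefix(package['name']) if package && package['name']
  JSON.generate(data)
end

components = [
  { id: 1, name: 'debian/curl', purl_type: 11 },
  { id: 2, name: 'curl', purl_type: 11 },
  { id: 3, name: 'debian/squid', purl_type: 11 }
]
occurrences = [{ id: 10, component_id: 1 }, { id: 11, component_id: 3 }]

redundant = fix_component_names(components, occurrences)
p redundant                                  # => [1]
p occurrences.map { |o| o[:component_id] }   # => [2, 3]
p strip_prefix('debian/squid:5.7-2')         # => "squid:5.7-2"
```

Component `1` collides with component `2`, so its occurrence is re-pointed and its ID is returned for the later clean-up MR. Component `3` has no collision and is simply renamed.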