Skip to content

Add batched migration to remove namespace from OS components

Oscar Tovar requested to merge otovar/fixed-os-component-name-migration into master

What does this MR do and why?

Add batched migration to remove namespace from OS components

When we first started ingesting OS components from Trivy SBOMs, we did not remove the namepaces from the component names. This meant that we did not match advisories because a component name in the advisory data was listed without the namespace prefix.

For example, we'd get the cURL component for an Alpine Linux image as pkg:apk/alpine/curl@<version>, which we'd ultimately parse as alpine/curl. In this example, the advisories for cURL would have a name like curl instead of alpine/curl. Thus, we'd never get a match on the advisories and components when performing advisory scans.

The PURL specification dictates that alpine/curl is the correct name, but in our case we choose to deviate from this because store the operating system metadata as an SBOM property instead. All this considered, the most pragmatic solution here, and thus the one taken, is to migrate all of our components to drop the namespace/os prefixes instead of re-exporting all of our advisories from the GitLab Package Metadata Database (a much larger change).

Relates to Remove distro prefix from OS component names (#442847) • Oscar Tovar • 17.1 • Needs attention

Stats

-- Records we're expecting to migrate
SELECT
    COUNT("sbom_components".id) records_to_migrate_cnt
FROM
    sbom_components
WHERE
    purl_type > 8
    AND purl_type < 14
    AND component_type = 0
    AND name LIKE '%/%';

 records_to_migrate_cnt
------------------------
                  22376
-- Records we're expecting to raise a not unique error
WITH components_to_migrate AS (
    SELECT
        id,
        "sbom_components".name AS full_name,
        SPLIT_PART("sbom_components".name, '/', 1) AS namespace,
        SPLIT_PART("sbom_components".name, '/', 2) AS name
    FROM
        sbom_components
    WHERE
        purl_type > 8
        AND purl_type < 14
        AND component_type = 0
        AND name LIKE '%/%'
)
SELECT
    COUNT(DISTINCT "sbom_components".id) AS approx_expected_record_uniq_errors
FROM
    sbom_components
    INNER JOIN components_to_migrate ON "sbom_components".name = "components_to_migrate".name
WHERE
    "sbom_components".purl_type > 8
    AND "sbom_components".purl_type < 14
    AND "sbom_components".component_type = 0;

 approx_expected_record_uniq_errors
------------------------------------
                              10920
(1 row)
-- Prefixes that we'll be removing. This serves as proof that we won't corrupt data,
-- and are only removing the os prefixes.
SELECT DISTINCT
    SPLIT_PART("sbom_components".name, '/', 1) AS os_prefix
FROM
    sbom_components
WHERE
    purl_type > 8
    AND purl_type < 14
    AND component_type = 0
    AND name LIKE '%/%'
ORDER BY
    os_prefix;

      os_prefix
---------------------
 alma
 alpine
 amazon
 centos
 debian
 fedora
 opensuse.leap
 opensuse.tumbleweed
 oracle
 photon
 redhat
 rocky
 sles
 ubuntu
(14 rows)
-- This serves as proof that os components don't support nested namespaces.
-- We don't have a component that has the `<os>/<nested-namespace>/<name>` format.
-- I checked this to make sure that this wasn't a possibility, and something we'd miss.
SELECT
    COUNT(id)
FROM
    sbom_components
WHERE
    purl_type > 8
    AND purl_type < 14
    AND component_type = 0
    AND name LIKE '%/%/%';

 count
-------
     0
(1 row)

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

Production Data

Tested in postgres.ai

 Index Scan using index_sbom_components_on_component_type_name_and_purl_type on public.sbom_components  (cost=0.42..53319.75 rows=38665 width=83) (actual time=2.787..82.102 rows=22376 loops=1)
   Index Cond: (sbom_components.purl_type = ANY ('{9,10,11,12,13}'::integer[]))
   Filter: (sbom_components.name ~~ '%/%'::text)
   Rows Removed by Filter: 12971
   Buffers: shared hit=41342
   I/O Timings: read=0.000 write=0.000

An approximate total of 22376 rows match this condition.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Edited by Oscar Tovar

Merge request reports