Skip to content

Allow filtering the dependency list by unknown license

Brian Williams requested to merge bwill/search-by-unknown-license into master

What does this MR do and why?

Allow for unknown licenses to be searched for on the dependency list via the licenses[]=unknown query string parameter. A dependency's license is "unknown" if it does not have any licenses associated with it. Licenses are currently stored in a jsonb column sbom_occurrences.licenses which consists of an array of objects. When the user wants to see unknown licenses, we search for sbom_occurrences where the licenses column is an empty array.

Query

The existing query is known to perform poorly. See !132304 (comment 1579176970) for the discussion where it was introduced. We plan to address this via multiple improvements in &12371.

Before:

WITH our_occurrences AS MATERIALIZED (
    SELECT
        DISTINCT ON ( sbom_occurrences.component_version_id )
        sbom_occurrences.*,
        count( sbom_occurrences.id ) OVER ( PARTITION BY sbom_occurrences.component_version_id ) AS occurrence_count,
        dense_rank() OVER (
            PARTITION BY sbom_occurrences.component_version_id
            ORDER BY project_id
        ) AS project_count
    FROM
        sbom_occurrences
    WHERE
        sbom_occurrences.project_id IN (
            SELECT
                projects.id
            FROM
                projects
            WHERE
                projects.namespace_id IN (
                    SELECT
                        namespaces.id
                    FROM
                        unnest( COALESCE( ( SELECT ids FROM ( SELECT namespace_descendants.self_and_descendant_group_ids AS ids FROM namespace_descendants WHERE namespace_descendants.outdated_at IS NULL AND namespace_descendants.namespace_id = 9970 ) AS cached_query ), ( SELECT ids FROM ( SELECT array_agg( namespaces.id ) AS ids FROM ( SELECT namespaces.traversal_ids[array_length( namespaces.traversal_ids, 1 )] AS id FROM namespaces WHERE namespaces.type = 'Group' AND traversal_ids @> '{9970}' ) AS namespaces ) AS consistent_query ) ) ) AS namespaces ( id )
                ) AND
                projects.marked_for_deletion_at IS NULL AND
                projects.pending_delete = false
        ) AND
        sbom_occurrences.component_version_id IS NOT NULL
    ORDER BY
        sbom_occurrences.component_version_id DESC,
        project_count DESC,
        sbom_occurrences.id DESC
)
SELECT
    id,
    created_at,
    updated_at,
    component_version_id,
    project_id,
    pipeline_id,
    source_id,
    commit_sha,
    component_id,
    uuid,
    package_manager,
    component_name,
    input_file_path,
    licenses,
    highest_severity,
    vulnerability_count,
    source_package_id,
    occurrence_count,
    project_count
FROM
    our_occurrences AS sbom_occurrences
WHERE
    ( licenses #> '{0,spdx_identifier}' ) ?| ARRAY[ 'Apache-2.0' ] OR
    ( licenses #> '{1,spdx_identifier}' ) ?| ARRAY[ 'Apache-2.0' ]
ORDER BY
    id ASC;

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/25787/commands/81283

After:

WITH our_occurrences AS MATERIALIZED (
    SELECT
        DISTINCT ON ( sbom_occurrences.component_version_id )
        sbom_occurrences.*,
        count( sbom_occurrences.id ) OVER ( PARTITION BY sbom_occurrences.component_version_id ) AS occurrence_count,
        dense_rank() OVER (
            PARTITION BY sbom_occurrences.component_version_id
            ORDER BY project_id
        ) AS project_count
    FROM
        sbom_occurrences
    WHERE
        sbom_occurrences.project_id IN (
            SELECT
                projects.id
            FROM
                projects
            WHERE
                projects.namespace_id IN (
                    SELECT
                        namespaces.id
                    FROM
                        unnest( COALESCE( ( SELECT ids FROM ( SELECT namespace_descendants.self_and_descendant_group_ids AS ids FROM namespace_descendants WHERE namespace_descendants.outdated_at IS NULL AND namespace_descendants.namespace_id = 9970 ) AS cached_query ), ( SELECT ids FROM ( SELECT array_agg( namespaces.id ) AS ids FROM ( SELECT namespaces.traversal_ids[array_length( namespaces.traversal_ids, 1 )] AS id FROM namespaces WHERE namespaces.type = 'Group' AND traversal_ids @> '{9970}' ) AS namespaces ) AS consistent_query ) ) ) AS namespaces ( id )
                ) AND
                projects.marked_for_deletion_at IS NULL AND
                projects.pending_delete = false
        ) AND
        sbom_occurrences.component_version_id IS NOT NULL
    ORDER BY
        sbom_occurrences.component_version_id DESC,
        project_count DESC,
        sbom_occurrences.id DESC
)
SELECT
    id,
    created_at,
    updated_at,
    component_version_id,
    project_id,
    pipeline_id,
    source_id,
    commit_sha,
    component_id,
    uuid,
    package_manager,
    component_name,
    input_file_path,
    licenses,
    highest_severity,
    vulnerability_count,
    source_package_id,
    occurrence_count,
    project_count
FROM
    our_occurrences AS sbom_occurrences
WHERE
    ( licenses #> '{0,spdx_identifier}' ) ?| ARRAY[ 'Apache-2.0' ] OR
    ( licenses #> '{1,spdx_identifier}' ) ?| ARRAY[ 'Apache-2.0' ] OR
    licenses = '[]'
ORDER BY
    id ASC;

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/25787/commands/81284

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After
Screenshot_2024-01-25_at_1.18.11_PM Screenshot_2024-01-25_at_1.17.39_PM

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

  1. Find or create a group with unknown licenses in the dependency list
  2. Go to <gdk_url>/groups/<group_path>/-/dependencies.json?licenses[]=unknown
  3. Licenses should appear in the list
Edited by Mehmet Emin INAC

Merge request reports