Skip to content

Improve performance of PackageLicenses#fetch

What does this MR do and why?

This MR updates Gitlab::LicenseScanning::PackageLicenses#fetch to implement the optimization described here.

Data characteristics

Raw SQL

  1. query 1

    WITH needed_package_versions ( purl_type, name, version ) AS (
        VALUES
            ( 4, 'beego', 'v1.10.0' ),
            ( 6, 'camelcase', '1.2.1' ),
            ( 6, 'camelcase', '4.1.0' ),
            ( 6, 'cliui', '2.1.0' ),
            ( 4, 'cliui', '2.1.0' )
    )
    SELECT DISTINCT
        pm_packages.id,
        needed_package_versions.version
    FROM
        needed_package_versions
        JOIN pm_packages ON pm_packages.purl_type = needed_package_versions.purl_type AND
        pm_packages.name = needed_package_versions.name;
  2. query 2, using the pm_packages.id, needed_package_versions.version values from query 1 above:

    SELECT
        pm_packages.purl_type,
        pm_packages.name,
        pm_package_versions.version,
        array_agg( pm_licenses.spdx_identifier )
    FROM
        pm_packages
        JOIN pm_package_versions ON pm_package_versions.pm_package_id = pm_packages.id
        JOIN pm_package_version_licenses ON pm_package_version_licenses.pm_package_version_id = pm_package_versions.id
        JOIN pm_licenses ON pm_licenses.id = pm_package_version_licenses.pm_license_id
    WHERE
        (
            pm_package_versions.pm_package_id,
            pm_package_versions.version
        ) IN (
            ( 386, 'v1.10.0' ),
            ( 387, '1.2.1' ),
            ( 387, '4.1.0' ),
            ( 388, '2.1.0' ),
            ( 389, '2.1.0' )
        )
    GROUP BY
        pm_packages.purl_type,
        pm_packages.name,
        pm_package_versions.version;

Query plan and benchmarks

Please see this internal note for query plan and this comment for further details.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #398679 (closed)

Edited by Adam Cohen

Merge request reports