Update package metadata license lookup to use deduplicated data

Why are we doing this work

A new compressed dataset will be added as part of this issue's parent epic. The compressed dataset is meant to replace the existing dataset (though not in this epic). Therefore, the current license lookup provided by Gitlab::LicenseScanning::PackageLicenses needs to be updated to query this dataset.

Note: until the existing dataset is removed, Gitlab::LicenseScanning::PackageLicenses needs to support querying both compressed and uncompressed data.

Relevant links

Implementation plan

Update Gitlab::LicenseScanning::PackageLicenses:

  1. Add a new feature flag called compressed_package_metadata_query.

    [Feature flag] Rollout of `compressed_package_m... (#409793 - closed)

  2. Create two new private methods:

    • uncompressed_fetch

      This contains the code currently in the fetch method

    • compressed_fetch

      This is responsible for querying data from the new licenses field in the pm_packages table for each component, following this pseudocode:

      def compressed_fetch
        components.each do |component|
          packages = select packages from pm_packages table where pm_packages.name = component.name and pm_packages.purl_type = component.purl_type
      
          packages.each do |package|
            licenses = []
            if component.version is contained in package.licenses.other_versions
              licenses = package.licenses.other_licenses
            else
              licenses = package.licenses.default_licenses
            end
      
            add_record_with_known_licenses(package.purl_type, package.name, component.version, licenses)
          end
        end
      
        add_records_with_unknown_licenses
      end

    Update package metadata license lookup to use c... (!119607 - merged)

  3. Update fetch:

    if the `compressed_package_metadata_query` feature is enabled
      call `compressed_fetch`
    else
      call `uncompressed_fetch`
    end

    Update package metadata license lookup to use c... (!119607 - merged)

  4. Update all tests to check both sides of the compressed_package_metadata_query feature flag:

    Test both sides of compressed_package_metadata_... (!120207 - merged)
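
The plan above can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions, not the merged implementation: the `PackageLicensesSketch` class, the `Component`, `LicenseData`, `Package` and `Record` structs, the in-memory `packages` array standing in for the pm_packages table, the boolean standing in for the feature flag check, and the "unknown" placeholder license are all hypothetical.

```ruby
# Hypothetical in-memory stand-ins; the real class queries the pm_packages table.
Component   = Struct.new(:purl_type, :name, :version)
LicenseData = Struct.new(:default_licenses, :other_versions, :other_licenses)
Package     = Struct.new(:purl_type, :name, :licenses)
Record      = Struct.new(:purl_type, :name, :version, :licenses)

class PackageLicensesSketch
  # compressed_enabled stands in for the compressed_package_metadata_query flag.
  def initialize(components:, packages:, compressed_enabled:)
    @components = components
    @packages = packages
    @compressed_enabled = compressed_enabled
  end

  # Step 3 of the plan: dispatch on the feature flag.
  def fetch
    @records = []
    @compressed_enabled ? compressed_fetch : uncompressed_fetch
    @records
  end

  private

  # Step 2 of the plan: look up licenses in the compressed data.
  def compressed_fetch
    @components.each do |component|
      matching = @packages.select do |package|
        package.name == component.name && package.purl_type == component.purl_type
      end

      matching.each do |package|
        data = package.licenses
        licenses =
          if data.other_versions.include?(component.version)
            data.other_licenses
          else
            data.default_licenses
          end

        add_record_with_known_licenses(package.purl_type, package.name, component.version, licenses)
      end
    end

    add_records_with_unknown_licenses
  end

  # Placeholder: in the plan this holds the code currently in fetch.
  def uncompressed_fetch
    raise NotImplementedError, "stands in for the existing fetch body"
  end

  def add_record_with_known_licenses(purl_type, name, version, licenses)
    @records << Record.new(purl_type, name, version, licenses)
  end

  # Assumption: components without any matching package get an "unknown" record.
  def add_records_with_unknown_licenses
    @components.each do |component|
      known = @records.any? { |r| [r.purl_type, r.name, r.version] == component.to_a }
      next if known

      @records << Record.new(component.purl_type, component.name, component.version, ["unknown"])
    end
  end
end
```

Calling `PackageLicensesSketch.new(components: ..., packages: ..., compressed_enabled: true).fetch` returns one record per component: the exception licenses when the component version is listed in `other_versions`, the default licenses otherwise, and the placeholder when no package matches.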

Verification steps

Test data to be determined.

  • redundant case
    1. insert same test data into both datasets
    2. assert that uncompressed_fetch and compressed_fetch return the same result, regardless of the setting for compressed_package_metadata_query
  • new instance case
    1. insert only into new dataset
    2. assert that fetch returns data correctly when compressed_package_metadata_query is enabled
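
As a rough illustration of the redundant case, the check below expresses the same license facts in both shapes and compares the two lookups component by component. Everything here is hypothetical: the fixture rows, the one-row-per-version layout assumed for the uncompressed data, the `uncompressed_lookup`, `compressed_lookup` and `datasets_agree?` helpers, and the "unknown" placeholder. The real test data is still to be determined.

```ruby
# Hypothetical fixtures: the same license facts in both dataset shapes.
UNCOMPRESSED_ROWS = [
  { purl_type: "npm", name: "lodash", version: "4.17.21", licenses: ["MIT"] },
  { purl_type: "npm", name: "lodash", version: "1.0.0", licenses: ["Apache-2.0"] }
].freeze

COMPRESSED_ROWS = [
  { purl_type: "npm", name: "lodash",
    licenses: { default_licenses: ["MIT"], other_versions: ["1.0.0"], other_licenses: ["Apache-2.0"] } }
].freeze

# Assumed layout: one uncompressed row per package version.
def uncompressed_lookup(purl_type, name, version)
  row = UNCOMPRESSED_ROWS.find { |r| r.values_at(:purl_type, :name, :version) == [purl_type, name, version] }
  row ? row[:licenses] : ["unknown"]
end

# Mirrors the pseudocode: exception versions first, default licenses otherwise.
def compressed_lookup(purl_type, name, version)
  row = COMPRESSED_ROWS.find { |r| r.values_at(:purl_type, :name) == [purl_type, name] }
  return ["unknown"] unless row

  data = row[:licenses]
  data[:other_versions].include?(version) ? data[:other_licenses] : data[:default_licenses]
end

# True when both lookups return the same licenses for every component.
def datasets_agree?(components)
  components.all? { |c| uncompressed_lookup(*c) == compressed_lookup(*c) }
end

COMPONENTS = [
  ["npm", "lodash", "4.17.21"],
  ["npm", "lodash", "1.0.0"],
  ["npm", "absent", "0.1.0"]
].freeze
```

In this sketch the comparison is only meaningful for versions present in both fixtures: a version of a known package that the uncompressed rows do not list would fall back to the default licenses in the compressed lookup but to the placeholder in the uncompressed one.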
Edited by Adam Cohen