
Show detected licenses with their names (License Scanning SBOM Scanner)

Summary

With the introduction of the License Scanning SBOM Scanner, the License Compliance and Dependency List no longer show the names of the detected licenses; they show the SPDX identifiers instead.

Screenshots

License Compliance page:

[Screenshots: new implementation (left), old implementation (right)]

Dependency List page:

[Screenshots: new implementation (left), old implementation (right)]

Source: #385173 (closed)

Further details

The following discussion from !109447 (merged) should be addressed:

  • @fcatteau started a discussion:

    Non-blocking, follow-up: This sets the license name to the SPDX identifier. This is incorrect, but it's also necessary, and to my knowledge this has a limited impact.

    • Incorrect: In the UI licenses are presented with SPDX IDs instead of names. See screenshot in #389417 (comment 1256274264).
    • Harmless: To compare licenses and apply policies, SCA::LicenseCompliance always uses IDs; names are used as a fallback. See #diff_with and #reported_license_by_license_model.
    • Necessary: SCA::LicenseCompliance doesn't handle the case where the name isn't set. Again, see #diff_with and #reported_license_by_license_model.

    I see at least two ways we could add names to licenses:

    • Query the software_licenses table used by license policies; it already contains what we need. We could either JOIN with that table or perform an extra query.
    • Add a name column to pm_licenses. There would be no extra query. License DB should export license names, which is not the case right now.

    At this point I simply suggest we create a follow-up issue to address this problem without pushing any solution. EDIT: Could you create one?

Possible fixes

  • Query the software_licenses table used by license policies; it already contains what we need. We could either JOIN with that table or perform an extra query.

  • Add a name column to the pm_licenses table. There would be no extra query.

    There are at least two possible sources from which the backend could get the license names:

    • License DB exports, as part of the sync
    • JSON exports of the SPDX Index
      • handled by the existing ImportSoftwareLicensesWorker
      • handled by a similar worker
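
To illustrate the SPDX-based source, here is a minimal sketch of turning a JSON export into an identifier-to-name lookup. It assumes the export has the shape of the public SPDX license list (a top-level "licenses" array whose entries carry "licenseId" and "name"). The helper name and the inline sample are hypothetical, and this is not a description of what ImportSoftwareLicensesWorker actually does.

```ruby
require 'json'

# Inline sample that mimics the assumed shape of the SPDX license list JSON.
# Real exports carry more fields per entry; only two are needed here.
SPDX_SAMPLE_JSON = <<~JSON
  {
    "licenses": [
      { "licenseId": "MIT", "name": "MIT License" },
      { "licenseId": "OLDAP-2.1", "name": "Open LDAP Public License v2.1" }
    ]
  }
JSON

# Hypothetical helper: builds an SPDX identifier => license name lookup.
def license_names_from_spdx_json(json)
  JSON.parse(json).fetch('licenses').to_h do |entry|
    [entry.fetch('licenseId'), entry.fetch('name')]
  end
end

names = license_names_from_spdx_json(SPDX_SAMPLE_JSON)
puts names['OLDAP-2.1']
```

Whichever worker ends up handling the import, the result would need to be persisted (for example in the name column proposed above) rather than rebuilt on every request.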

Implementation plan

  1. Update Gitlab::LicenseScanning::PackageLicenses#add_records_with_known_licenses to include license name data from the SoftwareLicense table:

    diff --git a/ee/lib/gitlab/license_scanning/package_licenses.rb b/ee/lib/gitlab/license_scanning/package_licenses.rb
    index 82e6ec06efca..0d0e7dcf5b1f 100644
    --- a/ee/lib/gitlab/license_scanning/package_licenses.rb
    +++ b/ee/lib/gitlab/license_scanning/package_licenses.rb
    @@ -9,6 +9,7 @@ class PackageLicenses
           def initialize(components:)
             @components = components
             @all_records = {}
    +        @license_map = SoftwareLicense.all.to_h { |license| [license.spdx_identifier, license.name] }
           end
    
           def fetch
    @@ -32,15 +33,17 @@ def fetch
    
           private
    
    -      attr_reader :components, :all_records
    +      attr_reader :components, :all_records, :license_map
    
           # Every time a license is encountered for a component, we record it.
           # This allows us to determine which components do not have licenses.
           def add_records_with_known_licenses(records)
             records.each do |purl_type, name, version, licenses|
    +          licenses_with_names = licenses.index_with { |spdx_id| license_map[spdx_id] }
    +
               component_key = File.join(name, version, purl_type)
               all_records[component_key] =
    -            Hashie::Mash.new(purl_type: purl_type, name: name, version: version, licenses: licenses)
    +            Hashie::Mash.new(purl_type: purl_type, name: name, version: version, licenses: licenses_with_names)
             end
           end

    Note: The previous code returns an array of SPDX license identifiers, for example: ["OLDAP-2.1", "MIT-5"], while the updated code above returns a hash that maps each SPDX license identifier to its name (or nil when no name is known), for example: { "OLDAP-2.1" => "Open LDAP Public License v2.1", "MIT-5" => nil }. Using a hash is just a suggestion; it might make more sense to return only the license names, using the SPDX identifier wherever no name is known, for example: ["Open LDAP Public License v2.1", "MIT-5"].
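
The two return shapes discussed in this note can be compared side by side in plain Ruby. This is only a sketch: LICENSE_MAP stands in for the lookup built from SoftwareLicense, both method names are hypothetical, and Array#to_h with a block is used as a plain-Ruby equivalent of ActiveSupport's index_with.

```ruby
# Stand-in for the SPDX identifier => name lookup (illustrative data only).
LICENSE_MAP = { 'OLDAP-2.1' => 'Open LDAP Public License v2.1' }.freeze

# Shape 1: hash keyed by SPDX identifier; unknown identifiers map to nil.
def licenses_with_names(spdx_ids, license_map = LICENSE_MAP)
  spdx_ids.to_h { |spdx_id| [spdx_id, license_map[spdx_id]] }
end

# Shape 2: names only; the identifier stands in when no name is known.
def license_names(spdx_ids, license_map = LICENSE_MAP)
  spdx_ids.map { |spdx_id| license_map.fetch(spdx_id, spdx_id) }
end

spdx_ids = ['OLDAP-2.1', 'MIT-5']
p licenses_with_names(spdx_ids)
p license_names(spdx_ids)
```

The hash keeps the identifier available for policy comparisons, at the cost of pushing nil handling onto callers. The names-only array is simpler to render but discards the identifier.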

  2. Update tests in ee/spec/lib/gitlab/license_scanning/package_licenses_spec.rb to handle the license hash introduced in step 1.

  3. Update Gitlab::LicenseScanning::SbomScanner#report to use the name data provided by step 1.

  4. Update tests in ee/spec/lib/gitlab/license_scanning/sbom_scanner_spec.rb to handle the code changes in step 3.

Note: Since we're fetching license names from the software_licenses table and matching them on the spdx_identifier column in the pm_licenses table, there might be situations where an spdx_identifier exists in the pm_licenses table but has no corresponding entry (and therefore no name value) in the software_licenses table. In this situation, we fall back to using the spdx_identifier as the license name, as shown here.
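
As a rough illustration of that fallback on the consuming side, the sketch below turns the identifier-to-name hash from step 1 into report entries. ReportedLicense and reported_licenses are hypothetical names used only for this example; they are not the actual SbomScanner#report implementation.

```ruby
# Hypothetical value object for one license entry in a report.
ReportedLicense = Struct.new(:id, :name)

# Builds report entries from an SPDX identifier => name hash, using the
# identifier as the name when the lookup returned nil.
def reported_licenses(licenses_with_names)
  licenses_with_names.map do |spdx_id, name|
    ReportedLicense.new(spdx_id, name || spdx_id)
  end
end

entries = reported_licenses(
  { 'OLDAP-2.1' => 'Open LDAP Public License v2.1', 'MIT-5' => nil }
)
entries.each { |entry| puts "#{entry.id}: #{entry.name}" }
```

Because every entry ends up with a non-nil name, code that expects a name to be set, as the quoted discussion says of SCA::LicenseCompliance, would keep working for licenses missing from software_licenses.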

/cc @gonzoyumo @fcatteau

Edited by Aditya Tiwari