Show detected licenses with their names (License Scanning SBOM Scanner)
Summary
With the introduction of the License Scanning SBOM Scanner, the License Compliance and Dependency List pages no longer show the names of the detected licenses; they show the SPDX identifiers instead.
Screenshots
License Compliance page:
| New implementation | Old implementation |
|---|---|
| (screenshot) | (screenshot) |
Dependency List page:
| New implementation | Old implementation |
|---|---|
| (screenshot) | (screenshot) |
Source: #385173 (closed)
Further details
The following discussion from !109447 (merged) should be addressed:
- @fcatteau started a discussion: Non-blocking, follow-up: This sets the license `name` to the SPDX identifier. This is incorrect, but it's also necessary, and to my knowledge this has a limited impact.

  - Incorrect: In the UI licenses are presented with SPDX IDs instead of names. See screenshot in #389417 (comment 1256274264).
  - Harmless: To compare licenses and apply policies, `SCA::LicenseCompliance` always uses IDs; names are used as a fallback. See `#diff_with` and `#reported_license_by_license_model`.
  - Necessary: `SCA::LicenseCompliance` doesn't handle the case where the `name` isn't set. Again, see `#diff_with` and `#reported_license_by_license_model`.

  I see at least two ways we could add names to licenses:

  - Query the `software_licenses` table used by license policies; it already contains what we need. We could either JOIN with that table or perform an extra query.
  - Add a `name` column to `pm_licenses`. There would be no extra query. License DB should export license names, which is not the case right now.

  At this point I simply suggest we create a follow-up issue to address this problem without pushing any solution. EDIT: Could you create one?
Possible fixes
- Query the `software_licenses` table used by license policies; it already contains what we need. We could either JOIN with that table or perform an extra query.
- Add a `name` column to the `pm_licenses` table. There would be no extra query.

  There are at least two possible sources the backend could get the license names from:

  - License DB exports, as part of the sync
  - JSON exports of the SPDX Index
    - handled by the existing `ImportSoftwareLicensesWorker`
    - handled by a similar worker
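The "extra query" variant of the first option can be sketched in plain Ruby. The `SoftwareLicenseRow` struct, the sample rows, and the `build_license_map` and `names_for` helpers are hypothetical stand-ins invented for illustration; they are not GitLab's models or API. The sketch only shows that one pass over the table yields an SPDX-identifier-to-name lookup, so no JOIN is needed per package.

```ruby
# Hypothetical stand-in for a row of the software_licenses table.
SoftwareLicenseRow = Struct.new(:spdx_identifier, :name)

# Sample data, invented for illustration only.
SOFTWARE_LICENSES = [
  SoftwareLicenseRow.new("OLDAP-2.1", "Open LDAP Public License v2.1"),
  SoftwareLicenseRow.new("MIT", "MIT License")
].freeze

# One pass over the table builds the lookup (the "extra query" variant).
def build_license_map(rows)
  rows.to_h { |row| [row.spdx_identifier, row.name] }
end

# Pair each detected SPDX identifier with its name; the value is nil when
# the identifier has no matching row.
def names_for(spdx_ids, license_map)
  spdx_ids.to_h { |spdx_id| [spdx_id, license_map[spdx_id]] }
end

LICENSE_MAP = build_license_map(SOFTWARE_LICENSES)
```

Building the map once and reusing it for every package trades a little memory for avoiding a lookup query per component; whether that is acceptable depends on how large the `software_licenses` table is in practice.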
Implementation plan
1. Update `Gitlab::LicenseScanning::PackageLicenses#add_records_with_known_licenses` to include license name data from the `SoftwareLicense` table:

       diff --git a/ee/lib/gitlab/license_scanning/package_licenses.rb b/ee/lib/gitlab/license_scanning/package_licenses.rb
       index 82e6ec06efca..0d0e7dcf5b1f 100644
       --- a/ee/lib/gitlab/license_scanning/package_licenses.rb
       +++ b/ee/lib/gitlab/license_scanning/package_licenses.rb
       @@ -9,6 +9,7 @@ class PackageLicenses
              def initialize(components:)
                @components = components
                @all_records = {}
       +        @license_map = SoftwareLicense.all.to_h { |license| [license.spdx_identifier, license.name] }
              end

              def fetch
       @@ -32,15 +33,17 @@ def fetch

              private

       -      attr_reader :components, :all_records
       +      attr_reader :components, :all_records, :license_map

              # Every time a license is encountered for a component, we record it.
              # This allows us to determine which components do not have licenses.
              def add_records_with_known_licenses(records)
                records.each do |purl_type, name, version, licenses|
       +          licenses_with_names = licenses.index_with { |spdx_id| license_map[spdx_id] }
       +
                  component_key = File.join(name, version, purl_type)
                  all_records[component_key] =
       -            Hashie::Mash.new(purl_type: purl_type, name: name, version: version, licenses: licenses)
       +            Hashie::Mash.new(purl_type: purl_type, name: name, version: version, licenses: licenses_with_names)
                end
              end

   Note: The previous code returns an array of SPDX license identifiers, for example `["OLDAP-2.1", "MIT-5"]`, while the updated code above returns a hash that maps SPDX license identifiers to license names, for example `{ "OLDAP-2.1" => "Open LDAP Public License v2.1", "MIT-5" => nil }`. Using a hash is just a suggestion; it might make more sense to simply return the license names and ignore the SPDX identifiers, for example `["Open LDAP Public License v2.1", "MIT-5"]`.
2. Update tests in `ee/spec/lib/gitlab/license_scanning/package_licenses_spec.rb` to handle the license hash added by step 1.
3. Update `Gitlab::LicenseScanning::SbomScanner#report` to use the name data provided by step 1.
4. Update tests in `ee/spec/lib/gitlab/license_scanning/sbom_scanner_spec.rb` to handle the code changes in step 3.
Note: Since we're fetching license names from the `software_licenses` table and joining them using the `spdx_identifier` column in the `pm_licenses` table, there might be situations where an `spdx_identifier` exists in the `pm_licenses` table but has no corresponding entry (and therefore no `name` value) in the `software_licenses` table. In this situation, we fall back to using the `spdx_identifier` as the license name, as shown here.
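The fallback described in the note could look like the sketch below. `ReportEntry` and `report_entries` are hypothetical names used only for illustration; they are not the API of `SbomScanner#report`. The sketch assumes the hash shape suggested in step 1, where an SPDX identifier maps to `nil` if no name is known.

```ruby
# Hypothetical value object: what a report might hold per detected license.
ReportEntry = Struct.new(:id, :name)

# licenses_with_names maps SPDX identifier => name, or nil when the
# identifier has no matching row in software_licenses.
def report_entries(licenses_with_names)
  licenses_with_names.map do |spdx_id, name|
    # Fall back to the SPDX identifier when no name is known.
    ReportEntry.new(spdx_id, name || spdx_id)
  end
end

entries = report_entries({
  "OLDAP-2.1" => "Open LDAP Public License v2.1",
  "MIT-5" => nil
})
entries.each { |entry| puts "#{entry.id}: #{entry.name}" }
```

With this approach the name is never `nil`, which matters because, per the discussion above, `SCA::LicenseCompliance` doesn't handle the case where the `name` isn't set.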
/cc @gonzoyumo @fcatteau



