Show detected licenses with their names (License Scanning SBOM Scanner)
Summary
With the introduction of the License Scanning SBOM Scanner, the License Compliance and Dependency List pages no longer show the names of the detected licenses; they show the SPDX identifiers instead.
Screenshots
License Compliance page:

New implementation | Old implementation |
---|---|
![]() | ![]() |
Dependency List page:

New implementation | Old implementation |
---|---|
![]() | ![]() |
Source: #385173 (closed)
Further details
The following discussion from !109447 (merged) should be addressed:
- @fcatteau started a discussion: Non-blocking, follow-up: This sets the license `name` to the SPDX identifier. This is incorrect, but it's also necessary, and to my knowledge this has a limited impact.
  - Incorrect: In the UI licenses are presented with SPDX IDs instead of names. See screenshot in #389417 (comment 1256274264).
  - Harmless: To compare licenses and apply policies, `SCA::LicenseCompliance` always uses IDs; names are used as a fallback. See `#diff_with` and `#reported_license_by_license_model`.
  - Necessary: `SCA::LicenseCompliance` doesn't handle the case where the `name` isn't set. Again, see `#diff_with` and `#reported_license_by_license_model`.

  I see at least two ways we could add names to licenses:
  - Query the `software_licenses` table used by license policies; it already contains what we need. We could either JOIN with that table or perform an extra query.
  - Add a `name` column to `pm_licenses`. There would be no extra query. License DB should export license names, which is not the case right now.

  At this point I simply suggest we create a follow-up issue to address this problem without pushing any solution. EDIT: Could you create one?
Possible fixes
- Query the `software_licenses` table used by license policies; it already contains what we need. We could either JOIN with that table or perform an extra query.
- Add a `name` column to the `pm_licenses` table. There would be no extra query. There are at least two possible sources the backend could get the license names from:
  - License DB exports, as part of the sync
  - JSON exports of the SPDX Index
    - handled by the existing `ImportSoftwareLicensesWorker`
    - handled by a similar worker
Implementation plan
1. Update `Gitlab::LicenseScanning::PackageLicenses#add_records_with_known_licenses` to include license name data from the `SoftwareLicense` table:

   ```diff
   diff --git a/ee/lib/gitlab/license_scanning/package_licenses.rb b/ee/lib/gitlab/license_scanning/package_licenses.rb
   index 82e6ec06efca..0d0e7dcf5b1f 100644
   --- a/ee/lib/gitlab/license_scanning/package_licenses.rb
   +++ b/ee/lib/gitlab/license_scanning/package_licenses.rb
   @@ -9,6 +9,7 @@ class PackageLicenses
          def initialize(components:)
            @components = components
            @all_records = {}
   +        @license_map = SoftwareLicense.all.to_h { |license| [license.spdx_identifier, license.name] }
          end

          def fetch
   @@ -32,15 +33,17 @@ def fetch

          private

   -      attr_reader :components, :all_records
   +      attr_reader :components, :all_records, :license_map

          # Every time a license is encountered for a component, we record it.
          # This allows us to determine which components do not have licenses.
          def add_records_with_known_licenses(records)
            records.each do |purl_type, name, version, licenses|
   +          licenses_with_names = licenses.index_with { |spdx_id| license_map[spdx_id] }
   +
              component_key = File.join(name, version, purl_type)
              all_records[component_key] =
   -            Hashie::Mash.new(purl_type: purl_type, name: name, version: version, licenses: licenses)
   +            Hashie::Mash.new(purl_type: purl_type, name: name, version: version, licenses: licenses_with_names)
            end
          end
   ```

   Note: The previous code returns an array of SPDX license identifiers, for example `["OLDAP-2.1", "MIT-5"]`, while the updated code above returns a hash that maps each SPDX license identifier to its name, for example `{ "OLDAP-2.1" => "Open LDAP Public License v2.1", "MIT-5" => nil }`. Using a hash is just a suggestion; it might make more sense to simply return the license names and ignore the SPDX identifiers, for example `["Open LDAP Public License v2.1", "MIT-5"]`.
2. Update tests in `ee/spec/lib/gitlab/license_scanning/package_licenses_spec.rb` to handle the license hash added by step 1.
3. Update `Gitlab::LicenseScanning::SbomScanner#report` to use the name data provided by step 1.
4. Update tests in `ee/spec/lib/gitlab/license_scanning/sbom_scanner_spec.rb` to handle the code changes in step 3.
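The hash shape proposed in step 1 can be tried out without Rails. The following is a minimal sketch, not the GitLab implementation: `SoftwareLicenseRow`, `build_license_map`, and `licenses_with_names` are hypothetical names introduced for illustration, and `Array#to_h` with a block stands in for ActiveSupport's `index_with`.

```ruby
# Hypothetical stand-in for a row of the software_licenses table.
SoftwareLicenseRow = Struct.new(:spdx_identifier, :name)

# Build an SPDX identifier => name lookup, similar in spirit to what the
# diff does with SoftwareLicense.all.
def build_license_map(rows)
  rows.to_h { |row| [row.spdx_identifier, row.name] }
end

# Plain-Ruby equivalent of `licenses.index_with { ... }`: each identifier
# becomes a key, and identifiers missing from the lookup map to nil.
def licenses_with_names(spdx_ids, license_map)
  spdx_ids.to_h { |spdx_id| [spdx_id, license_map[spdx_id]] }
end

rows = [SoftwareLicenseRow.new("OLDAP-2.1", "Open LDAP Public License v2.1")]
license_map = build_license_map(rows)

# "MIT-5" has no row in the lookup, so its value is nil.
p licenses_with_names(["OLDAP-2.1", "MIT-5"], license_map)
```

Keeping the identifier as the hash key means callers that compare licenses by ID can keep doing so, while callers that render names read the value.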
Note: Since we're fetching license names from the `software_licenses` table and joining them using the `spdx_identifier` column in the `pm_licenses` table, there might be situations where an `spdx_identifier` exists in the `pm_licenses` table, but no corresponding entry (and therefore no `name` value) exists in the `software_licenses` table. In this situation, we fall back to using the `spdx_identifier` as the license name, as shown here.
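The fallback described in this note can be illustrated in plain Ruby. This is a sketch under the assumption that licenses arrive as an identifier-to-name hash with `nil` for unknown names; `display_names` is a hypothetical helper, not a method from the GitLab codebase.

```ruby
# Resolve a display name for each SPDX identifier, falling back to the
# identifier itself when no name is known (nil value).
def display_names(licenses_with_names)
  licenses_with_names.map { |spdx_id, name| name || spdx_id }
end

licenses = { "OLDAP-2.1" => "Open LDAP Public License v2.1", "MIT-5" => nil }

# Prints one display name per line.
puts display_names(licenses)
```

An empty-string name would be kept as-is by this sketch; whether that case should also trigger the fallback is left open.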
/cc @gonzoyumo @fcatteau