License scanning cannot handle multiple dependencies with the same name and license
Summary
While working on Implement License Scanning SBOM scanner (!109447 - merged), we discovered that the backend cannot accurately track licenses of project dependencies that share the same package name but have different package types, or different package versions.
Steps to reproduce
-
Create a
gl-license-scanning-report.jsonfile with the following contents:{ "version": "2.1", "licenses": [ { "id": "MIT", "name": "MIT License", "url": "https://opensource.org/licenses/MIT" } ], "dependencies": [ { "name": "yargs-parser", "version": "1.2.3", "package_manager": "bundler", "path": "Gemfile.lock", "licenses": [ "MIT" ] } ] } -
Create another file named
gl-license-scanning-report-2.jsonfile with the following contents:{ "version": "2.1", "licenses": [ { "id": "MIT", "name": "MIT License", "url": "https://opensource.org/licenses/MIT" } ], "dependencies": [ { "name": "another-package", "version": "2.3.4", "package_manager": "npm", "path": "package.lock", "licenses": [ "MIT" ] } ] } -
Create a
.gitlab-ci.ymlwith two jobs which reference the above two license scanning reports:job 1: script: - echo "test" artifacts: reports: license_scanning: - gl-license-scanning-report.json job 2: script: - echo "test" artifacts: reports: license_scanning: - gl-license-scanning-report-2.json -
Commit all the above files and wait for the pipeline to finish.
-
Notice that both packages
another-package (2.3.4) and yargs-parser (1.2.3)show up in the license list:
-
Update the
gl-license-scanning-report-2.jsonfile that you created in step2.above and change"name": "another-package"to"name": "yargs-parser", to introduce a collision with the instance of"name": "yargs-parser"ingl-license-scanning-report.json:{ "version": "2.1", "licenses": [ { "id": "MIT", "name": "MIT License", "url": "https://opensource.org/licenses/MIT" } ], "dependencies": [ { "name": "yargs-parser", "version": "2.3.4", "package_manager": "npm", "path": "package.lock", "licenses": [ "MIT" ] } ] } -
Commit the above change and wait for the pipeline to finish.
-
Notice that only the first instance of
yargs-parser (1.2.3)now shows up in the license compliance list:
Example Project
https://gitlab.com/adamcohen/license-scanning-collision/-/licenses
What is the current bug behavior?
Only one instance of a dependency with the same name appears in the License list
What is the expected correct behavior?
ALl instances of dependencies with the same name appear in the License list
Further details
The reason for this behaviour is because dependencies are stored in the Gitlab::Ci::Reports::LicenseScanning::License object as a Set of Gitlab::Ci::Reports::LicenseScanning::Dependency objects. When objects are added to a Set, it determines equality of elements using the following method:
Equality of elements is determined according to Object#eql? and Object#hash. Use Set#compare_by_identity to make a set compare its elements by their identity.
If we look at how Gitlab::Ci::Reports::LicenseScanning::Dependency#eql? is implemented, we see that it only considers the name field when deciding whether two objects are the same:
def eql?(other)
self.name == other.name
end
This is the reason for the bug, since it causes a collision when adding two dependencies with the same name, regardless of whether other values differ. This behaviour can also be confirmed in this spec:
it 'cannot add the same dependency to a set twice' do
set.add(described_class.new(name: 'bundler'))
set.add(described_class.new(name: 'bundler'))
expect(set.count).to eq(1)
end
Possible fixes
-
Update Gitlab::Ci::Reports::LicenseScanning::Dependency#eql? so it considers more than just the dependency
namewhen determining equality, for example:def eql?(other) self.name == other.name && self.package_manager == other.package_manager && self.version == other.version end -
Update
::Gitlab::LicenseScanning::SbomScanner#reportto set thepackage_managerattribute when adding dependencies using::Gitlab::Ci::Reports::LicenseScanning::Dependency#add_dependency. -
Add tests to confirm the behaviour works as expected
/cc @gonzoyumo @fcatteau

