Skip to content

License scanning cannot handle multiple dependencies with the same name and license

Summary

While working on Implement License Scanning SBOM scanner (!109447 - merged), we discovered that the backend cannot accurately track licenses of project dependencies that share the same package name but have different package types, or different package versions.

Steps to reproduce

  1. Create a gl-license-scanning-report.json file with the following contents:

    {
      "version": "2.1",
      "licenses": [
        {
          "id": "MIT",
          "name": "MIT License",
          "url": "https://opensource.org/licenses/MIT"
        }
      ],
      "dependencies": [
        {
          "name": "yargs-parser",
          "version": "1.2.3",
          "package_manager": "bundler",
          "path": "Gemfile.lock",
          "licenses": [
            "MIT"
          ]
        }
      ]
    }
  2. Create another file named gl-license-scanning-report-2.json file with the following contents:

    {
      "version": "2.1",
      "licenses": [
        {
          "id": "MIT",
          "name": "MIT License",
          "url": "https://opensource.org/licenses/MIT"
        }
      ],
      "dependencies": [
        {
          "name": "another-package",
          "version": "2.3.4",
          "package_manager": "npm",
          "path": "package.lock",
          "licenses": [
            "MIT"
          ]
        }
      ]
    }
  3. Create a .gitlab-ci.yml with two jobs which reference the above two license scanning reports:

    job 1:
      script:
        - echo "test"
      artifacts:
        reports:
          license_scanning:
            - gl-license-scanning-report.json
    
    job 2:
      script:
        - echo "test"
      artifacts:
        reports:
          license_scanning:
            - gl-license-scanning-report-2.json
  4. Commit all the above files and wait for the pipeline to finish.

  5. Notice that both packages another-package (2.3.4) and yargs-parser (1.2.3) show up in the license list:

  6. Update the gl-license-scanning-report-2.json file that you created in step 2. above and change "name": "another-package" to "name": "yargs-parser", to introduce a collision with the instance of "name": "yargs-parser" in gl-license-scanning-report.json:

    {
      "version": "2.1",
      "licenses": [
        {
          "id": "MIT",
          "name": "MIT License",
          "url": "https://opensource.org/licenses/MIT"
        }
      ],
      "dependencies": [
        {
          "name": "yargs-parser",
          "version": "2.3.4",
          "package_manager": "npm",
          "path": "package.lock",
          "licenses": [
            "MIT"
          ]
        }
      ]
    }
  7. Commit the above change and wait for the pipeline to finish.

  8. Notice that only the first instance of yargs-parser (1.2.3) now shows up in the license compliance list:

Example Project

https://gitlab.com/adamcohen/license-scanning-collision/-/licenses

What is the current bug behavior?

Only one instance of a dependency with the same name appears in the License list

What is the expected correct behavior?

ALl instances of dependencies with the same name appear in the License list

Further details

The reason for this behaviour is because dependencies are stored in the Gitlab::Ci::Reports::LicenseScanning::License object as a Set of Gitlab::Ci::Reports::LicenseScanning::Dependency objects. When objects are added to a Set, it determines equality of elements using the following method:

Equality of elements is determined according to Object#eql? and Object#hash. Use Set#compare_by_identity to make a set compare its elements by their identity.

If we look at how Gitlab::Ci::Reports::LicenseScanning::Dependency#eql? is implemented, we see that it only considers the name field when deciding whether two objects are the same:

def eql?(other)
  self.name == other.name
end

This is the reason for the bug, since it causes a collision when adding two dependencies with the same name, regardless of whether other values differ. This behaviour can also be confirmed in this spec:

it 'cannot add the same dependency to a set twice' do
  set.add(described_class.new(name: 'bundler'))
  set.add(described_class.new(name: 'bundler'))

  expect(set.count).to eq(1)
end

Possible fixes

  1. Update Gitlab::Ci::Reports::LicenseScanning::Dependency#eql? so it considers more than just the dependency name when determining equality, for example:

    def eql?(other)
      self.name == other.name && self.package_manager == other.package_manager && self.version == other.version
    end
    
  2. Update ::Gitlab::LicenseScanning::SbomScanner#report to set the package_manager attribute when adding dependencies using ::Gitlab::Ci::Reports::LicenseScanning::Dependency#add_dependency.

  3. Add tests to confirm the behaviour works as expected

/cc @gonzoyumo @fcatteau

Edited by Fabien Catteau