SBOM reports with duplicate components cannot be ingested

Summary

We discovered in this thread that license scanning was reporting 0 licenses for an SBOM file that contained duplicate entries.

We should either strip out duplicate entries when ingesting an SBOM, or present the ingestion errors in the UI.

One side effect of this bug is an empty License column on the Dependency List page.

  • As of today, the components listed on that page are extracted from the Dependency Scanning report.
  • However, the SBOM isn't ingested. The parsing silently fails.
  • As a result, the License Scanning SBOM Scanner finds no licenses.

See #394985 (comment 1489841114)

Steps to reproduce

  1. Create a new project with the following files:

    • .gitlab-ci.yml

      cyclonedx-reports:
        stage: test
        variables:
          GIT_STRATEGY: "fetch"
        script:
          - echo "test"
        artifacts:
          paths:
            - "**/gl-sbom-*.cdx.json"
          reports:
            cyclonedx: "gl-sbom-*.cdx.json"

    • gl-sbom-all.cdx.json

      {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "metadata": {
          "tools": [
            {
              "vendor": "GitLab",
              "name": "Gemnasium",
              "version": "3.11.3"
            }
          ]
        },
        "components": [
          {
            "type": "library",
            "bom-ref": "pkg:maven/com.eclipsesource.minimal-json/minimal-json@0.9.5",
            "name": "com.eclipsesource.minimal-json/minimal-json",
            "version": "0.9.5",
            "purl": "pkg:maven/com.eclipsesource.minimal-json/minimal-json@0.9.5"
          },
          {
            "type": "library",
            "bom-ref": "pkg:maven/com.eclipsesource.minimal-json/minimal-json@0.9.5",
            "name": "com.eclipsesource.minimal-json/minimal-json",
            "version": "0.9.5",
            "purl": "pkg:maven/com.eclipsesource.minimal-json/minimal-json@0.9.5"
          }
        ]
      }

  2. Run a pipeline for the project.

  3. Notice there are no licenses reported in the licenses tab of the pipeline page.

  4. Inspect sbom_reports.reports for the pipeline in the production Rails console:

    [ gprd ] production> Ci::Pipeline.find(798010659).sbom_reports.reports
    
    => [#<Gitlab::Ci::Reports::Sbom::Report:0x00007f5b7f240360 @components=[], @errors=["property '/components' is invalid: error_type=uniqueItems"]>]

    Notice the error message: "property '/components' is invalid: error_type=uniqueItems"

Example Project

https://gitlab.com/adamcohen/maven-sbom-license-test/-/pipelines/798010659/licenses

What is the current bug behavior?

No licenses are displayed in the licenses tab of the pipeline page.

What is the expected correct behavior?

Licenses should be displayed in the licenses tab of the pipeline page, or the UI should display an error message explaining that the components couldn't be ingested.

Root cause

This issue occurs because the CycloneDX document is validated against a JSON schema before it is ingested. The schema requires the items of the components array to be unique (uniqueItems), so duplicate components cause a validation error and no components are ingested. The JSON schema validation uses this schema, which was copied directly from the CycloneDX specification repository.
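
As a rough illustration of what that validation rejects, the following Ruby sketch finds components that appear more than once in a parsed CycloneDX document. The duplicate_components helper is hypothetical and not part of GitLab. It compares whole component entries by content, which only approximates the uniqueItems rule reported in the error message.

```ruby
require "json"

# Hypothetical helper, not part of GitLab: returns the components that appear
# more than once in a parsed CycloneDX document. Entries are grouped by their
# full content, which approximates the schema's uniqueItems rule.
def duplicate_components(sbom)
  components = sbom.fetch("components", [])
  components
    .group_by { |component| component }
    .select { |_component, occurrences| occurrences.size > 1 }
    .keys
end

# Shortened version of the report from the steps to reproduce.
sbom = JSON.parse(<<~SBOM)
  {
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "components": [
      { "type": "library", "name": "com.eclipsesource.minimal-json/minimal-json", "version": "0.9.5" },
      { "type": "library", "name": "com.eclipsesource.minimal-json/minimal-json", "version": "0.9.5" }
    ]
  }
SBOM

duplicates = duplicate_components(sbom)
puts "#{duplicates.size} duplicated component(s) found" unless duplicates.empty?
```

Running a check like this against a gl-sbom-*.cdx.json file before uploading it could help confirm whether duplicates are the reason a report yields no licenses.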

Possible fixes

We can possibly solve this using one of the following approaches:

  1. Ignore duplicates when ingesting SBOM components. This will allow licenses to show up in the UI, but will not give any indication to the user that the SBOM file contained invalid data.
  2. Display errors in the UI when invalid SBOM data is encountered. If we determine that the SBOM was invalid because of duplicate components, we could even link to the troubleshooting docs section "A CycloneDX file is not being scanned and appears to provide no results".
  3. Combine options 1 and 2.
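
A minimal sketch of the combined approach, assuming duplicates are defined as fully identical component entries: repeated components are dropped, and a warning is recorded for each one so that it could later be surfaced in the UI. The deduplicate_components method and the warning text are hypothetical and do not reflect GitLab's actual ingestion code.

```ruby
# Hypothetical sketch, not GitLab's ingestion code: drop repeated components
# and keep one warning per dropped entry.
def deduplicate_components(components)
  warnings = []
  seen = {}

  unique = components.reject do |component|
    # Assumption: two components are duplicates when the whole entry is identical.
    duplicate = seen.key?(component)
    seen[component] = true
    if duplicate
      label = component["purl"] || component["name"]
      warnings << "duplicate component ignored: #{label}"
    end
    duplicate
  end

  [unique, warnings]
end

component = {
  "type" => "library",
  "name" => "com.eclipsesource.minimal-json/minimal-json",
  "version" => "0.9.5",
  "purl" => "pkg:maven/com.eclipsesource.minimal-json/minimal-json@0.9.5"
}

unique, warnings = deduplicate_components([component, component.dup])
puts "kept #{unique.size} component(s)"
warnings.each { |warning| puts warning }
```

Returning the warnings alongside the components keeps the two concerns separate: ingestion can proceed with the unique entries, while the caller decides whether and how to show the warnings.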

/cc @sam.white @fcatteau @gonzoyumo

Edited by Fabien Catteau