Generate new in-memory Dependency Scanning report from advisories affecting SBOM components

Why are we doing this work

In order to begin continuously scanning components, we will need a scanner class that does the following:

1. Generates an empty dependency scanning report with the appropriate scanner and scan data.
2. Fetches advisories using the finder/service introduced in Match SBOM components to known advisories (#371055 - closed).
3. Normalizes advisory data and uses it to add findings to the dependency scanning report.

Specifically, this issue focuses on points 1 and 3 to maximize the work that can be done in parallel.
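To make points 1 and 3 concrete, here is a minimal plain-Ruby sketch. It uses hypothetical `Struct` stand-ins instead of the real `Gitlab::Ci::Reports::Security` classes, and the advisory hash shape (`identifier`, `title`, `cve_ids`, `urls`) is an assumption for illustration only.

```ruby
# Sketch of points 1 and 3. All names here are hypothetical stand-ins,
# not the real Gitlab::Ci::Reports::Security classes.
module ReportSketch
  Scanner = Struct.new(:external_id, :name, :vendor, :version, keyword_init: true)
  Finding = Struct.new(:name, :identifiers, :links, :component, keyword_init: true)
  Report  = Struct.new(:type, :scanner, :findings, keyword_init: true)

  module_function

  # Point 1: an empty report that already carries the scanner metadata.
  def empty_report(type, scanner)
    Report.new(type: type, scanner: scanner, findings: [])
  end

  # Point 3: normalize one advisory (hypothetical hash shape) into a finding
  # for the affected component.
  def normalize_advisory(advisory, component)
    Finding.new(
      name: "#{advisory[:title]} in #{component[:name]}",
      # Primary identifier first, then any CVE aliases, without duplicates.
      identifiers: [advisory[:identifier], *advisory.fetch(:cve_ids, [])].compact.uniq,
      links: advisory.fetch(:urls, []).map { |url| { url: url } },
      component: component
    )
  end
end

scanner = ReportSketch::Scanner.new(external_id: "gemnasium", name: "Gemnasium",
                                    vendor: "GitLab", version: "4.0.3")
report = ReportSketch.empty_report(:dependency_scanning, scanner)
advisory = { identifier: "ADV-EXAMPLE-1", title: "Example advisory",
             cve_ids: ["CVE-EXAMPLE-1"], urls: ["https://example.com/adv/1"] }
report.findings << ReportSketch.normalize_advisory(advisory, { name: "example-pkg", version: "1.0.0" })
```

Keeping the two steps as separate functions mirrors the split in this issue: the empty report can be built and tested without any advisory data.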

Relevant links

Non-functional requirements

Documentation:
Feature flag: No
Performance:
Testing: Specs should be added for the new functionality.

Implementation plan

Report builder classes

Scanner

1. Create a new scanner class, e.g. `ee/lib/gitlab/vulnerability_scanning/sbom_scanner.rb`.
2. Implement the initializer.
   - The scanner class is initialized with a build instance and an SBOM report.
   - On initialization, the SBOM is validated: do we have enough data to perform the scan, and is the SBOM valid?
3. Implement the `#report` method.
   - Build a report using the report builder.
   - Convert SBOM components to objects that respond to `purl_type`, `name`, and `version`.
     - `name` includes the PURL namespace and is normalized.
     - We might refactor and implement this in `Sbom::Component` instead of repeating the code we already have in `LicenseScanning::PipelineComponents`.
   - Get advisories for components using the `PackageAdvisories` class introduced in #371055 (closed).
   - For each advisory of each affected component, add findings using `#add_component_advisories`.
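The scanner steps above could look roughly like the sketch below. Only `#report` and `#add_component_advisories` come from the plan; the SBOM hash shape, the injected `advisories_finder` (standing in for `PackageAdvisories`, whose real API is not shown here), the `CollectingBuilder`, and the PyPI (PEP 503) name normalization are all hypothetical.

```ruby
# Sketch of the planned scanner. Only #report and #add_component_advisories
# come from the plan; every other name is a hypothetical stand-in.
module SbomScannerSketch
  Component = Struct.new(:purl_type, :name, :version, keyword_init: true)
  InvalidSbomError = Class.new(StandardError)

  class SbomScanner
    # build:             the CI build that produced the SBOM (opaque here)
    # sbom_report:       hypothetical hash with :components and :errors
    # advisories_finder: stands in for PackageAdvisories; any callable mapping
    #                    [Component] => { Component => [advisory, ...] }
    # report_builder:    any object with #add_component_advisories and #report
    def initialize(build, sbom_report, advisories_finder:, report_builder:)
      @build = build
      @sbom_report = sbom_report
      @advisories_finder = advisories_finder
      @report_builder = report_builder
      validate!
    end

    def report
      components = @sbom_report.fetch(:components, []).map { |raw| to_component(raw) }

      @advisories_finder.call(components).each do |component, advisories|
        next if advisories.empty? # unaffected component

        @report_builder.add_component_advisories(component, advisories)
      end

      @report_builder.report
    end

    private

    # Fail fast when the SBOM cannot be scanned.
    def validate!
      raise InvalidSbomError, "missing build" if @build.nil?

      errors = @sbom_report.fetch(:errors, [])
      raise InvalidSbomError, errors.join(", ") unless errors.empty?
    end

    # Fold the PURL namespace into the name, then normalize it. The PyPI rule
    # (PEP 503) is an illustrative assumption, not the production logic.
    def to_component(raw)
      name = [raw[:namespace], raw[:name]].compact.join("/")
      name = name.downcase.gsub(/[-_.]+/, "-") if raw[:purl_type] == "pypi"

      Component.new(purl_type: raw[:purl_type], name: name, version: raw[:version])
    end
  end

  # Hypothetical builder that just records [name, advisory] pairs.
  class CollectingBuilder
    attr_reader :report

    def initialize
      @report = []
    end

    def add_component_advisories(component, advisories)
      advisories.each { |advisory| @report << [component.name, advisory] }
    end
  end
end

finder = lambda do |components|
  components.to_h { |c| [c, c.name == "requests-oauthlib" ? ["ADV-EXAMPLE-2"] : []] }
end

sbom = {
  components: [
    { purl_type: "pypi", name: "Requests_OAuthlib", version: "1.0.0" },
    { purl_type: "npm", namespace: "@example", name: "pkg", version: "2.0.0" }
  ],
  errors: []
}

scanner = SbomScannerSketch::SbomScanner.new(
  :build, sbom, advisories_finder: finder, report_builder: SbomScannerSketch::CollectingBuilder.new
)
findings = scanner.report # => [["requests-oauthlib", "ADV-EXAMPLE-2"]]
```

Injecting the finder and builder keeps the scanner testable in isolation, which fits the goal of working on points 1 and 3 in parallel with the advisory lookup.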

Dependency Scanning vs Container Scanning

The following differences between the dependency scanning and container scanning classes highlight where we should leverage the base class logic and where we should override it.

Vulnerabilities - represented by Gitlab::Ci::Reports::Security::Finding:

Dependency Scanning includes details, cve and name inside the vulnerability object. The cve field is always empty.
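A sketch of how the Dependency Scanning-only attributes might be layered onto a shared vulnerability hash. The `name`, `cve`, and `details` keys come from the text above; deriving `name` from the advisory title, representing the empty `cve` as `""`, and the `vulnerable_package` shape of `details` are assumptions.

```ruby
# DS-only vulnerability attributes layered on top of the shared ones.
# Key names follow the description above; the values' shapes are assumptions.
module DsVulnerabilitySketch
  module_function

  def build_vulnerability(shared, component)
    shared.merge(
      name: shared[:title],        # assumption: name mirrors the advisory title
      cve: "",                     # always empty for Dependency Scanning
      details: {                   # assumed shape for the vulnerable package
        vulnerable_package: {
          type: "text",
          name: "Vulnerable Package",
          value: "#{component[:name]}:#{component[:version]}"
        }
      }
    )
  end
end

vulnerability = DsVulnerabilitySketch.build_vulnerability(
  { id: "example-id", title: "Example advisory" },
  { name: "example-pkg", version: "1.0.0" }
)
```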

Scan - represented by Gitlab::Ci::Reports::Security::Scanner:

The scan.analyzer.url field is exclusive to dependency scanning (although I think it would make sense to add it to container scanning as well).

# Dependency Scanning - vendor is `GitLab`
╭─────────┬─────────────────────────────────────────────────────────────────────╮
│ id      │ gemnasium                                                           │
│ name    │ Gemnasium                                                           │
│ url     │ https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium │
│ vendor  │ {record 1 field}                                                    │
│ version │ 4.0.3                                                               │
╰─────────┴─────────────────────────────────────────────────────────────────────╯

# Container Scanning - vendor is `GitLab`
╭─────────┬───────────────────────────╮
│ id      │ gcs                       │
│ name    │ GitLab Container Scanning │
│ vendor  │ {record 1 field}          │
│ version │ 6.1.1                     │
╰─────────┴───────────────────────────╯
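Since the two scanner objects above differ only in their values and the presence of `url`, a shared `build_scanner` could be driven by a per-report-type table. The ids, names, url, and vendor are copied from the objects above; the `vendor` hash shape and passing the version in as a parameter (rather than hard-coding it) are assumptions.

```ruby
# Sketch of a shared #build_scanner keyed by report type.
module ScannerSketch
  SCANNERS = {
    dependency_scanning: {
      id: "gemnasium",
      name: "Gemnasium",
      # Only the Dependency Scanning scanner carries a url today.
      url: "https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium"
    },
    container_scanning: {
      id: "gcs",
      name: "GitLab Container Scanning"
    }
  }.freeze

  module_function

  # merge returns a new hash, so the frozen table is never mutated.
  def build_scanner(report_type, version)
    SCANNERS.fetch(report_type).merge(vendor: { name: "GitLab" }, version: version)
  end
end

ds = ScannerSketch.build_scanner(:dependency_scanning, "4.0.3")
cs = ScannerSketch.build_scanner(:container_scanning, "6.1.1")
```

Adding `url` to container scanning later would then be a one-line data change rather than a new override.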

Remediations:

The remediations field is exclusive to container scanning.

Dependency Files:

The dependency_files field is exclusive to container scanning.

Method Analysis:

Method	Shared?	Purpose
`build_security_report`	✅	Builds the security report. The report builder can hold the report type and use it here, so the method can be reused.
`build_scanner`	✅	Builds the scanner that originated the SBOM: either `GitLab Container Scanning` or `Gemnasium`.
`build_uuid`	✅	Used to deduplicate findings.
`build_location`	❌	Location of the source for the vulnerability. Exclusive to dependency scanning.
`build_details`	❌	Details of vulnerable package. Exclusive to dependency scanning.
`build_links` + `build_link`	✅	Links for all related advisories.
`build_original_data`	❌	A hash that contains a representation of the vulnerability JSON data.
`build_finding_name`	✅	The title/name of the vulnerability.
`build_identifiers` + `build_identifier`	✅	Identifiers are the advisories related to the finding.
`build_findings` + `build_finding`	✅	Creates a finding from the reported advisory.
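The shared/not-shared split in the table maps naturally onto a template-method base class: shared methods live in the base, and the ❌ methods become hooks that a subclass overrides. This is a sketch with hypothetical class names and hash shapes; only the `build_*` method names come from the table, and the DS location shape loosely mirrors the `dependency` object of DS reports (assumption).

```ruby
module BuilderSketch
  class BaseReportBuilder
    def initialize(report_type)
      @report_type = report_type # held by the builder for reuse
    end

    # Shared: assemble a finding; nil hook results are dropped by #compact.
    def build_finding(component, advisory)
      {
        report_type: @report_type,
        name: build_finding_name(component, advisory),
        identifiers: build_identifiers(advisory),
        links: build_links(advisory),
        location: build_location(component),
        details: build_details(component)
      }.compact
    end

    def build_finding_name(component, advisory)
      "#{advisory[:title]} in #{component[:name]}"
    end

    def build_identifiers(advisory)
      advisory.fetch(:identifiers, []).map { |value| build_identifier(value) }
    end

    def build_identifier(value)
      { value: value }
    end

    def build_links(advisory)
      advisory.fetch(:urls, []).map { |url| build_link(url) }
    end

    def build_link(url)
      { url: url }
    end

    # Not shared: hooks that return nil unless a subclass overrides them.
    def build_location(_component) = nil
    def build_details(_component) = nil
  end

  class DependencyScanningBuilder < BaseReportBuilder
    def initialize
      super(:dependency_scanning)
    end

    def build_location(component)
      { dependency: { package: { name: component[:name] }, version: component[:version] } }
    end

    def build_details(component)
      { vulnerable_package: { value: "#{component[:name]}:#{component[:version]}" } }
    end
  end

  class ContainerScanningBuilder < BaseReportBuilder
    def initialize
      super(:container_scanning)
    end
  end
end

component = { name: "pkg", version: "1.0" }
advisory = { title: "Adv", identifiers: ["ADV-1"], urls: ["https://example.com"] }
ds = BuilderSketch::DependencyScanningBuilder.new.build_finding(component, advisory)
cs = BuilderSketch::ContainerScanningBuilder.new.build_finding(component, advisory)
```

The one-line `def ... = nil` hooks use Ruby 3.0 endless method syntax.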

Other proposals considered

The following proposals were also considered when selecting an implementation plan.

Generate the report as JSON and re-use parser

This proposal requires us to convert the components and vulnerabilities to JSON objects inside a security report. The JSON report is then parsed by the `Gitlab::Ci::Parsers::Security::Common.parse!` method.
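A rough sketch of this alternative: serialize affected components and advisories into a heavily simplified report JSON (not schema-valid), then parse it back. `JSON.parse` stands in for `Gitlab::Ci::Parsers::Security::Common.parse!`, and the field layout and placeholder schema version are assumptions.

```ruby
require "json"

module JsonRoundTripSketch
  module_function

  # Pure function: the output depends only on its arguments, so the same SBOM
  # components and advisory snapshot always yield the same JSON string.
  def to_report_json(affected, schema_version:)
    vulnerabilities = affected.flat_map do |component, advisories|
      advisories.map do |advisory|
        {
          name: advisory[:title],
          identifiers: [{ type: "advisory", value: advisory[:id] }],
          location: { dependency: { package: { name: component[:name] },
                                    version: component[:version] } }
        }
      end
    end

    JSON.generate({ version: schema_version,
                    vulnerabilities: vulnerabilities,
                    scan: { type: "dependency_scanning" } })
  end
end

affected = [[{ name: "example-pkg", version: "1.0.0" },
             [{ id: "ADV-EXAMPLE-1", title: "Example advisory" }]]]
json = JsonRoundTripSketch.to_report_json(affected, schema_version: "0.0.0-example")
parsed = JSON.parse(json) # stand-in for the real security report parser
```

The generate/parse pair is exactly the extra work the performance concern below refers to.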

Pros

Schema validation
Easier to understand - the JSON objects produced are human-readable.
Easier to test - we can take an SBOM and the specific version of the GLAD used for a DS job as inputs. If our conversion is a pure function, the output should match the DS report that was generated by gemnasium alongside the SBOM.
We can re-create the `gl-dependency-scanning-report.json` artifact, and in the future the `gl-container-scanning-report.json` artifact, for users to download if required.
Code reuse

Cons

Performance might suffer, since we would serialize objects to JSON only to parse them back immediately.

Verification steps

None.

Edited Jul 26, 2023 by Fabien Catteau