Generate new in-memory Dependency Scanning report from advisories affecting SBOM components
Why are we doing this work
To begin continuously scanning components, we will need a scanner class that does the following:
- Generates an empty dependency scanning report with the appropriate scanner and scan data.
- Fetches advisories using the finder/service that will be implemented in Match SBOM components to known advisories (#371055 - closed).
- Normalizes advisory data and uses the data to add a finding in the dependency scanning report.
Specifically, this issue focuses on the first and third points to maximize the work that can be done in parallel.
Relevant links
- Match SBOM components to known advisories (#371055 - closed)
- Draft: Add proof of concept SBOM scan class (!116739 - closed) (proof of concept)
Non-functional requirements
- Documentation:
- Feature flag: No
- Performance:
- Testing: Specs should be added for the functionality added.
Implementation plan
Report builder classes
- Create new report builder classes, e.g. SecurityReportBuilder, ContainerScanningReportBuilder, and DependencyScanningReportBuilder. These classes will be responsible for building the report programmatically for each report type. The last two will extend the SecurityReportBuilder base class.
- The report builders should establish a method for adding components and their advisories. The proposed method is add_component_advisories, which takes a Gitlab::Ci::Reports::Sbom::Component and an Array<PackageMetadata::Advisory> as arguments.
- The report builders might require the following private methods:
  - #build_scanner_data: calls ::Gitlab::Ci::Reports::Security::Scanner.new to create a standard scanner object. See this snippet for an example of how to do this. The scanner must be added to the report using #add_scanner.
  - #build_scan: calls Security::Scan.new to create a new scan object. See this snippet for an example of how to do this. The scan must be passed to the constructor when creating a new instance of ::Gitlab::Ci::Reports::Security::Finding.
  - #build_finding_data: creates a hash that holds all the data that is surfaced to a user. See this snippet for an example of how to do this.
  - #build_identifiers: creates identifiers from the advisory. See this snippet for an example of how to do this. Historically, the fields were generated using the functionality provided by the gitlab.com/gitlab-org/security-products/analyzers/report/v3 library, so this is also a recommended reference.
  - #build_links: See this snippet for an example of how to do this. The input is an array of strings in the example provided.
  - #build_location: See this snippet for an example of how to do this.
  - #build_severity - see snippet
  - ~~#build_confidence - see snippet~~ Deprecated: use 'unknown' consistently for now.
  - #build_uuid - see snippet on how this is done for normal security reports
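The builder hierarchy described above could look roughly like the following. This is a minimal sketch, not the actual implementation: Component and Advisory are stand-in structs for Gitlab::Ci::Reports::Sbom::Component and PackageMetadata::Advisory, their attribute names are assumptions, and findings are plain hashes rather than ::Gitlab::Ci::Reports::Security::Finding instances.

```ruby
# Stand-ins for Gitlab::Ci::Reports::Sbom::Component and PackageMetadata::Advisory.
# The attribute names are assumptions made for this sketch only.
Component = Struct.new(:purl_type, :name, :version, keyword_init: true)
Advisory  = Struct.new(:advisory_xid, :title, :severity, :urls, keyword_init: true)

# Hypothetical base class: owns the in-memory findings and the shared build steps.
class SecurityReportBuilder
  attr_reader :findings

  def initialize(report_type)
    @report_type = report_type
    @findings = []
  end

  # Adds one finding per advisory affecting the given component.
  def add_component_advisories(component, advisories)
    advisories.each { |advisory| @findings << build_finding_data(component, advisory) }
    self
  end

  private

  def build_finding_data(component, advisory)
    {
      report_type: @report_type,
      name: advisory.title,
      severity: build_severity(advisory),
      confidence: 'unknown', # build_confidence is deprecated; 'unknown' is used consistently
      identifiers: build_identifiers(advisory),
      links: build_links(advisory.urls),
      location: build_location(component)
    }
  end

  # Falls back to 'unknown' when the advisory carries no severity (assumption).
  def build_severity(advisory)
    advisory.severity || 'unknown'
  end

  def build_identifiers(advisory)
    [{ type: 'advisory', name: advisory.advisory_xid, value: advisory.advisory_xid }]
  end

  # The input is an array of strings.
  def build_links(urls)
    Array(urls).map { |url| { url: url } }
  end

  # Subclasses override this when the report type carries a location.
  def build_location(_component)
    nil
  end
end

class DependencyScanningReportBuilder < SecurityReportBuilder
  def initialize
    super('dependency_scanning')
  end

  private

  def build_location(component)
    { dependency: { package: { name: component.name }, version: component.version } }
  end
end

class ContainerScanningReportBuilder < SecurityReportBuilder
  def initialize
    super('container_scanning')
  end
end
```

Keeping add_component_advisories in the base class means the subclasses only override the steps that differ between report types, such as build_location here.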
Scanner
- Create a new scanner class, e.g. ee/lib/gitlab/vulnerability_scanning/sbom_scanner.rb.
- Implement initializer.
  - The scanner class is initialized with an instance of a build and an SBOM report.
  - On initialization, the SBoM is validated. Do we have enough data to perform the scan? Is the SBoM valid?
- Implement #report method.
  - Build a report using the report builder.
  - Convert SBOM components to objects that respond to purl_type, name, and version.
    - name includes the PURL namespace, and it's normalized.
    - We might refactor and implement this in Sbom::Component instead of repeating the code we already have in LicenseScanning::PipelineComponents.
  - Get advisories for components using the PackageAdvisories class introduced in #371055 (closed).
  - For each advisory of each affected component, add findings using #add_component_advisories.
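The scanner steps above can be sketched as follows. Everything here is a hypothetical stand-in: SbomComponent replaces the real SBOM component class, advisory_lookup replaces the PackageAdvisories class from #371055, FakeReportBuilder replaces the report builder, and the normalization rule (joining namespace and name with a slash and lowercasing) is an assumption rather than the rule used in LicenseScanning::PipelineComponents.

```ruby
# Stand-in for an SBOM component that still carries a separate PURL namespace.
SbomComponent = Struct.new(:purl_type, :namespace, :name, :version, keyword_init: true)

# Object that responds to purl_type, name, and version, as the plan requires.
ScannableComponent = Struct.new(:purl_type, :name, :version, keyword_init: true)

# Hypothetical report builder: records [component, advisories] pairs in memory.
class FakeReportBuilder
  attr_reader :entries

  def initialize
    @entries = []
  end

  def add_component_advisories(component, advisories)
    @entries << [component, advisories]
  end
end

class SbomScanner
  InvalidSbomError = Class.new(StandardError)

  # advisory_lookup is any callable mapping a component to an array of advisories.
  def initialize(build, sbom_components, advisory_lookup:, report_builder: FakeReportBuilder.new)
    @build = build
    @components = sbom_components
    @advisory_lookup = advisory_lookup
    @report_builder = report_builder
    validate!
  end

  # Adds findings for every component that has at least one advisory.
  def report
    @components.map { |component| normalize(component) }.each do |component|
      advisories = @advisory_lookup.call(component)
      next if advisories.empty?

      @report_builder.add_component_advisories(component, advisories)
    end
    @report_builder
  end

  private

  # Is the SBOM valid, and do we have enough data to perform the scan?
  def validate!
    raise InvalidSbomError, 'SBOM has no components' if @components.nil? || @components.empty?

    incomplete = @components.any? { |c| c.purl_type.nil? || c.name.nil? || c.version.nil? }
    raise InvalidSbomError, 'component is missing purl_type, name, or version' if incomplete
  end

  # Assumed normalization: prefix the namespace when present, then lowercase.
  def normalize(component)
    full_name = [component.namespace, component.name].compact.join('/')
    ScannableComponent.new(purl_type: component.purl_type, name: full_name.downcase, version: component.version)
  end
end
```

Validating in the initializer means an invalid SBOM fails before any advisory lookup happens, which keeps #report free of input checks.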
Dependency Scanning vs Container Scanning
The following differences between the dependency scanning and container scanning classes highlight where we should leverage the base class logic and where we should override it.
Vulnerabilities - represented by Gitlab::Ci::Reports::Security::Finding:
- Dependency Scanning includes details, cve and name inside the vulnerability object. The cve field is always empty.
Scan - represented by Gitlab::Ci::Reports::Security::Scanner:
- The scan.analyzer.url field is exclusive to dependency scanning (although I think it would make sense to add it to container scanning as well).
# Dependency Scanning - vendor is `GitLab`
╭─────────┬─────────────────────────────────────────────────────────────────────╮
│ id │ gemnasium │
│ name │ Gemnasium │
│ url │ https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium │
│ vendor │ {record 1 field} │
│ version │ 4.0.3 │
╰─────────┴─────────────────────────────────────────────────────────────────────╯
# Container Scanning - vendor is `GitLab`
╭─────────┬───────────────────────────╮
│ id │ gcs │
│ name │ GitLab Container Scanning │
│ vendor │ {record 1 field} │
│ version │ 6.1.1 │
╰─────────┴───────────────────────────╯
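One way to keep the scanner data shared while allowing the url difference shown above is a per-report-type attribute table. The sketch below uses the values from the two tables; ScannerData is a hypothetical stand-in for ::Gitlab::Ci::Reports::Security::Scanner, and its field names are assumptions.

```ruby
# Stand-in for ::Gitlab::Ci::Reports::Security::Scanner; field names are assumptions.
ScannerData = Struct.new(:external_id, :name, :vendor, :version, :url, keyword_init: true)

# Values copied from the two tables above. Only dependency scanning has a url.
SCANNER_ATTRIBUTES = {
  dependency_scanning: {
    external_id: 'gemnasium',
    name: 'Gemnasium',
    version: '4.0.3',
    url: 'https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium'
  },
  container_scanning: {
    external_id: 'gcs',
    name: 'GitLab Container Scanning',
    version: '6.1.1'
  }
}.freeze

# Builds the scanner data for a report type; the vendor is GitLab in both cases.
def build_scanner_data(report_type)
  attributes = SCANNER_ATTRIBUTES.fetch(report_type)
  ScannerData.new(vendor: 'GitLab', **attributes)
end
```

With this shape, adding a url to container scanning later would only require a new entry in the attribute table.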
Remediations:
- The remediations field is exclusive to container scanning.
Dependency Files:
- The dependency_files field is exclusive to container scanning.
Method Analysis:
Method | Shared? | Purpose |
---|---|---|
build_security_report | | Build the security report. The report builder can hold a report type and use it here for reuse. |
build_scanner | | Builds the scanner that originated the SBoM. One of GitLab Container Scanning or Gemnasium. |
build_uuid | | Used to deduplicate findings. |
build_location | | Location of the source for the vulnerability. Exclusive to dependency scanning. |
build_details | | Details of vulnerable package. Exclusive to dependency scanning. |
build_links + build_link | | Links for all related advisories. |
build_original_data | | A hash that contains a representation of the vulnerability JSON data. |
build_finding_name | | The title/name of the vulnerability. |
build_identifiers + build_identifier | | Identifiers are the advisories related to the finding. |
build_findings + build_finding | | Creates a finding from the reported advisory. |
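Because build_uuid is used to deduplicate findings, it needs to return the same value for the same inputs. A name-based (version 5) UUID is one way to get that property. The sketch below is an assumption about the approach, not the code used for normal security reports: the namespace is a placeholder (the RFC 4122 DNS namespace), and the four inputs and their joining order are hypothetical.

```ruby
require 'digest/sha1'

# Placeholder namespace (the RFC 4122 DNS namespace). A real implementation
# would use its own fixed namespace value.
UUID_NAMESPACE = '6ba7b810-9dad-11d1-80b4-00c04fd430c8'

# Name-based UUID (version 5): SHA-1 over the namespace bytes plus the name,
# truncated to 16 bytes, with the version and variant bits set.
def uuid_v5(namespace, name)
  namespace_bytes = [namespace.delete('-')].pack('H*')
  bytes = Digest::SHA1.digest(namespace_bytes + name.b).bytes.first(16)
  bytes[6] = (bytes[6] & 0x0f) | 0x50 # version 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80 # RFC 4122 variant
  hex = bytes.pack('C*').unpack1('H*')
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join('-')
end

# Hypothetical inputs: identical findings in the same project map to the same UUID.
def build_uuid(report_type:, identifier_fingerprint:, location_fingerprint:, project_id:)
  name = [report_type, identifier_fingerprint, location_fingerprint, project_id].join('-')
  uuid_v5(UUID_NAMESPACE, name)
end
```

Two calls with the same arguments return the same string, so a finding generated twice from the same SBOM can be recognized as a duplicate.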
Other proposals considered
The following proposals were also considered when selecting an implementation plan:
Generate the report as JSON and re-use parser
This proposal requires us to convert the components and vulnerabilities to JSON objects inside of a security report. The JSON report is then parsed by the Gitlab::Ci::Parsers::Security::Common.parse! method.
Pros
- Schema validation
- Easier to understand - the JSON objects produced are human-readable.
- Easier to test - we can take an SBoM and the specific version of the GLAD used for a DS job as inputs. If our conversion is a pure function, the output should match the DS report that was generated by Gemnasium alongside the SBoM.
- We can re-create the gl-dependency-scanning-report.json, and in the future the gl-container-scanning-report.json, artifacts for users to download if required.
- Code reuse
Cons
- Performance might suffer if we're marshalling JSON objects only for this to be undone.
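The JSON-based proposal above can be sketched as a round trip: build a report hash, serialize it, then parse it back. This is only an illustration of the trade-off. The report keys shown are a reduced, assumed subset of a security report, and parse_report is a stand-in that calls JSON.parse rather than Gitlab::Ci::Parsers::Security::Common.parse!.

```ruby
require 'json'

# Hypothetical pure conversion from scanner data and vulnerabilities to report JSON.
def build_report_json(scanner:, vulnerabilities:)
  report = {
    'scan' => { 'type' => 'dependency_scanning', 'scanner' => scanner },
    'vulnerabilities' => vulnerabilities
  }
  JSON.generate(report)
end

# Stand-in for the existing parser step: the JSON is immediately turned back
# into Ruby objects, which is the marshalling cost listed under Cons.
def parse_report(json)
  JSON.parse(json)
end
```

Because the conversion is a pure function, its output for a fixed SBoM and advisory set can be compared directly against a stored report in a spec.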
Verification steps
None.