
Match SBOM components to known advisories

Why are we doing this work

Components listed in an SBOM need to be matched to known advisories. This involves fetching all the advisories that match the package names and PURL types, then filtering out advisories whose affected range excludes the component version.

The output of the matching should be suitable to be used in two different contexts:

Matching a version to an affected range is implemented in Add service to match advisory affected ranges t... (#371995 - closed).

Vulnerability Scanning vs License Scanning

This is somewhat similar to LicenseScanning::PackageLicenses.

The contract is similar to the one of LicenseScanning::SbomScanner.

  • It takes an array of objects that respond to purl_type, name, and version.
  • It returns a similar array with an extra licenses field (array).

Non-functional requirements

  • Documentation:
  • Feature flag: No
  • Performance: check performance of the SQL query that fetches vulnerability advisories for a given set of packages
  • Testing: unit tests using RSpec

Implementation plan

  • Add Gitlab::VulnerabilityScanning::PackageAdvisories class.
    • Input: Array of objects that respond to purl_type, name, and version.
      • Names include the namespace.
      • Names are normalized.
    • Fetch PackageMetadata::AffectedPackage models matching the purl_type and name.
      • Preload the advisory field to prevent N+1 queries.
    • Filter out advisories whose affected range excludes the version.
    • Output: Array of objects with purl_type, name, version, and advisories.
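
The plan above can be sketched in plain Ruby. This is only an illustration of the input/output contract, not the code from the MR: the structs, the PackageAdvisoriesSketch class, the "ADV-1"/"ADV-2" identifiers, and the in-memory index are hypothetical stand-ins for the PackageMetadata::AffectedPackage query, and Gem::Requirement is used here as a simplified substitute for the range-matching service from #371995.

```ruby
require "rubygems" # Gem::Version and Gem::Requirement ship with Ruby

# Hypothetical stand-ins for the real models (assumptions, not GitLab classes).
Component        = Struct.new(:purl_type, :name, :version, keyword_init: true)
Advisory         = Struct.new(:identifier, keyword_init: true)
AffectedPackage  = Struct.new(:purl_type, :name, :affected_range, :advisory, keyword_init: true)
MatchedComponent = Struct.new(:purl_type, :name, :version, :advisories, keyword_init: true)

class PackageAdvisoriesSketch
  def initialize(affected_packages)
    # Stand-in for the SQL query: index affected packages by purl_type and name.
    @index = affected_packages.group_by { |pkg| [pkg.purl_type, pkg.name] }
  end

  # Input: objects responding to purl_type, name, and version.
  # Output: the same fields plus the advisories whose range includes the version.
  def fetch(components)
    components.map do |component|
      candidates = @index.fetch([component.purl_type, component.name], [])
      advisories = candidates
        .select { |pkg| affected?(pkg.affected_range, component.version) }
        .map(&:advisory)

      MatchedComponent.new(
        purl_type: component.purl_type,
        name: component.name,
        version: component.version,
        advisories: advisories
      )
    end
  end

  private

  # Simplified range check; the real service handles per-ecosystem semantics.
  def affected?(range, version)
    requirement = Gem::Requirement.new(range.split(",").map(&:strip))
    requirement.satisfied_by?(Gem::Version.new(version))
  end
end

affected = [
  AffectedPackage.new(purl_type: "gem", name: "rack", affected_range: ">= 2.0, < 2.2.3",
                      advisory: Advisory.new(identifier: "ADV-1")),
  AffectedPackage.new(purl_type: "gem", name: "rack", affected_range: "< 1.6.12",
                      advisory: Advisory.new(identifier: "ADV-2"))
]

matched = PackageAdvisoriesSketch.new(affected)
  .fetch([Component.new(purl_type: "gem", name: "rack", version: "2.1.0")])
p matched.first.advisories.map(&:identifier) # => ["ADV-1"]
```

Only the first advisory is kept because version 2.1.0 falls inside ">= 2.0, < 2.2.3" but outside "< 1.6.12".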

The above plan was implemented in Draft: Add service to match SBOM components and... (!126954 - closed) • Adam Cohen • 16.7. However, we had to postpone that MR due to efficiency concerns.

The crux of the efficiency concern is that a consumer calling Gitlab::VulnerabilityScanning::PackageAdvisories#fetch will end up fetching all of the advisory data at once, with no way of iterating through this information, which could easily lead to a DB query timeout.

In order to solve this, we'll need to change Gitlab::VulnerabilityScanning::PackageAdvisories#fetch from the MR Draft: Add service to match SBOM components and... (!126954 - closed) • Adam Cohen • 16.7 to use each_batch, similar to how this was implemented in Sbom::PossiblyAffectedOccurrencesFinder#execute_in_batches. This will allow consumers of Gitlab::VulnerabilityScanning::PackageAdvisories#fetch to iterate through the result set in batches, thereby reducing the possibility of a DB timeout.
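
A minimal sketch of the batched contract, in plain Ruby. It assumes nothing about the real implementation: BatchedAdvisoryMatcher, SbomComponent, and the lookup callable are hypothetical, and Enumerable#each_slice stands in for each_batch, which in the application iterates over a database relation rather than an in-memory array. The point is only that the consumer receives one batch at a time instead of the whole result set.

```ruby
# Hypothetical stand-in for an SBOM component (not a GitLab class).
SbomComponent = Struct.new(:purl_type, :name, :version, keyword_init: true)

class BatchedAdvisoryMatcher
  # lookup: callable that takes an array of components and returns a Hash
  # mapping each component to its advisories (stand-in for the DB query).
  def initialize(lookup, batch_size: 1000)
    @lookup = lookup
    @batch_size = batch_size
  end

  # Yields one array of [component, advisories] pairs per batch, so only
  # one batch of advisory data is loaded at any time.
  def fetch(components)
    return enum_for(:fetch, components) unless block_given?

    components.each_slice(@batch_size) do |batch|
      advisories_by_component = @lookup.call(batch)
      yield batch.map { |component| [component, advisories_by_component.fetch(component, [])] }
    end
  end
end

# Fake lookup used only for this illustration.
lookup = lambda do |batch|
  batch.to_h { |component| [component, component.name == "rack" ? ["ADV-1"] : []] }
end

components = %w[rack rails puma].map do |name|
  SbomComponent.new(purl_type: "gem", name: name, version: "1.0.0")
end

BatchedAdvisoryMatcher.new(lookup, batch_size: 2).fetch(components) do |batch|
  batch.each { |component, advisories| puts "#{component.name}: #{advisories.size} advisories" }
end
```

With three components and a batch size of 2, the lookup runs twice, once per batch, rather than once for the full set.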

To the developer who picks up this issue: please start by reopening Draft: Add service to match SBOM components and... (!126954 - closed) • Adam Cohen • 16.7.

Verification steps

Verify that the performance of the query is acceptable when used in production. See Improve performance of package license query to... (#398679 - closed) and the documentation for examples of optimizations that can further scope the query and make efficient use of the IN operator.

Edited by Adam Cohen