Ingest source package name from Trivy SBOM component properties

Proposal

As discussed here, when looking up advisories for a package, trivy first uses the source package, if available, and falls back to the package name. For example, the package libperl5.38 has Source: perl listed in the dpkg manifest:

$ docker run -it --rm registry.gitlab.com/gitlab-org/security-products/analyzers/gemnasium/tmp/python:5db727fd3df8c65d8d85ed470ee79624d728217c bash

root@547bb47b6a06:/# grep -A 9 'Package: libperl5.38' /var/lib/dpkg/status
Package: libperl5.38
Status: install ok installed
Priority: optional
Section: libs
Installed-Size: 29325
Maintainer: Niko Tyni <ntyni@debian.org>
Architecture: amd64
Multi-Arch: same
Source: perl <--------------------------------------- SOURCE PACKAGE IS `perl`
Version: 5.38.0-2

As such, the trivy-db that we use for the source of advisories does not contain vulnerability information for the package libperl5.38 but instead contains advisory information for the source package perl.

When trivy scans an image, if a source package has a vulnerability, trivy considers all packages that have the same source package as being vulnerable.

For example, if perl <= 5.38.0-2 is vulnerable to a particular CVE, then the following packages are also vulnerable, because they all list perl as the source package:

+------------+--------------+---------------------------+-------------------+------------------------------------------------------------------------+
| Unapproved |    High      |        libperl5.38        |     5.38.0-2      | CPAN.pm before 2.35 does not verify TLS certificates when downloading  |
|            |              |                           |                   |                       distributions over HTTPS.                        |
+------------+--------------+---------------------------+-------------------+------------------------------------------------------------------------+
| Unapproved |    High      |           perl            |     5.38.0-2      | CPAN.pm before 2.35 does not verify TLS certificates when downloading  |
|            |              |                           |                   |                       distributions over HTTPS.                        |
+------------+--------------+---------------------------+-------------------+------------------------------------------------------------------------+
| Unapproved |    High      |         perl-base         |     5.38.0-2      | CPAN.pm before 2.35 does not verify TLS certificates when downloading  |
|            |              |                           |                   |                       distributions over HTTPS.                        |
+------------+--------------+---------------------------+-------------------+------------------------------------------------------------------------+
| Unapproved |    High      |     perl-modules-5.38     |     5.38.0-2      | CPAN.pm before 2.35 does not verify TLS certificates when downloading  |
|            |              |                           |                   |                       distributions over HTTPS.                        |
+------------+--------------+---------------------------+-------------------+------------------------------------------------------------------------+

(see this job for details)
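
The lookup order described above can be sketched in plain Ruby. This is only an illustration: `ADVISORY_INDEX`, `Package`, `advisory_lookup_name`, and the `CVE-EXAMPLE-1` identifier are hypothetical names, the index is not the trivy-db schema, and version comparison is left out entirely.

```ruby
# Hypothetical in-memory advisory index keyed by package name.
# This is NOT the trivy-db schema; it only illustrates the lookup order.
ADVISORY_INDEX = {
  'perl' => ['CVE-EXAMPLE-1'] # placeholder identifier, not a real CVE id
}.freeze

Package = Struct.new(:name, :source_name, keyword_init: true)

# Use the source package name when present, otherwise the package name.
def advisory_lookup_name(package)
  package.source_name || package.name
end

# Returns the advisories recorded for the lookup name (empty when none).
def advisories_for(package)
  ADVISORY_INDEX.fetch(advisory_lookup_name(package), [])
end

INSTALLED_PACKAGES = [
  Package.new(name: 'libperl5.38', source_name: 'perl'),
  Package.new(name: 'perl-base', source_name: 'perl'),
  Package.new(name: 'perl', source_name: nil)
].freeze

INSTALLED_PACKAGES.each do |package|
  puts "#{package.name}: #{advisories_for(package).join(', ')}"
end
```

All three packages resolve to the same `perl` entry, which is why every binary package built from a vulnerable source package shows up in the report.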

When we ingest an SBOM for Container Scanning, we currently only store the following fields:

  • type
  • name
  • purl
  • version

For example, for the package libperl5.38, we have the following fields and values:

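| Field   | Value                                                      |
|---------|------------------------------------------------------------|
| type    | library                                                    |
| name    | libperl5.38                                                |
| purl    | pkg:deb/debian/libperl5.38@5.38.0-2?distro=debian-12.1     |
| version | 5.38.0-2                                                   |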

This presents a problem. As stated earlier, the trivy-db contains affected-package information for the source package perl, not for libperl5.38, and the fields above give us no way to correlate libperl5.38 with its source package perl.

The source SBOM does contain this information in the properties field; we just don't currently ingest it.

For example, trivy produces an SBOM with the source package perl in the aquasecurity:trivy:SrcName property:

trivy-produced SBOM (excerpt):
{
  "components": [
    {
      "bom-ref": "pkg:deb/debian/libperl5.38@5.38.0-2?distro=debian-12.1",
      "type": "library",
      "name": "libperl5.38",
      "version": "5.38.0-2",
      "purl": "pkg:deb/debian/libperl5.38@5.38.0-2?distro=debian-12.1",
      "properties": [
        {
          "name": "aquasecurity:trivy:SrcName",
          "value": "perl"
        }
      ]
    }
  ]
}

Similarly, syft produces an SBOM with the source package perl in the syft:metadata:source property:

syft-produced SBOM (excerpt):
{
  "components": [
    {
      "bom-ref": "pkg:deb/debian/libperl5.38@5.38.0-2?arch=amd64&upstream=perl&distro=debian-12&package-id=c2dfca7103136fcb",
      "type": "library",
      "publisher": "Niko Tyni <ntyni@debian.org>",
      "name": "libperl5.38",
      "version": "5.38.0-2",
      "cpe": "cpe:2.3:a:libperl5.38:libperl5.38:5.38.0-2:*:*:*:*:*:*:*",
      "purl": "pkg:deb/debian/libperl5.38@5.38.0-2?arch=amd64&upstream=perl&distro=debian-12",
      "properties": [
        {
          "name": "syft:metadata:source",
          "value": "perl"
        }
      ]
    }
  ]
}

To properly match packages such as libperl5.38 against advisories in the trivy-db for the source package perl, we need to update the SBOM ingestion code in the Rails monolith so that it also stores the source package name from component.properties. This issue covers trivy-produced SBOMs only.
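
As a minimal sketch of the extraction step, the following plain Ruby reads the aquasecurity:trivy:SrcName property from parsed CycloneDX components. The helper name `trivy_source_package_name` and the second component (`example-without-properties`) are made up for illustration; the actual parser in the monolith is structured differently.

```ruby
require 'json'

TRIVY_SOURCE_PROPERTY = 'aquasecurity:trivy:SrcName'

# Returns the source package name of a CycloneDX component hash, or nil
# when the component has no properties or no trivy SrcName property.
def trivy_source_package_name(component)
  properties = component['properties'] || []
  property = properties.find { |prop| prop['name'] == TRIVY_SOURCE_PROPERTY }
  property && property['value']
end

# Trimmed-down SBOM; the second component is a hypothetical one without properties.
SBOM_DOCUMENT = <<~SBOM
  {
    "components": [
      {
        "type": "library",
        "name": "libperl5.38",
        "version": "5.38.0-2",
        "purl": "pkg:deb/debian/libperl5.38@5.38.0-2?distro=debian-12.1",
        "properties": [
          { "name": "aquasecurity:trivy:SrcName", "value": "perl" }
        ]
      },
      { "type": "library", "name": "example-without-properties", "version": "1.0" }
    ]
  }
SBOM

# Maps each component name to its source package name (nil when absent).
SOURCE_NAMES = JSON.parse(SBOM_DOCUMENT).fetch('components').to_h do |component|
  [component['name'], trivy_source_package_name(component)]
end

SOURCE_NAMES.each { |name, source| puts "#{name} -> #{source.inspect}" }
```

Components without properties yield nil rather than raising, which matches the requirement that the source package name is optional.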

Proposals

Previous implementation plan
  1. Add a new source_package_name field to Gitlab::Ci::Reports::Sbom::Component.

  2. Add a new source_package_name field to the Sbom::ComponentVersion model:

    1. Create a migration to add source_package_name to the sbom_component_versions table.

    2. Add a new index to the sbom_component_versions table:

      Note: the previous implementation plan was about adding a field to sbom_components; please see this thread.

      Original index suggestion, which doesn't work:
      "index_sbom_components_on_component_type_source_package_name_and_purl_type" UNIQUE, btree (source_package_name, purl_type, component_type)

      Note: there's a problem with this index due to the UNIQUE keyword, as explained here. Because of this, we'll need to remove the UNIQUE keyword, as shown in the revised index below.

      Revised index (as discussed here):

      "index_sbom_components_on_component_type_source_package_name_and_purl_type" btree (source_package_name, purl_type, component_type)
  3. Update Gitlab::Ci::Parsers::Sbom::Cyclonedx#parse_components to ingest the components[].properties[].aquasecurity:trivy:SrcName value and store it in sbom_components.source_package_name.

  4. Add unit tests
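
The linked discussion about the UNIQUE keyword is not reproduced here, so the following is an assumption about the underlying problem rather than a summary of that thread: several packages share one source package, so their rows would collide on the indexed columns. The rows, `SAMPLE_ROWS`, and `unique_index_conflicts` are hypothetical and only model the index key in plain Ruby.

```ruby
# Hypothetical rows as they might look with the proposed column added.
SAMPLE_ROWS = [
  { name: 'libperl5.38', source_package_name: 'perl', purl_type: 'deb', component_type: 'library' },
  { name: 'perl-base', source_package_name: 'perl', purl_type: 'deb', component_type: 'library' },
  { name: 'perl-modules-5.38', source_package_name: 'perl', purl_type: 'deb', component_type: 'library' }
].freeze

# Columns covered by the suggested index.
INDEXED_COLUMNS = %i[source_package_name purl_type component_type].freeze

# Groups rows by the indexed columns and keeps only the keys shared by
# more than one row, i.e. the keys a UNIQUE index would reject.
def unique_index_conflicts(rows, columns)
  rows.group_by { |row| row.values_at(*columns) }
      .select { |_key, group| group.size > 1 }
end

unique_index_conflicts(SAMPLE_ROWS, INDEXED_COLUMNS).each do |key, group|
  puts "#{key.inspect} is shared by #{group.map { |row| row[:name] }.join(', ')}"
end
```

Under this assumption, a non-unique btree index on the same columns still supports the lookup while allowing many rows per source package.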

Implementation Plan

  1. Add a new sbom_source_packages table:

  2. Add a source_package_name method to the Sbom::SourceHelper module. It returns the value of data['SrcName'].

  3. Delegate the source_package_name method to the properties and allow nil (components may not have any properties).

    delegate :source_package_name, to: :properties, allow_nil: true
  4. Update the Sbom::Ingestion::OccurrenceMap class so that it includes a source_package_id accessor. Update the #to_h method so that it outputs source_package_id: source_package_id in the resulting hash. Delegate #source_package_name to :report_component.

  5. Add a new task to the Sbom::Ingestion::Tasks namespace. This task will include the Gitlab::Ingestion::BulkInsertableTask module.

    • Name the task IngestSourcePackageNames
    • Set self.model to Sbom::SourcePackage
    • Set self.uses to %i[name purl_type id].freeze. The :id will be used to set the source_package_id column, and the :name and :purl_type are used as the key for the :id value in a @maps_grouped_by_uniq_attrs hash map.
    • Set self.unique_by to %i[name purl_type].freeze.
    • Add an #attributes method that returns an array of hashes like so:
      occurrence_maps.filter(&:source_package_name).map do |occurrence_map|
        {
          name: occurrence_map.source_package_name,
          purl_type: occurrence_map.purl_type
        }
      end
    • Add an after_ingest method that sets the returned id value as the source_package_id using the values from @maps_grouped_by_uniq_attrs. See Sbom::Ingestion::Tasks::IngestComponents for an example implementation.
  6. Update the IngestReportSliceService::TASKS array. Add the newly created IngestSourcePackageNames before the IngestOccurrences task.

  7. Update the Sbom::Ingestion::Tasks::IngestOccurrences attributes so that it includes source_package_id: occurrence_map.source_package_id in the hash output.

  8. Ensure that the related specs are updated. The following files in ee/spec/services/sbom/ingestion/ will be affected:

    • occurrence_map_spec.rb - test that the source_package_id is assigned when ids are assigned and that it delegates the source_package_name correctly.
    • tasks/ingest_occurrences_spec.rb - ensure that the #attributes method sets the source_package_id attribute correctly when it's nil and when it's not nil.
    • tasks/ingest_source_packages_spec.rb - ensure that it is idempotent, that the unique_by constraints are used, that the correct attributes are used (nil source package names are removed), and that the expected attributes are set after ingest.
      • For example, you could verify that the perl and perl-base components both have the same source_package_id set because they both belong to the perl source package.
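
The flow of steps 4 to 7 can be sketched in plain Ruby without Rails. Everything here is a stand-in: `FakeSourcePackageTable` and `IngestSourcePackagesSketch` are hypothetical names, the `OccurrenceMap` struct only mimics the attributes mentioned above, and the fake table merely imitates an upsert that is unique by name and purl_type. It does not reproduce the Gitlab::Ingestion::BulkInsertableTask API.

```ruby
# Plain-Ruby stand-in for the occurrence map; the real class differs.
OccurrenceMap = Struct.new(:name, :purl_type, :source_package_name,
                           :source_package_id, keyword_init: true)

# Hypothetical in-memory replacement for a bulk upsert that is unique by
# [name, purl_type]. Not the real table or the real bulk-insert API.
class FakeSourcePackageTable
  def initialize
    @rows = {}
  end

  # Inserts missing rows and returns [name, purl_type, id] for every input,
  # so calling it repeatedly with the same data creates no new rows.
  def upsert_all(attributes)
    attributes.map do |attrs|
      key = [attrs[:name], attrs[:purl_type]]
      @rows[key] ||= @rows.size + 1
      [*key, @rows[key]]
    end
  end

  def count
    @rows.size
  end
end

class IngestSourcePackagesSketch
  def initialize(occurrence_maps, table)
    @occurrence_maps = occurrence_maps
    @table = table
  end

  def execute
    after_ingest(@table.upsert_all(attributes))
  end

  private

  # Occurrence maps without a source package name are skipped.
  def attributes
    @occurrence_maps.select(&:source_package_name).map do |map|
      { name: map.source_package_name, purl_type: map.purl_type }
    end
  end

  # Copies each returned id onto every occurrence map sharing the same key.
  def after_ingest(return_data)
    ids = return_data.to_h { |name, purl_type, id| [[name, purl_type], id] }
    @occurrence_maps.each do |map|
      map.source_package_id = ids[[map.source_package_name, map.purl_type]]
    end
  end
end

SKETCH_TABLE = FakeSourcePackageTable.new
SKETCH_MAPS = [
  OccurrenceMap.new(name: 'perl', purl_type: 'deb', source_package_name: 'perl'),
  OccurrenceMap.new(name: 'perl-base', purl_type: 'deb', source_package_name: 'perl'),
  OccurrenceMap.new(name: 'example-without-source', purl_type: 'deb')
].freeze

IngestSourcePackagesSketch.new(SKETCH_MAPS, SKETCH_TABLE).execute
SKETCH_MAPS.each { |map| puts "#{map.name}: #{map.source_package_id.inspect}" }
```

The same shape suggests what the specs could assert: perl and perl-base end up with the same source_package_id, maps without a source package name keep a nil id, and a second run adds no rows.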

Validation testing

  1. Validate Update PossiblyAffectedOccurrencesFinder to wor... (#428681 - closed).
  2. Create a project with the following content:

.gitlab-ci.yml

variables:
  CS_IMAGE: 'golang:1.20-alpine'

include:
  - template: Jobs/Container-Scanning.gitlab-ci.yml
  3. Run a pipeline and make sure that the container_scanning:cyclonedx report is created.

GDK

In the Rails console, run:

Sbom::ComponentVersion.where(component: Sbom::Component.where(name: 'alpine-baselayout-data'))

Check that the source_package_name field equals alpine-baselayout.

GitLab.com

After the deploy, validate that no new errors are logged and that there is no regression in the Group Dependency List.

/cc @gonzoyumo @smeadzinger @fcatteau

Edited by Adam Cohen