Skip to content

Update occurrences rather than appending new ones

Brian Williams requested to merge bwill/update-sbom-occurrences-in-place into master

What does this MR do and why?

Describe in detail what your merge request does and why.

Related to: #373781 (closed)

The initial implementation of sbom_occurrences involved tracking all the dependencies in all of the pipelines. When we enabled the feature on production, the table grew far too quickly (almost 1 million rows per hour). In order to allow us to ship an MVC, we're going to take a more minimal approach. When new SBoM reports are ingested, we'll create new records, update the existing ones, and delete the no longer present ones. The sbom_occurrences table will only hold the current state of the default branch.

Database 💾

Aside from the migrations, there are two new queries introduced in ee/app/services/sbom/ingestion/post_ingestion_service.rb.

Finding the not_present_occurrences:

SELECT
    "sbom_occurrences".*
FROM
    "sbom_occurrences"
WHERE
    "sbom_occurrences"."project_id" = 2
    AND "sbom_occurrences"."id" IN (
        SELECT
            "sbom_occurrences"."id"
        FROM
            "sbom_occurrences"
        WHERE
            "sbom_occurrences"."project_id" = 2
            AND "sbom_occurrences"."id" NOT IN (
              ...
            )
    )

Then deleting them in batches.

Migrating up 🔼

These migrations delete data. Since the feature is currently behind a feature flag, it's easier for us to delete all of the data in the sbom_occurrences table and start anew than it is to attempt to migrate that data to the new schema. These deletions won't be visible to users. This will result in ~3 million rows being deleted from sbom_occurrences in production, and ~1000 rows in staging.

bin/rails db:migrate:main RAILS_ENV=test
main: == 20221209174132 RemoveSbomOccurrencesUniqueIndex: migrating =================
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.5687s
main: -- indexes(:sbom_occurrences)
main:    -> 0.0039s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0004s
main: -- remove_index(:sbom_occurrences, {:algorithm=>:concurrently, :name=>"index_sbom_occurrences_on_ingestion_attributes"})
main:    -> 0.0032s
main: -- execute("RESET statement_timeout")
main:    -> 0.0005s
main: == 20221209174132 RemoveSbomOccurrencesUniqueIndex: migrated (0.5882s) ========

main: == 20221209174157 TruncateSbomOccurrences: migrating ==========================
main: -- execute("TRUNCATE sbom_occurrences")
main:    -> 0.0040s
main: == 20221209174157 TruncateSbomOccurrences: migrated (0.0043s) =================

main: == 20221212192452 AddUuidColumnToSbomOccurrences: migrating ===================
main: -- add_column(:sbom_occurrences, :uuid, :uuid, {:null=>false})
main:    -> 0.0018s
main: == 20221212192452 AddUuidColumnToSbomOccurrences: migrated (0.0021s) ==========

main: == 20221212192527 IndexSbomOccurrencesOnUuid: migrating =======================
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0009s
main: -- index_exists?(:sbom_occurrences, :uuid, {:unique=>true, :name=>"index_sbom_occurrences_on_uuid", :algorithm=>:concurrently})
main:    -> 0.0024s
main: -- add_index(:sbom_occurrences, :uuid, {:unique=>true, :name=>"index_sbom_occurrences_on_uuid", :algorithm=>:concurrently})
main:    -> 0.0017s
main: == 20221212192527 IndexSbomOccurrencesOnUuid: migrated (0.0102s) ==============

Migrating down 🔽

bin/rails db:migrate:down:main RAILS_ENV=test VERSION=20221212192527
bin/rails db:migrate:down:main RAILS_ENV=test VERSION=20221212192452
bin/rails db:migrate:down:main RAILS_ENV=test VERSION=20221209174157
bin/rails db:migrate:down:main RAILS_ENV=test VERSION=20221209174132

main: == 20221212192527 IndexSbomOccurrencesOnUuid: reverting =======================
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0009s
main: -- indexes(:sbom_occurrences)
main:    -> 0.0043s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0004s
main: -- remove_index(:sbom_occurrences, {:algorithm=>:concurrently, :name=>"index_sbom_occurrences_on_uuid"})
main:    -> 0.0033s
main: -- execute("RESET statement_timeout")
main:    -> 0.0005s
main: == 20221212192527 IndexSbomOccurrencesOnUuid: reverted (0.0203s) ==============

main: == 20221212192452 AddUuidColumnToSbomOccurrences: reverting ===================
main: -- remove_column(:sbom_occurrences, :uuid, :uuid, {:null=>false})
main:    -> 0.0018s
main: == 20221212192452 AddUuidColumnToSbomOccurrences: reverted (0.0097s) ==========

main: == 20221209174157 TruncateSbomOccurrences: reverting ==========================
main: == 20221209174157 TruncateSbomOccurrences: reverted (0.0051s) =================

main: == 20221209174132 RemoveSbomOccurrencesUniqueIndex: reverting =================
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.5789s
main: -- index_exists?(:sbom_occurrences, [:project_id, :component_id, :component_version_id, :source_id, :commit_sha], {:unique=>true, :name=>"index_sbom_occurrences_on_ingestion_attributes", :algorithm=>:concurrently})
main:    -> 0.0048s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0010s
main: -- add_index(:sbom_occurrences, [:project_id, :component_id, :component_version_id, :source_id, :commit_sha], {:unique=>true, :name=>"index_sbom_occurrences_on_ingestion_attributes", :algorithm=>:concurrently})
main:    -> 0.0064s
main: -- execute("RESET statement_timeout")
main:    -> 0.0009s
main: == 20221209174132 RemoveSbomOccurrencesUniqueIndex: reverted (0.6062s) ========

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

  1. Ensure that you have an EE license

  2. Enable the cyclonedx_sbom_ingestion feature flag using the rails console: Feature.enable(:cyclonedx_sbom_ingestion)

  3. Setup gitlab runner

  4. Create a new project from a template, use the NodeJS/Express template.

  5. Create a .gitlab-ci.yml file with this configuration:

    include:
      - template: Security/Dependency-Scanning.gitlab-ci.yml
  6. Verify that the gemnasium-dependency_scanning outputs a gl-sbom-npm-npm.cdx.json artifact

  7. Connect to the DB with gdk psql. Run this query:

    select
      name, version, purl_type
    from
      sbom_components
    inner join sbom_component_versions
      on sbom_components.id = sbom_component_versions.component_id
    inner join sbom_occurrences
      on sbom_component_versions.id = sbom_occurrences.component_version_id
    where pipeline_id = YOUR_PIPELINE_ID;
  8. Remove dependencies from package.json / package-lock.json

  9. Run the query again; Removed dependencies should be deleted

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Brian Williams

Merge request reports