Add SBoM ingestion service processing

Merged Brian Williams requested to merge bwill/add-sbom-ingestion-service into master

What does this MR do and why?


Issue: #364709 Epic: &8024

This MR adds a service for ingesting SBoM reports. The reports are CycloneDX JSON files that describe several related objects: components, their versions, and the sources in which they were found. The service takes pre-parsed representations of these reports and persists them into the database using bulk upserts. This MR adds the pre-processing that prepares these objects for insertion; the bulk insertions themselves are implemented in !96575 (merged).
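To illustrate what a bulk upsert provides here, the sketch below models one in memory with plain Ruby: rows are keyed by their unique attributes, existing rows keep their ID, new rows get the next one, and IDs come back in input order. It only mimics the shape of ActiveRecord's `upsert_all(..., unique_by:)`; the `InMemoryTable` class and its `bulk_upsert` method are hypothetical and not part of this MR or !96575.

```ruby
# Hypothetical in-memory stand-in for a table with a unique index.
# It mimics the behaviour of a bulk upsert (INSERT ... ON CONFLICT),
# not the actual ActiveRecord or GitLab code.
class InMemoryTable
  def initialize
    @rows_by_key = {}
    @next_id = 1
  end

  # Inserts rows whose unique key is new, updates rows whose key already
  # exists (keeping their ID), and returns the ID of every input row in order.
  def bulk_upsert(rows, unique_by:)
    rows.map do |row|
      key = row.values_at(*unique_by)
      existing = @rows_by_key[key]
      if existing
        existing.merge!(row)
        existing[:id]
      else
        id = @next_id
        @next_id += 1
        @rows_by_key[key] = row.merge(id: id)
        id
      end
    end
  end

  def size
    @rows_by_key.size
  end
end

component_table = InMemoryTable.new
first_ids = component_table.bulk_upsert(
  [{ name: "github.com/astaxie/beego", component_type: "library" },
   { name: "github.com/davecgh/go-spew", component_type: "library" }],
  unique_by: %i[name component_type]
)
# Ingesting an already-known component again reuses its ID and adds no row.
second_ids = component_table.bulk_upsert(
  [{ name: "github.com/davecgh/go-spew", component_type: "library" }],
  unique_by: %i[name component_type]
)
```

Because existing rows keep their IDs, re-ingesting the same report does not create duplicates, and the returned IDs can be handed to the next step.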

The following DDL diagram shows the relations and the order in which they are created:

[Diagram: sbom_ingestion_relations]

All relations tie back to a single sbom_occurrence record, so an OccurrenceMap data structure holds all the attributes that relate to one occurrence during processing. The service takes the report data, turns it into OccurrenceMaps, and passes them into the ingestion pipeline, which performs a bulk upsert for each model. The following diagram illustrates the flow of data:

flowchart TD
IngestReportsWorker[IngestReportsWorker: Executes IngestReportsService when pipelines complete];
IngestReportsService[IngestReportsService: Collects reports from pipeline];
IngestReportService[IngestReportService: Turns a single report into batches of OccurrenceMaps];
IngestReportSliceService[IngestReportSliceService: Passes a batch of OccurrenceMaps into the ingestion pipeline];

IngestReportsWorker-- pipeline -->IngestReportsService
IngestReportsService-- sbom_report -->IngestReportService
IngestReportService-- "occurrence_maps (batched)" -->IngestReportSliceService
IngestReportSliceService-- "occurrence_maps (batched)" -->IngestComponents

subgraph Ingestion Pipeline
  IngestComponents-- component_ids -->IngestComponentVersions
  IngestComponentVersions-- component_version_ids -->IngestSources
  IngestSources-- source_ids -->IngestOccurrences
end

This MR implements these classes up to, but not including, the ingestion pipeline.
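The flow above can be approximated in plain Ruby. This is a sketch under stated assumptions, not the MR's code: `OccurrenceMapStub` is a hypothetical stand-in for `OccurrenceMap`, `BATCH_SIZE` is arbitrary, and the four steps write into in-memory hashes instead of performing the bulk upserts from !96575. The batching corresponds to the part this MR implements (IngestReportService and IngestReportSliceService); the steps show why the pipeline order matters, since each one needs IDs produced by the step before it.

```ruby
# Hypothetical stand-in for Sbom::Ingestion::OccurrenceMap: the report data for
# one occurrence plus the IDs that each pipeline step fills in.
OccurrenceMapStub = Struct.new(:name, :version, :source_fingerprint,
                               :component_id, :component_version_id,
                               :source_id, :occurrence_id)

BATCH_SIZE = 2 # hypothetical; kept small so the example produces two batches

# In-memory "table" lookup: returns the existing ID for a key or assigns the next one.
def id_for(table, key)
  table[key] ||= table.size + 1
end

# The four steps, in diagram order. Each step reads IDs written by earlier steps.
STEPS = [
  ->(batch, db) { batch.each { |m| m.component_id = id_for(db[:components], m.name) } },
  ->(batch, db) { batch.each { |m| m.component_version_id = id_for(db[:component_versions], [m.component_id, m.version]) } },
  ->(batch, db) { batch.each { |m| m.source_id = id_for(db[:sources], m.source_fingerprint) } },
  ->(batch, db) { batch.each { |m| m.occurrence_id = id_for(db[:occurrences], [m.component_version_id, m.source_id]) } }
].freeze

# Splits one report's maps into batches and runs every batch through all steps.
def ingest_report(maps, db: Hash.new { |h, k| h[k] = {} })
  maps.each_slice(BATCH_SIZE) do |batch|
    STEPS.each { |step| step.call(batch, db) }
  end
  db
end

maps = [
  OccurrenceMapStub.new("github.com/astaxie/beego", "v1.10.0", "go-source"),
  OccurrenceMapStub.new("github.com/davecgh/go-spew", "v1.1.1", "go-source"),
  # Same component and version reported by a second, hypothetical source.
  OccurrenceMapStub.new("github.com/astaxie/beego", "v1.10.0", "other-source")
]
db = ingest_report(maps)
```

Running `ingest_report` again with the same maps and `db` yields the same IDs, which is the property you would expect from upsert-based steps.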

Screenshots or Screen Recordings


https://youtu.be/5a-_l1bqWhQ

How to set up and validate locally


  1. Set up a GitLab Runner.

  2. Create a new project

  3. Add the following .gitlab-ci.yml to the project:

    persist_sbom:
      image: alpine:latest
      script:
        - wget https://gitlab.com/-/snippets/2378046/raw/main/gl-sbom-npm-npm.cdx.json
        - wget https://gitlab.com/-/snippets/2378046/raw/main/gl-sbom-go-go.cdx.json
      artifacts:
        reports:
          cyclonedx:
            - gl-sbom-npm-npm.cdx.json
            - gl-sbom-go-go.cdx.json
  4. The pipeline should run and succeed. Note down the pipeline ID.

  5. Make this change:

    diff --git a/ee/app/services/sbom/ingestion/tasks/ingest_components.rb b/ee/app/services/sbom/ingestion/tasks/ingest_components.rb
    index f3ee5025553..c975f344aa1 100644
    --- a/ee/app/services/sbom/ingestion/tasks/ingest_components.rb
    +++ b/ee/app/services/sbom/ingestion/tasks/ingest_components.rb
    @@ -5,7 +5,11 @@ module Ingestion
        module Tasks
          class IngestComponents < Base
            def self.execute(pipeline, occurrence_maps)
    -          # Not yet implemented
    +          f = File.open(Rails.root.join('output.txt'), 'a')
    +          f.puts "Got occurrence maps"
    +          f.puts "Size: #{occurrence_maps.size}"
    +          PP.pp(occurrence_maps, f)
    +          f.close
            end
          end
        end
  6. Start the Rails console: bundle exec rails c

  7. Invoke the service:

    pipeline = Ci::Pipeline.find(pipeline_id) # pipeline_id: the ID noted in step 4
    ::Sbom::Ingestion::IngestReportsService.execute(pipeline)
  8. Look in output.txt (in the Rails root) to see what was passed to the ingestion pipeline:

    $ head -n 30 output.txt
    Got occurrence maps
    Size: 15
    [#<Sbom::Ingestion::OccurrenceMap:0x000000013ac7aca0
      @report_component=
      #<Gitlab::Ci::Reports::Sbom::Component:0x000000010e1741c0
        @component_type="library",
        @name="github.com/astaxie/beego",
        @version="v1.10.0">,
      @report_source=
      #<Gitlab::Ci::Reports::Sbom::Source:0x000000010e17f818
        @data=
        {"input_file"=>{"path"=>"go.sum"},
          "source_file"=>{"path"=>"go.mod"},
          "package_manager"=>{"name"=>"go"},
          "language"=>{"name"=>"go"}},
        @fingerprint=
        "78f0613de674dc2d37f07d8662969754f46abbdfe7efd88fcc6cbe8d37df9058",
        @source_type=:dependency_scanning>>,
    #<Sbom::Ingestion::OccurrenceMap:0x000000013ac7ac50
      @report_component=
      #<Gitlab::Ci::Reports::Sbom::Component:0x000000010e174008
        @component_type="library",
        @name="github.com/davecgh/go-spew",
        @version="v1.1.1">,
      @report_source=
      #<Gitlab::Ci::Reports::Sbom::Source:0x000000010e17f818
        @data=
        {"input_file"=>{"path"=>"go.sum"},
          "source_file"=>{"path"=>"go.mod"},
          "package_manager"=>{"name"=>"go"},
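The debugging patch in step 5 opens output.txt and closes it by hand, so the handle stays open if `PP.pp` raises. If you adapt the patch, a block-form `File.open` closes the file either way. A standalone sketch, where the method name `dump_occurrence_maps` and its `path` argument are hypothetical (inside GitLab the path would be `Rails.root.join('output.txt')` as in the patch):

```ruby
require "pp"
require "tmpdir"

# Appends a header, the batch size and a pretty-printed dump of the maps to +path+.
# File.open with a block closes the file even if writing raises.
def dump_occurrence_maps(path, occurrence_maps)
  File.open(path, "a") do |f|
    f.puts "Got occurrence maps"
    f.puts "Size: #{occurrence_maps.size}"
    PP.pp(occurrence_maps, f)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "output.txt")
  dump_occurrence_maps(path, [{ name: "github.com/davecgh/go-spew", version: "v1.1.1" }])
  puts File.read(path)
end
```

Because the file is opened in append mode, each batch passed to the pipeline adds its own "Got occurrence maps" section, as in the output above.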


Edited by Brian Williams