Decouple `Sbom::Ingestion::Tasks::IngestOccurrencesVulnerabilities` and the `vulnerability_finding_pipelines` table

Background Context

As part of the epic to delete the vulnerability_finding_pipelines table, we need to migrate any application code that uses that table to a different query.

This issue

The Sbom::Ingestion::Tasks::IngestOccurrencesVulnerabilities task ends up using this table when it calls occurrence_map.vulnerability_ids. That method returns an array of vulnerability IDs provided by this query.

The Sbom::OccurrencesVulnerability records populated by this task are ultimately used only in this API call.

Implementation Plan

We considered just updating the API call to directly query vulnerability_occurrences, similar to what we did in MR 162713.

However, that change will take longer and is a bit tangential to the task at hand (dropping the vulnerability_finding_pipelines table).

Instead, we will add an ID aggregation to the occurrence ingestion task:

@@ -124,6 +130,7 @@ def build_vulnerabilities_info
              occurrence_maps.name,
              occurrence_maps.version,
              occurrence_maps.path,
+             array_agg(vulnerability_occurrences.vulnerability_id) as vulnerability_ids,
              MAX(vulnerability_occurrences.severity) as highest_severity,
              COUNT(vulnerability_occurrences.id) as vulnerability_count
           SQL
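To make the grouping concrete, here is a small in-memory Ruby analogue of the aggregation above. It is a sketch, not GitLab code: the `Row` struct and `aggregate_vulnerabilities` method are hypothetical, each row is assumed to be one occurrence already joined to one matching vulnerability_occurrences row, the query is assumed to group by the selected occurrence columns, and severities are assumed to be integers (as with an enum column).

```ruby
# Hypothetical rows: each one is an occurrence (name, version, path) joined to
# one matching vulnerability_occurrences row. Severity is assumed to be an integer.
Row = Struct.new(:name, :version, :path, :vulnerability_id, :severity, keyword_init: true)

# In-memory analogue of:
#   GROUP BY name, version, path
#   array_agg(vulnerability_id), MAX(severity), COUNT(id)
def aggregate_vulnerabilities(rows)
  grouped = rows.group_by { |row| [row.name, row.version, row.path] }

  grouped.map do |(name, version, path), group|
    {
      name: name,
      version: version,
      path: path,
      vulnerability_ids: group.map(&:vulnerability_id),
      highest_severity: group.map(&:severity).max,
      vulnerability_count: group.size
    }
  end
end
```

One difference worth keeping in mind: Ruby's `group_by` preserves input order, whereas PostgreSQL's array_agg makes no ordering guarantee unless the call includes an ORDER BY, for example `array_agg(vulnerability_occurrences.vulnerability_id ORDER BY vulnerability_occurrences.vulnerability_id)`. That only matters if anything compares the resulting arrays.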

This will yield string values like the following in the query result:

'{1,2,3,4,5}'

We can parse that back into Ruby integers with:

'{1,2,3,4,5}'
  .gsub(/[{}]/, '')
  .split(',')
  .map(&:to_i)

NOTE: This assumes we will not have NULL values or non-integer values in the result.
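If we want that assumption to fail loudly instead of silently, a slightly stricter parser is one option. This is a hedged sketch with a hypothetical `parse_pg_integer_array` name: it uses `Integer()` so non-integer elements raise `ArgumentError` (where `'abc'.to_i` quietly returns 0), drops `NULL` elements, and treats a `nil` column value as an empty list. A `NULL` element could appear if the aggregation runs over an outer join with no matching vulnerability_occurrences row, because array_agg keeps NULL inputs.

```ruby
# Hypothetical helper: parses a PostgreSQL integer array literal such as
# '{1,2,3}' into Ruby Integers, failing loudly on anything unexpected.
def parse_pg_integer_array(literal)
  # A NULL column value arrives as nil in Ruby.
  return [] if literal.nil?

  inner = literal.delete_prefix('{').delete_suffix('}')
  return [] if inner.empty? # '{}' is an empty array

  elements = inner.split(',')
  # array_agg keeps NULL inputs, so a NULL element may appear; drop it.
  elements = elements.reject { |element| element == 'NULL' }
  # Integer() raises ArgumentError on non-integer text, unlike String#to_i.
  elements.map { |element| Integer(element, 10) }
end
```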

We can then change the occurrence_map class so that vulnerability_ids is a full attr_accessor (instead of the current attr_reader) and pass these parsed values in:

@@ -72,6 +78,8 @@ def attributes
               project
             )

+            occurrence_map.vulnerability_ids = vulnerability_data.vulnerability_ids
+
             new_attributes = {
               project_id: project.id,
               pipeline_id: pipeline.id,
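The accessor change itself is small. The sketch below uses a hypothetical, trimmed-down `OccurrenceMap` and a hypothetical `VulnerabilityData` struct (not the real Sbom classes) to show that swapping `attr_reader` for `attr_accessor` is what makes the assignment in the diff above possible.

```ruby
# Hypothetical, trimmed-down stand-in for the occurrence map class; only the
# vulnerability_ids part is shown.
class OccurrenceMap
  # Previously attr_reader :vulnerability_ids. attr_accessor also defines
  # vulnerability_ids=, so the ingestion task can assign the aggregated IDs.
  attr_accessor :vulnerability_ids
end

# Hypothetical per-row query result carrying the already-parsed IDs.
VulnerabilityData = Struct.new(:vulnerability_ids, keyword_init: true)

occurrence_map = OccurrenceMap.new
vulnerability_data = VulnerabilityData.new(vulnerability_ids: [1, 2, 3])
occurrence_map.vulnerability_ids = vulnerability_data.vulnerability_ids
```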

The downstream¹ Sbom::Ingestion::Tasks::IngestOccurrencesVulnerabilities task will then read these pre-populated IDs instead of querying vulnerability_finding_pipelines, so it works regardless of the state of the deprecate_vulnerability_occurrence_pipelines feature flag.
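The order dependence mentioned in the footnote can be modeled with two plain Ruby methods. Everything here is hypothetical: `OccurrenceMapStub`, both method names, and the `sbom_occurrence_id`/`vulnerability_id` keys are stand-ins for the real tasks and the Sbom::OccurrencesVulnerability columns. The point it illustrates is that if the downstream step runs before the IDs are assigned, it silently produces no rows.

```ruby
# Hypothetical stand-in for an occurrence map after occurrence ingestion.
OccurrenceMapStub = Struct.new(:occurrence_id, :vulnerability_ids)

# Upstream step (occurrence ingestion): assigns the aggregated, parsed IDs.
def populate_vulnerability_ids(maps, ids_by_occurrence)
  maps.each { |map| map.vulnerability_ids = ids_by_occurrence.fetch(map.occurrence_id, []) }
end

# Downstream step (IngestOccurrencesVulnerabilities): builds join rows only
# from data already on the maps; nil (not yet populated) yields no rows.
def build_occurrence_vulnerability_rows(maps)
  maps.flat_map do |map|
    Array(map.vulnerability_ids).map do |vulnerability_id|
      { sbom_occurrence_id: map.occurrence_id, vulnerability_id: vulnerability_id }
    end
  end
end

maps = [OccurrenceMapStub.new(10), OccurrenceMapStub.new(11)]
build_occurrence_vulnerability_rows(maps) # wrong order: returns []
populate_vulnerability_ids(maps, { 10 => [1, 2] })
build_occurrence_vulnerability_rows(maps) # two rows for occurrence 10
```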

/cc @bwill @nmccorrison

¹ By "downstream" here I mean that its code is executed after the task where we will now populate the data. See the task order in Sbom::Ingestion::IngestReportSliceService. This increase in inter-task coupling and order dependence is part of what makes this solution a bit janky.

Edited Aug 26, 2024 by Michael Becker