Add scanner_reported_severity column to security_findings

Summary

Eliminates the expensive LEFT JOIN in FindingsFinder by adding a scanner_reported_severity column to security_findings and updating the ingestion logic to store effective severity directly.

What does this MR do?

Phase 1: Schema Change

  • Adds scanner_reported_severity smallint column to store the original scanner-reported severity
  • Creates feature flag security_findings_effective_severity (gitlab_com_derisk)

Phase 2: Write Path (Ingestion)

  • Updates StoreFindingsService#store_finding_batch to look up existing severity overrides
  • Stores both scanner_reported_severity (original) and severity (effective with overrides applied)
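
The write-path change can be sketched as a small pure function (a sketch only; `resolve_severities` and `overrides_by_uuid` are illustrative names, not the actual service code):

```ruby
# Sketch: given the scanner-reported severity and a map of severity
# overrides keyed by finding UUID, compute the two values ingestion
# writes to security_findings. Names are illustrative.
def resolve_severities(uuid, scanner_severity, overrides_by_uuid)
  override = overrides_by_uuid[uuid]

  {
    scanner_reported_severity: scanner_severity, # always the original scanner value
    severity: override || scanner_severity       # effective, with any override applied
  }
end
```

With an override present, `severity` diverges from `scanner_reported_severity`; without one, both columns carry the scanner's value.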

Phase 3: Read Path (FindingsFinder)

  • When feature flag is enabled, filters directly on security_findings.severity without LEFT JOIN
  • Removes the COALESCE expression that required joining to vulnerability_occurrences
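
A minimal sketch of how the flag changes the finder's severity predicate (`severity_filter_sql` and the boolean `flag_enabled` parameter are illustrative stand-ins for the finder's use of `Feature.enabled?`):

```ruby
# Sketch: the severity filter with and without the feature flag.
# The SQL fragments mirror the before/after queries in the
# Query Plan Comparison section.
def severity_filter_sql(severities, flag_enabled)
  list = severities.join(",")

  if flag_enabled
    # Filter directly on the stored effective severity -- no join needed.
    "security_findings.severity IN (#{list})"
  else
    # Legacy path: effective severity is computed at read time,
    # which forces the LEFT JOIN to vulnerability_occurrences.
    "COALESCE(vulnerability_occurrences.severity, security_findings.severity) IN (#{list})"
  end
end
```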

Changes

| File | Description |
| --- | --- |
| db/migrate/... | Add scanner_reported_severity smallint column |
| config/feature_flags/... | Feature flag security_findings_effective_severity |
| ee/app/services/security/store_findings_service.rb | Look up overrides during ingestion |
| ee/app/finders/security/findings_finder.rb | Remove LEFT JOIN when flag enabled |
| ee/spec/... | Tests for both changes |

Query Plan Comparison

Test Setup

Using scan_id = 384406048 (gitlab-org/gitlab, partition 235, 45 findings with severities 6-7).

Before (with LEFT JOIN)

SELECT security_findings.*
FROM security_findings
LEFT OUTER JOIN vulnerability_occurrences ON vulnerability_occurrences.uuid = security_findings.uuid
WHERE security_findings.scan_id = 384406048
  AND security_findings.partition_number = 235
  AND security_findings.deduplicated = TRUE
  AND COALESCE(vulnerability_occurrences.severity, security_findings.severity) IN (5,6,7)
ORDER BY security_findings.severity DESC, security_findings.id ASC
LIMIT 20;

postgres.ai plan link

 Limit  (cost=2763.03..2763.06 rows=9 width=1110) (actual time=0.328..0.331 rows=20 loops=1)
   ->  Sort  (cost=2763.03..2763.06 rows=9 width=1110) (actual time=0.327..0.329 rows=20 loops=1)
         Sort Key: security_findings.severity DESC, security_findings.id
         Sort Method: top-N heapsort  Memory: 49kB
         ->  Nested Loop Left Join  (cost=1.01..2762.89 rows=9 width=1110) (actual time=0.024..0.298 rows=45 loops=1)
               Filter: (COALESCE(vulnerability_occurrences.severity, security_findings.severity) = ANY ('{5,6,7}'::integer[]))
               ->  Index Scan using security_findings_235_scan_id_deduplicated_idx on security_findings_235 security_findings  (cost=0.44..609.34 rows=598 width=1110) (actual time=0.014..0.034 rows=45 loops=1)
                     Index Cond: ((scan_id = 384406048) AND (deduplicated = true))
                     Filter: (partition_number = 235)
               ->  Index Scan using index_vuln_findings_on_uuid_including_vuln_id_1 on vulnerability_occurrences  (cost=0.57..3.59 rows=1 width=18) (actual time=0.005..0.005 rows=1 loops=45)
                     Index Cond: (uuid = security_findings.uuid)
 Planning Time: 0.419 ms
 Execution Time: 0.363 ms
  • Cost: 2763
  • Execution: 0.363 ms
  • Requires Nested Loop Left Join to vulnerability_occurrences (45 index lookups)

After (without LEFT JOIN)

SELECT security_findings.*
FROM security_findings
WHERE security_findings.scan_id = 384406048
  AND security_findings.partition_number = 235
  AND security_findings.deduplicated = TRUE
  AND security_findings.severity IN (5,6,7)
ORDER BY security_findings.severity DESC, security_findings.id ASC
LIMIT 20;

postgres.ai plan link

 Limit  (cost=621.03..621.08 rows=20 width=1110) (actual time=0.062..0.064 rows=20 loops=1)
   ->  Sort  (cost=621.03..621.92 rows=355 width=1110) (actual time=0.061..0.062 rows=20 loops=1)
         Sort Key: security_findings.severity DESC, security_findings.id
         Sort Method: top-N heapsort  Memory: 49kB
         ->  Index Scan using security_findings_235_scan_id_deduplicated_idx on security_findings_235 security_findings  (cost=0.44..611.59 rows=355 width=1110) (actual time=0.020..0.041 rows=45 loops=1)
               Index Cond: ((scan_id = 384406048) AND (deduplicated = true))
               Filter: ((partition_number = 235) AND (severity = ANY ('{5,6,7}'::integer[])))
 Planning Time: 0.203 ms
 Execution Time: 0.087 ms
  • Cost: 621 (~4.5x reduction)
  • Execution: 0.087 ms (~4x faster)
  • Simple index scan, no join to vulnerability_occurrences

Summary

| Metric | Before (JOIN) | After (no JOIN) | Improvement |
| --- | --- | --- | --- |
| Cost | 2763 | 621 | 4.5x lower |
| Execution time | 0.363 ms | 0.087 ms | 4x faster |
| Join operations | 45 index lookups | 0 | Eliminated |

Note: The postgres.ai explain commands returned 0 rows due to clone timing, but EXPLAIN ANALYZE executed directly on the clone returned the expected 45 rows and 20-row results shown above.

Why a large table exception?

The security_findings table is large but:

  • This is a smallint (2 bytes) column with no default
  • The table has 30-90 day retention, so old data cycles out naturally
  • The performance benefit of eliminating the LEFT JOIN significantly outweighs the storage cost

Key Design Decisions

  1. No backfill needed - 30-90 day retention means old data cycles out naturally
  2. Eventual consistency - Next pipeline run picks up any new overrides
  3. Scanner severity always stored - Even when flag is off, scanner_reported_severity is populated for future use
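
Decisions 2 and 3 together can be sketched as two simulated pipeline runs (all names are illustrative; `ingest` stands in for the ingestion write path):

```ruby
# Sketch: every run re-resolves the effective severity from the current
# overrides, so a new override is picked up on the next pipeline run
# (eventual consistency), while the scanner's value is always persisted.
def ingest(scanner_severity, overrides, uuid)
  { scanner_reported_severity: scanner_severity,
    severity: overrides.fetch(uuid, scanner_severity) }
end

overrides = {}
first_run = ingest(6, overrides, "uuid-1")  # no override yet: severity == scanner value

overrides["uuid-1"] = 7                     # override created between runs
second_run = ingest(6, overrides, "uuid-1") # next run applies the override
```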

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist.

Closes #590349

Related: https://gitlab.com/gitlab-com/request-for-help/-/issues/4212

Edited by Gregory Havenga
