Add scanner_reported_severity column to security_findings

Summary

Eliminates the expensive LEFT JOIN in FindingsFinder by adding a scanner_reported_severity column to security_findings and updating the ingestion logic to store effective severity directly.

What does this MR do?

Phase 1: Schema Change

  • Adds scanner_reported_severity smallint column to store the original scanner-reported severity
  • Creates feature flag security_findings_effective_severity (gitlab_com_derisk)

Phase 2: Write Path (Ingestion)

  • Updates StoreFindingsService#store_finding_batch to look up existing severity overrides
  • Stores both scanner_reported_severity (original) and severity (effective with overrides applied)
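
The write-path change can be sketched as a small pure function (a sketch only; `resolve_severities` and `overrides_by_uuid` are illustrative names, not the actual service code):

```ruby
# Sketch: given the scanner-reported severity and a map of severity
# overrides keyed by finding UUID, compute the two values ingestion
# writes to security_findings. Names are illustrative.
def resolve_severities(uuid, scanner_severity, overrides_by_uuid)
  override = overrides_by_uuid[uuid]

  {
    scanner_reported_severity: scanner_severity, # always the original scanner value
    severity: override || scanner_severity       # effective, with any override applied
  }
end
```

With an override present, `severity` diverges from `scanner_reported_severity`; without one, both columns carry the scanner's value.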

Phase 3: Read Path (FindingsFinder)

  • When feature flag is enabled, filters directly on security_findings.severity without LEFT JOIN
  • Removes the COALESCE expression that required joining to vulnerability_occurrences
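
A minimal sketch of how the flag changes the finder's severity predicate (`severity_filter_sql` and the boolean `flag_enabled` parameter are illustrative stand-ins for the finder's use of `Feature.enabled?`):

```ruby
# Sketch: the severity filter with and without the feature flag.
# The SQL fragments mirror the before/after queries in the
# Query Plan Comparison section.
def severity_filter_sql(severities, flag_enabled)
  list = severities.join(",")

  if flag_enabled
    # Filter directly on the stored effective severity -- no join needed.
    "security_findings.severity IN (#{list})"
  else
    # Legacy path: effective severity is computed at read time,
    # which forces the LEFT JOIN to vulnerability_occurrences.
    "COALESCE(vulnerability_occurrences.severity, security_findings.severity) IN (#{list})"
  end
end
```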

Changes

| File | Description |
| --- | --- |
| db/migrate/... | Add scanner_reported_severity smallint column |
| config/feature_flags/... | Feature flag security_findings_effective_severity |
| ee/app/services/security/store_findings_service.rb | Look up overrides during ingestion |
| ee/app/finders/security/findings_finder.rb | Remove LEFT JOIN when flag enabled |
| ee/spec/... | Tests for both changes |

Query Plan Comparison

Test Setup

Using scan_id = 384406048 (gitlab-org/gitlab, partition 235, 45 findings with severities 6-7).

Before (with LEFT JOIN)

SELECT security_findings.*
FROM security_findings
LEFT OUTER JOIN vulnerability_occurrences ON vulnerability_occurrences.uuid = security_findings.uuid
WHERE security_findings.scan_id = 384406048
  AND security_findings.partition_number = 235
  AND security_findings.deduplicated = TRUE
  AND COALESCE(vulnerability_occurrences.severity, security_findings.severity) IN (5,6,7)
ORDER BY security_findings.severity DESC, security_findings.id ASC
LIMIT 20;

postgres.ai plan link

 Limit  (cost=2763.03..2763.06 rows=9 width=1110) (actual time=0.328..0.331 rows=20 loops=1)
   ->  Sort  (cost=2763.03..2763.06 rows=9 width=1110) (actual time=0.327..0.329 rows=20 loops=1)
         Sort Key: security_findings.severity DESC, security_findings.id
         Sort Method: top-N heapsort  Memory: 49kB
         ->  Nested Loop Left Join  (cost=1.01..2762.89 rows=9 width=1110) (actual time=0.024..0.298 rows=45 loops=1)
               Filter: (COALESCE(vulnerability_occurrences.severity, security_findings.severity) = ANY ('{5,6,7}'::integer[]))
               ->  Index Scan using security_findings_235_scan_id_deduplicated_idx on security_findings_235 security_findings  (cost=0.44..609.34 rows=598 width=1110) (actual time=0.014..0.034 rows=45 loops=1)
                     Index Cond: ((scan_id = 384406048) AND (deduplicated = true))
                     Filter: (partition_number = 235)
               ->  Index Scan using index_vuln_findings_on_uuid_including_vuln_id_1 on vulnerability_occurrences  (cost=0.57..3.59 rows=1 width=18) (actual time=0.005..0.005 rows=1 loops=45)
                     Index Cond: (uuid = security_findings.uuid)
 Planning Time: 0.419 ms
 Execution Time: 0.363 ms
  • Cost: 2763
  • Execution: 0.363 ms
  • Requires Nested Loop Left Join to vulnerability_occurrences (45 index lookups)

After (without LEFT JOIN)

SELECT security_findings.*
FROM security_findings
WHERE security_findings.scan_id = 384406048
  AND security_findings.partition_number = 235
  AND security_findings.deduplicated = TRUE
  AND security_findings.severity IN (5,6,7)
ORDER BY security_findings.severity DESC, security_findings.id ASC
LIMIT 20;

postgres.ai plan link

 Limit  (cost=621.03..621.08 rows=20 width=1110) (actual time=0.062..0.064 rows=20 loops=1)
   ->  Sort  (cost=621.03..621.92 rows=355 width=1110) (actual time=0.061..0.062 rows=20 loops=1)
         Sort Key: security_findings.severity DESC, security_findings.id
         Sort Method: top-N heapsort  Memory: 49kB
         ->  Index Scan using security_findings_235_scan_id_deduplicated_idx on security_findings_235 security_findings  (cost=0.44..611.59 rows=355 width=1110) (actual time=0.020..0.041 rows=45 loops=1)
               Index Cond: ((scan_id = 384406048) AND (deduplicated = true))
               Filter: ((partition_number = 235) AND (severity = ANY ('{5,6,7}'::integer[])))
 Planning Time: 0.203 ms
 Execution Time: 0.087 ms
  • Cost: 621 (~4.5x reduction)
  • Execution: 0.087 ms (~4x faster)
  • Simple index scan, no join to vulnerability_occurrences

Summary

| Metric | Before (JOIN) | After (no JOIN) | Improvement |
| --- | --- | --- | --- |
| Cost | 2763 | 621 | 4.5x lower |
| Execution time | 0.363 ms | 0.087 ms | 4x faster |
| Join operations | 45 index lookups | 0 | Eliminated |

Note: The postgres.ai explain commands returned 0 rows due to clone timing, but EXPLAIN ANALYZE executed directly on the clone returned the expected 45 rows and 20-row results shown above.

Why a large table exception?

The security_findings table is large but:

  • This is a smallint (2 bytes) column with no default
  • The table has 30-90 day retention, so old data cycles out naturally
  • The performance benefit of eliminating the LEFT JOIN significantly outweighs the storage cost

Key Design Decisions

  1. No backfill needed - 30-90 day retention means old data cycles out naturally
  2. Eventual consistency - Next pipeline run picks up any new overrides
  3. Scanner severity always stored - Even when flag is off, scanner_reported_severity is populated for future use
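
Decisions 2 and 3 together can be sketched as two simulated pipeline runs (all names are illustrative; `ingest` stands in for the ingestion write path):

```ruby
# Sketch: every run re-resolves the effective severity from the current
# overrides, so a new override is picked up on the next pipeline run
# (eventual consistency), while the scanner's value is always persisted.
def ingest(scanner_severity, overrides, uuid)
  { scanner_reported_severity: scanner_severity,
    severity: overrides.fetch(uuid, scanner_severity) }
end

overrides = {}
first_run = ingest(6, overrides, "uuid-1")  # no override yet: severity == scanner value

overrides["uuid-1"] = 7                     # override created between runs
second_run = ingest(6, overrides, "uuid-1") # next run applies the override
```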

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist.

Closes #590349

Related: https://gitlab.com/gitlab-com/request-for-help/-/issues/4212

Edited by Gregory Havenga
