Skip to content

fix: Add background migration for recalculating finding signatures

What does this MR do and why?

Adds background migration for repopulating all vulnerability_finding_signatures.

Due to a previous logic error in how signatures were populated (#343037 (comment 709539133)), we have some malformed associations between signatures and findings.

This background migration repopulates the signatures table with raw metadata stored within findings.

Only SAST reports have signatures currently, so we can add a scope to narrow the total result set somewhat.

Relates to #343037 (closed)

Background Migration Details:

In batches of 1000, projects will be scheduled at 2 minute intervals. For each project, batches of 1000 vulnerability_findings will be fetched and iterated over to delete and insert signatures. There will only be one signature per finding currently so I didn't bother including the delete EXPLAIN but happy to if desired.

925,342 vulnerability_finding_signatures to delete and re-insert (#343037 (comment 711939484))
batch size = 1000
925,342 / 1000 = 926 batches

2 mins delay per signature batch (safe for the given total time per batch)

926 batches * 2 min per signature batch = 1852 mins to run all the scheduled jobs

We could likely speed this up beyond 2 minute batches but I matched previous similar migrations.

vulnerability_finding_signatures DELETE

All findings with signatures have only 1 current signature

EXPLAIN proof
Aggregate  (cost=0.42..36290.39 rows=308447 width=16) (actual time=1907.280..1907.282 rows=0 loops=1)
   Group Key: vulnerability_finding_signatures.finding_id
   Filter: (count(vulnerability_finding_signatures.finding_id) > 1)
   Rows Removed by Filter: 926335
   Buffers: shared hit=2101 read=2701 dirtied=103
   I/O Timings: read=1281.614 write=0.000
   ->  Index Only Scan using in
[...SKIP...]

vulnerability_finding_signatures SELECT batches

SQL
SELECT "vulnerability_finding_signatures".* FROM "vulnerability_finding_signatures" INNER JOIN "vulnerability_occurrences" ON "vulnerability_occurrences"."id" = "vulnerability_finding_signatures"."finding_id" WHERE "vulnerability_finding_signatures"."id" BETWEEN 4 AND 5 LIMIT 1000
EXPLAIN
 Limit  (cost=0.99..9.14 rows=2 width=55) (actual time=4.654..4.690 rows=2 loops=1)
   Buffers: shared hit=11 read=6
   I/O Timings: read=4.615 write=0.000
   ->  Nested Loop  (cost=0.99..9.14 rows=2 width=55) (actual time=4.653..4.688 rows=2 loops=1)
         Buffers: shared hit=11 read=6
         I/O Timings: read=4.615 write=0.000
         ->  Index Scan using vulnerability_finding_signatures_pkey on public.vulnerability_finding_signatures  (cost=0.42..3.48 rows=2 width=55) (actual time=2.373..2.376 rows=2 loops=1)
               Index Cond: ((vulnerability_finding_signatures.id >= 4) AND (vulnerability_finding_signatures.id <= 5))
               Buffers: shared hit=6 read=1
               I/O Timings: read=2.342 write=0.000
         ->  Index Only Scan using vulnerability_occurrences_pkey on public.vulnerability_occurrences  (cost=0.56..2.83 rows=1 width=8) (actual time=1.152..1.152 rows=1 loops=2)
               Index Cond: (vulnerability_occurrences.id = vulnerability_finding_signatures.finding_id)
               Heap Fetches: 0
               Buffers: shared hit=5 read=5
               I/O Timings: read=2.273 write=0.000

How to set up and validate locally

Steps for ensuring repopulation matches previous signature counts

  1. Check Vulnerabilities::FindingSignature.count
  2. bin/rails db:migrate
  3. wait 120 seconds (if greater than 1000 local findings)
  4. Compare to previous Vulnerabilities::FindingSignature.count

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Lucas Charles

Merge request reports