
Schedule new version of RemoveDuplicateVulnerabilitiesFindings

What does this MR do and why?

This MR schedules the modified Gitlab::BackgroundMigration::RemoveDuplicateVulnerabilitiesFindings background migration, as per https://gitlab.com/gitlab-org/gitlab/-/issues/341917

Database review

Total migration runtime

Total records: 19_335_492
Batch size: 5_000
Total batches: 3_867
Interval: 2 minutes
3867 * 2 = 7734 minutes ≈ 129 hours ≈ 5.4 days
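
The estimate above can be reproduced with a small Ruby sketch. The batch count and interval come from this description; the method name estimated_runtime is hypothetical and used only for illustration. It assumes batches run strictly one per interval with no retries or overlap.

```ruby
# Figures taken from this MR description.
TOTAL_BATCHES    = 3_867
INTERVAL_MINUTES = 2

# Hypothetical helper: converts a batch count and a per-batch interval
# into total minutes, hours, and days (hours and days rounded to 1 decimal).
def estimated_runtime(batches, interval_minutes)
  minutes = batches * interval_minutes
  hours   = minutes / 60.0
  days    = hours / 24.0
  { minutes: minutes, hours: hours.round(1), days: days.round(1) }
end

puts estimated_runtime(TOTAL_BATCHES, INTERVAL_MINUTES)
```

Rounding rather than truncating gives roughly 128.9 hours, or about 5.4 days.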

Records affected

We have ~26463 pairs of duplicates, so ~26463 records will be dropped. Any of those Vulnerabilities::Finding objects can be assigned to a Vulnerability, so in the worst-case scenario we will be:

  • dropping 26463 Vulnerabilities::Finding records
  • dropping 26463 Vulnerability records

Based on:

SELECT DISTINCT report_type, location_fingerprint, primary_identifier_id, project_id, array_agg(id) as ids, array_agg(uuid) as uuids
FROM vulnerability_occurrences
GROUP BY report_type, location_fingerprint, primary_identifier_id, project_id
HAVING (COUNT(*) > 1) AND (array_length(array_agg(vulnerability_id) FILTER (WHERE vulnerability_id IS NOT NULL), 1) = 1);

there are 199 duplicate pairs where only one Vulnerabilities::Finding has a Vulnerability associated, so the expected counts of dropped records are 26463 Vulnerabilities::Finding records and 26463 - 199 = 26264 Vulnerability records.
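
As a rough illustration of what the query above selects, the same filter can be sketched in plain Ruby over in-memory data. The Finding struct, the sample rows, and the method name groups_with_single_vulnerability are all hypothetical; this is not the migration's code, only a model of the GROUP BY / HAVING condition.

```ruby
# Hypothetical in-memory stand-in for rows of vulnerability_occurrences.
Finding = Struct.new(:id, :report_type, :location_fingerprint,
                     :primary_identifier_id, :project_id, :vulnerability_id,
                     keyword_init: true)

# Mirrors the HAVING clause: groups with more than one row in which
# exactly one row has a non-NULL vulnerability_id.
def groups_with_single_vulnerability(findings)
  findings
    .group_by { |f| [f.report_type, f.location_fingerprint, f.primary_identifier_id, f.project_id] }
    .values
    .select { |group| group.size > 1 && group.count { |f| !f.vulnerability_id.nil? } == 1 }
end

# Made-up sample rows, for illustration only.
sample = [
  # Duplicate pair, only one finding has a vulnerability: selected.
  Finding.new(id: 1, report_type: 0, location_fingerprint: "aa", primary_identifier_id: 10, project_id: 1, vulnerability_id: 100),
  Finding.new(id: 2, report_type: 0, location_fingerprint: "aa", primary_identifier_id: 10, project_id: 1, vulnerability_id: nil),
  # Duplicate pair, both findings have a vulnerability: not selected.
  Finding.new(id: 3, report_type: 0, location_fingerprint: "bb", primary_identifier_id: 11, project_id: 1, vulnerability_id: 101),
  Finding.new(id: 4, report_type: 0, location_fingerprint: "bb", primary_identifier_id: 11, project_id: 1, vulnerability_id: 102),
  # No duplicate: not selected.
  Finding.new(id: 5, report_type: 1, location_fingerprint: "cc", primary_identifier_id: 12, project_id: 2, vulnerability_id: nil)
]

groups_with_single_vulnerability(sample).each do |group|
  puts "ids: #{group.map(&:id).inspect}"
end
```

With the sample rows, only the first pair (ids 1 and 2) satisfies both conditions.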

How to set up and validate locally

  1. bundle exec spring rspec spec/migrations/20211018152654_schedule_remove_duplicate_vulnerabilities_findings3_spec.rb

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Michał Zając
