Schedule new version of RemoveDuplicateVulnerabilitiesFindings
What does this MR do and why?
This MR schedules the modified Gitlab::BackgroundMigration::RemoveDuplicateVulnerabilitiesFindings background migration
as per https://gitlab.com/gitlab-org/gitlab/-/issues/341917
Database review
Total migration runtime
Total records: 19_335_492
Batch size: 5_000
Total batches: 3_867
Interval: 2 minutes
3867 * 2 = 7734 minutes ≈ 129 hours ≈ 5.4 days
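The runtime estimate above can be reproduced with a few lines of Ruby. This is only a sketch using the figures quoted in this description; the constant and variable names are illustrative and do not come from the migration itself. It assumes one job per batch and uses ceiling division, which yields 3868 batches (one more than the 3_867 quoted, because a final partial batch still needs its own job). The real batch count depends on how ids are distributed in the table.

```ruby
# Figures copied from the description above; names are hypothetical.
TOTAL_RECORDS = 19_335_492
BATCH_SIZE = 5_000
INTERVAL_MINUTES = 2

# Ceiling division: a final partial batch still needs its own job.
total_batches = (TOTAL_RECORDS + BATCH_SIZE - 1) / BATCH_SIZE
total_minutes = total_batches * INTERVAL_MINUTES
total_hours = total_minutes / 60.0
total_days = total_hours / 24.0

puts "batches: #{total_batches}"
puts format("runtime: %d minutes = %.1f hours = %.1f days",
            total_minutes, total_hours, total_days)
```

Either way the estimate lands at a little under five and a half days.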
Records affected
We have ~26463 pairs of duplicates, so ~26463 records will be dropped. Any of those Vulnerabilities::Finding
objects can be assigned to a Vulnerability,
so in the worst-case scenario we will be:
- dropping 26463 Vulnerabilities::Finding records
- dropping 26463 Vulnerability records
Based on:
SELECT DISTINCT report_type, location_fingerprint, primary_identifier_id, project_id, array_agg(id) as ids, array_agg(uuid) as uuids
FROM vulnerability_occurrences
GROUP BY report_type, location_fingerprint, primary_identifier_id, project_id
HAVING (COUNT(*) > 1) AND (array_length(array_agg(vulnerability_id) FILTER (WHERE vulnerability_id IS NOT NULL), 1) = 1);
there are 199 duplicate pairs where only one Vulnerabilities::Finding
has a Vulnerability
associated, so the counts of dropped records would be 26463 Vulnerabilities::Finding records and 26264 Vulnerability records (26463 - 199)
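The two drop counts above come from a single subtraction, sketched here in Ruby. The variable names are illustrative, and the sketch assumes that in each of the 199 pairs the finding that has a Vulnerability is the one kept, so its Vulnerability is not dropped.

```ruby
# Figures copied from the description above; names are hypothetical.
DUPLICATE_PAIRS = 26_463
PAIRS_WITH_SINGLE_VULNERABILITY = 199

# One finding is dropped per duplicate pair.
findings_dropped = DUPLICATE_PAIRS

# Assumption: in the 199 pairs, the finding with a Vulnerability is kept,
# so those pairs do not cost a Vulnerability record.
vulnerabilities_dropped = DUPLICATE_PAIRS - PAIRS_WITH_SINGLE_VULNERABILITY

puts "findings dropped: #{findings_dropped}"
puts "vulnerabilities dropped: #{vulnerabilities_dropped}"
```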
How to set up and validate locally
bundle exec spring rspec spec/migrations/20211018152654_schedule_remove_duplicate_vulnerabilities_findings3_spec.rb
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by Michał Zając