
Schedule removal of duplicate Findings

What does this MR do?

This migration reschedules the RemoveDuplicateVulnerabilitiesFindings background migration because duplicate entries still exist (at least on gprd-db-archive), according to the query below.

Related to #292239 (closed)

Find duplicates query
SELECT DISTINCT report_type, location_fingerprint, primary_identifier_id, project_id, array_agg(id) as ids, array_agg(uuid) as uuids
FROM vulnerability_occurrences
GROUP BY report_type, location_fingerprint, primary_identifier_id, project_id
HAVING (COUNT(*) > 1);
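
To make the duplicate criterion concrete, here is a minimal Ruby sketch of the same grouping applied to in-memory rows. The `Finding` struct, the `duplicate_groups` helper, and the sample data are hypothetical illustrations and are not part of this MR; the real check runs as the SQL above against vulnerability_occurrences.

```ruby
# Hypothetical in-memory stand-in for rows of vulnerability_occurrences.
Finding = Struct.new(:id, :report_type, :location_fingerprint,
                     :primary_identifier_id, :project_id)

# Groups findings by the same four columns the SQL query groups by and
# keeps only the groups containing more than one row (HAVING COUNT(*) > 1).
# Returns a hash mapping each duplicated key to the ids sharing it.
def duplicate_groups(findings)
  findings
    .group_by { |f| [f.report_type, f.location_fingerprint, f.primary_identifier_id, f.project_id] }
    .select { |_key, rows| rows.size > 1 }
    .transform_values { |rows| rows.map(&:id) }
end

# Made-up sample data: ids 1 and 2 share all four grouping columns.
findings = [
  Finding.new(1, 0, "abc", 10, 100),
  Finding.new(2, 0, "abc", 10, 100),
  Finding.new(3, 0, "def", 10, 100)
]

p duplicate_groups(findings)
```

Only the group shared by ids 1 and 2 is reported; id 3 has a different location_fingerprint and therefore is not a duplicate.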

Query timings are available over at !49937 (merged)

Updated query timings

Cold query for a batch of 5000: ~3000ms https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2504/commands/7672
Warm query for a batch of 5000: ~300ms https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2504/commands/7673

7544976 rows in vulnerability_occurrences / 5000 rows per batch ≈ 1509 batches (rounded up)

2-minute interval * 1509 batches = 3018 min ≈ 50.3 hours
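
The estimate above can be reproduced with a short Ruby sketch. The constants come from this description; the helper names `batch_count` and `total_runtime_hours` are hypothetical. It assumes every batch holds exactly 5000 rows; if the migration batches by id range, gaps in the ids could change the real batch count.

```ruby
# Figures taken from this MR description.
TOTAL_ROWS = 7_544_976
BATCH_SIZE = 5_000
INTERVAL_MINUTES = 2

# Number of batches needed to cover all rows, rounding up so the
# final partial batch is counted (integer ceiling division).
def batch_count(total_rows, batch_size)
  (total_rows + batch_size - 1) / batch_size
end

# Total scheduling window in hours, given one batch per interval.
def total_runtime_hours(batches, interval_minutes)
  (batches * interval_minutes / 60.0).round(1)
end

batches = batch_count(TOTAL_ROWS, BATCH_SIZE)
hours = total_runtime_hours(batches, INTERVAL_MINUTES)
puts "#{batches} batches, about #{hours} hours"
```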

Thanks @sabrams

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Michał Zając
