
Drop Vulnerabilities::FindingPipeline via a background migration

What does this MR do and why?

This MR introduces a background migration that removes Vulnerabilities::FindingPipeline objects associated with a given Vulnerabilities::Finding.

We need to do this in a background migration because, after merging !72528 (merged), we realized the cascading drop was timing out despite our efforts to prevent that from happening.

The general idea here is:

  1. RemoveOccurrencePipelinesAndDuplicateVulnerabilitiesFindings runs over the whole vulnerability_occurrences table to find the problematic Vulnerabilities::Finding objects (we can't find them with a single query, since that is too expensive and times out).
  2. For every Finding found that way, it removes the associated Vulnerabilities::FindingPipeline objects in batches of 1000.
  3. Finally, it removes the Finding itself. A sketch of the job is shown after this list.
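
A minimal sketch of the job, assuming GitLab's background migration conventions. The table names come from the GitLab schema, while SUB_BATCH_SIZE and the problematic_finding_ids helper are stand-ins; the actual duplicate-detection query is not reproduced in this description:

```ruby
module Gitlab
  module BackgroundMigration
    class RemoveOccurrencePipelinesAndDuplicateVulnerabilitiesFindings
      SUB_BATCH_SIZE = 1_000

      # Isolated models so the job doesn't depend on application code.
      class VulnerabilityOccurrence < ActiveRecord::Base
        self.table_name = 'vulnerability_occurrences'
      end

      class VulnerabilityOccurrencePipeline < ActiveRecord::Base
        self.table_name = 'vulnerability_occurrence_pipelines'
      end

      def perform(start_id, end_id)
        problematic_finding_ids(start_id, end_id).each do |finding_id|
          # Step 2: drop the finding's pipelines in batches of 1000 so each
          # DELETE stays cheap and can't hit a statement timeout.
          loop do
            ids = VulnerabilityOccurrencePipeline
              .where(occurrence_id: finding_id)
              .limit(SUB_BATCH_SIZE)
              .pluck(:id)
            break if ids.empty?

            VulnerabilityOccurrencePipeline.where(id: ids).delete_all
          end

          # Step 3: remove the finding itself once its pipelines are gone.
          VulnerabilityOccurrence.where(id: finding_id).delete_all
        end
      end

      private

      # Placeholder: the MR's actual query for "problematic" (duplicate)
      # findings within [start_id, end_id] is not reproduced here.
      def problematic_finding_ids(start_id, end_id)
        VulnerabilityOccurrence.where(id: start_id..end_id).pluck(:id)
      end
    end
  end
end
```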

Database review

db:migrate and db:rollback

gitlab on  341917-drop-vulnerability-occurrence-pipelines-in-background [⇡?] via ⬢ v14.15.4 via 💎 ruby ✦ ➜ bundle exec rails db:migrate                                                                 == 20211116111644 ScheduleRemoveOccurrencePipelinesAndDuplicateVulnerabilitiesFindings: migrating
-- Scheduled 1 RemoveOccurrencePipelinesAndDuplicateVulnerabilitiesFindings jobs with a maximum of 10000 records per batch and an interval of 120 seconds.

The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2021-11-16 12:31:28 UTC."
== 20211116111644 ScheduleRemoveOccurrencePipelinesAndDuplicateVulnerabilitiesFindings: migrated (0.0378s)

gitlab on  341917-drop-vulnerability-occurrence-pipelines-in-background [⇡!?] via ⬢ v14.15.4 via 💎 ruby ✦ ➜ bundle exec rails db:rollback
== 20211116111644 ScheduleRemoveOccurrencePipelinesAndDuplicateVulnerabilitiesFindings: reverting
== 20211116111644 ScheduleRemoveOccurrencePipelinesAndDuplicateVulnerabilitiesFindings: reverted (0.0000s)
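
For context, the scheduling migration that produced the output above most likely follows GitLab's standard pattern for queueing background migration jobs. This is a hedged reconstruction from the logged parameters (10000 records per batch, 120-second interval), not the MR's actual code:

```ruby
class ScheduleRemoveOccurrencePipelinesAndDuplicateVulnerabilitiesFindings < ActiveRecord::Migration[6.1]
  include Gitlab::Database::MigrationHelpers

  MIGRATION = 'RemoveOccurrencePipelinesAndDuplicateVulnerabilitiesFindings'
  BATCH_SIZE = 10_000
  DELAY_INTERVAL = 2.minutes

  disable_ddl_transaction!

  # Isolated model so the migration doesn't depend on application code.
  class VulnerabilityOccurrence < ActiveRecord::Base
    include EachBatch

    self.table_name = 'vulnerability_occurrences'
  end

  def up
    # Schedules one job per id range of BATCH_SIZE rows, spaced
    # DELAY_INTERVAL apart -- matching the "10000 records per batch and an
    # interval of 120 seconds" line in the output above.
    queue_background_migration_jobs_by_range_at_intervals(
      VulnerabilityOccurrence,
      MIGRATION,
      DELAY_INTERVAL,
      batch_size: BATCH_SIZE
    )
  end

  def down
    # No-op: deleted rows can't be restored; rolling back only reverts the
    # schema_migrations entry, as the 0.0000s revert above suggests.
  end
end
```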

Selecting a batch of 10000 Vulnerabilities::Finding objects

I picked a batch size of 10000. On #database-lab, selecting 7500 records took 300ms while selecting 10000 records took 3s, and I'm not sure why; Andreas mentioned that buffers might be a better metric, since I might be fighting for I/O with other clones on #database-lab. A sketch of the query shape follows the plan links below.

7500, cold cache: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7270/commands/25864

10000, cold cache, attempt 1: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7270/commands/25862

10000, cold cache, attempt 2: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7270/commands/25896
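
The query being measured is presumably the relative-id windowing that EachBatch performs while scheduling jobs: find the id that closes a 10000-row window, then select the half-open range. An illustrative reconstruction, using the VulnerabilityOccurrence model from the sketches above, with start_id as a placeholder:

```ruby
# Find the id that ends a 10,000-row window starting at start_id.
# `pick` is Rails 6+ shorthand for `limit(1).pluck(...).first`.
stop_id = VulnerabilityOccurrence
  .where('id >= ?', start_id)
  .order(:id)
  .offset(10_000)
  .pick(:id)

# Select the half-open window [start_id, stop_id).
window = VulnerabilityOccurrence.where('id >= ?', start_id)
window = window.where('id < ?', stop_id) if stop_id
```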

Dropping a batch of 2000 Vulnerabilities::FindingPipeline objects

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7302/commands/25992
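
The statement being measured presumably bounds each DELETE to 2000 rows with a subquery over one finding's pipeline ids. A hedged reconstruction, where VulnerabilityOccurrencePipeline is the model from the job sketch above and finding_id is a placeholder:

```ruby
# Generates roughly:
#   DELETE FROM vulnerability_occurrence_pipelines
#   WHERE id IN (SELECT id FROM vulnerability_occurrence_pipelines
#                WHERE occurrence_id = ? LIMIT 2000)
VulnerabilityOccurrencePipeline
  .where(id: VulnerabilityOccurrencePipeline
               .where(occurrence_id: finding_id)
               .limit(2_000)
               .select(:id))
  .delete_all
```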

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #341917

