Backfill Vulnerability merge request links

What does this MR do and why?

Related to Database 2/2: Add hasMergeRequest filter to Vul... (#421736 - closed).

Database review

This MR introduces a post-deployment migration to schedule the background jobs.
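
For reference, here is a minimal sketch of what such a post-deployment migration usually looks like, following GitLab's queue_batched_background_migration convention. The file path, the batch size of 10_000 (inferred from the job-count estimate below), and the schema restriction are assumptions, not confirmed details of this MR:

# db/post_migrate/20230907155247_queue_backfill_has_merge_request_of_vulnerability_reads.rb
# Sketch only; batch sizes are inferred from the time estimations below.
class QueueBackfillHasMergeRequestOfVulnerabilityReads < Gitlab::Database::Migration[2.1]
  MIGRATION = 'BackfillHasMergeRequestOfVulnerabilityReads'
  DELAY_INTERVAL = 2.minutes
  BATCH_SIZE = 10_000   # assumed: 79_709_487 rows / 7_971 jobs
  SUB_BATCH_SIZE = 200  # stated in the time estimations

  restrict_gitlab_migration gitlab_schema: :gitlab_main

  def up
    # Enqueues one batched background migration; the framework then walks
    # vulnerability_reads by vulnerability_id in batches of BATCH_SIZE.
    queue_batched_background_migration(
      MIGRATION,
      :vulnerability_reads,
      :vulnerability_id,
      job_interval: DELAY_INTERVAL,
      batch_size: BATCH_SIZE,
      sub_batch_size: SUB_BATCH_SIZE
    )
  end

  def down
    delete_batched_background_migration(MIGRATION, :vulnerability_reads, :vulnerability_id, [])
  end
end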

Output of the post-deployment migration

rake db:migrate:up

main: == 20230907155247 QueueBackfillHasMergeRequestOfVulnerabilityReads: migrating =
main: == 20230907155247 QueueBackfillHasMergeRequestOfVulnerabilityReads: migrated (0.1722s)

rake db:migrate:down

main: == 20230907155247 QueueBackfillHasMergeRequestOfVulnerabilityReads: reverting =
main: == 20230907155247 QueueBackfillHasMergeRequestOfVulnerabilityReads: reverted (0.0252s)

Update query used in background migration

UPDATE
  vulnerability_reads
SET
  has_merge_request = true
FROM
  (SELECT "vulnerability_reads".*
   FROM "vulnerability_reads"
   INNER JOIN vulnerability_merge_request_links
     ON vulnerability_reads.vulnerability_id = vulnerability_merge_request_links.vulnerability_id
   WHERE "vulnerability_reads"."vulnerability_id" BETWEEN 1 AND 11513
     AND "vulnerability_reads"."vulnerability_id" < 812) AS sub_query
WHERE
  vulnerability_reads.vulnerability_id = sub_query.vulnerability_id

 ModifyTable on public.vulnerability_reads  (cost=63.05..596.56 rows=1 width=183) (actual time=74.864..74.868 rows=0 loops=1)
   Buffers: shared hit=69 read=71
   I/O Timings: read=74.132 write=0.000
   ->  Nested Loop  (cost=63.05..596.56 rows=1 width=183) (actual time=74.861..74.864 rows=0 loops=1)
         Buffers: shared hit=69 read=71
         I/O Timings: read=74.132 write=0.000
         ->  Merge Join  (cost=62.49..592.98 rows=1 width=28) (actual time=74.860..74.861 rows=0 loops=1)
               Merge Cond: (vulnerability_reads_1.vulnerability_id = vulnerability_merge_request_links.vulnerability_id)
               Buffers: shared hit=69 read=71
               I/O Timings: read=74.132 write=0.000
               ->  Index Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads vulnerability_reads_1  (cost=0.57..579.32 rows=686 width=14) (actual time=1.818..74.805 rows=199 loops=1)
                     Index Cond: ((vulnerability_reads_1.vulnerability_id >= 1) AND (vulnerability_reads_1.vulnerability_id <= 11513) AND (vulnerability_reads_1.vulnerability_id < 812))
                     Buffers: shared hit=67 read=71
                     I/O Timings: read=74.132 write=0.000
               ->  Index Scan using unique_vuln_merge_request_link_vuln_id_and_mr_id on public.vulnerability_merge_request_links  (cost=0.15..14.65 rows=264 width=14) (actual time=0.012..0.012 rows=1 loops=1)
                     Buffers: shared hit=2
                     I/O Timings: read=0.000 write=0.000
         ->  Index Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads  (cost=0.57..3.58 rows=1 width=170) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (vulnerability_reads.vulnerability_id = vulnerability_reads_1.vulnerability_id)
               I/O Timings: read=0.000 write=0.000

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/22008/commands/71210
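
For context, a hedged sketch of the batched background migration job that could issue the statement above. The class body is reconstructed from the query; the operation name, feature category, and join wiring are assumptions rather than the MR's actual code:

# lib/gitlab/background_migration/backfill_has_merge_request_of_vulnerability_reads.rb
# Sketch only; reconstructed from the UPDATE statement shown above.
module Gitlab
  module BackgroundMigration
    class BackfillHasMergeRequestOfVulnerabilityReads < BatchedMigrationJob
      operation_name :update_all
      feature_category :vulnerability_management

      def perform
        each_sub_batch do |sub_batch|
          # sub_batch is already scoped to the current vulnerability_id range;
          # wrap it in the UPDATE ... FROM form whose plan is shown above.
          connection.execute(<<~SQL)
            UPDATE vulnerability_reads
            SET has_merge_request = true
            FROM (
              #{sub_batch.joins('INNER JOIN vulnerability_merge_request_links ON vulnerability_reads.vulnerability_id = vulnerability_merge_request_links.vulnerability_id').to_sql}
            ) AS sub_query
            WHERE vulnerability_reads.vulnerability_id = sub_query.vulnerability_id
          SQL
        end
      end
    end
  end
end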

Time estimations

As of writing, there are 79_709_487 records in the vulnerability_reads table. With a batch size of 10_000, this means we will create 7_971 jobs with a gap of 2 minutes between each: 7971 * 2 / 60 / 24 = ~11 days.

Since the update query runs in around 80 ms and the sub-batch size is 200, each background job will run 50 update queries, so 50 * 80 ms = 4 seconds of query time per job. Even with the overhead of each iteration, each job is very likely to finish within 1 to 2 minutes.
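
A quick back-of-the-envelope check of those numbers (the 10_000 batch size is inferred from the job count, not confirmed):

total_rows     = 79_709_487
batch_size     = 10_000                              # inferred: total_rows / 7_971 jobs
sub_batch_size = 200

jobs = (total_rows / batch_size.to_f).ceil           # => 7971
days = jobs * 2 / 60.0 / 24                          # => ~11.1 days of 2-minute gaps

sub_batches_per_job   = batch_size / sub_batch_size  # => 50
query_seconds_per_job = sub_batches_per_job * 0.080  # => 4.0 seconds of UPDATE time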

Also, from sampling, the DB testing job reports: Estimated Time to complete: 1 week, 3 days, 20 hours, and 34 minutes. Details in !130952 (comment 1550881529).

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
