Skip to content

Backfill vulnerability reads identifier_external_ids column

Bala Kumar requested to merge 460080-backfill-identifier-external-ids into master

What does this MR do and why?

Backfills identifier_external_ids column of vulnerability_reads. This is indented to be used by the backend finder for the GraphQL API in #432419.

Database

The estimate from db testing pipeline job for this migration to complete is available in !157639 (comment 1977509967).

Estimated Time to complete: 3 weeks, 2 days, 15 hours, 4 minutes, and 17 seconds

On production this is expected to finish sooner since we have migration optimization strategy.

We have set the Batch size to be 10000 and sub-batch size as 20 and with that we are getting:

Mean query time: 80.7 ms

Max query time: 9343.0 ms.

The max query time seems to be happen when the DB is busy/congested and I am not sure if we can further optimise and it is worth the effort for this one time migration task.

Output of the post-deployment migration

bundle exec rake db:migrate:up:main VERSION=20240627151711

main: == [advisory_lock_connection] object_id: 128000, pg_backend_pid: 91138
main: == 20240627151711 QueueBackfillIdentifierExternalIdsOfVulnerabilityReads: migrating
main: == 20240627151711 QueueBackfillIdentifierExternalIdsOfVulnerabilityReads: migrated (0.0499s)

main: == [advisory_lock_connection] object_id: 128000, pg_backend_pid: 91138

bundle exec rake db:migrate:down:main VERSION=20240627151711

main: == [advisory_lock_connection] object_id: 128000, pg_backend_pid: 91638
main: == 20240627151711 QueueBackfillIdentifierExternalIdsOfVulnerabilityReads: reverting
main: == 20240627151711 QueueBackfillIdentifierExternalIdsOfVulnerabilityReads: reverted (0.0369s)

main: == [advisory_lock_connection] object_id: 128000, pg_backend_pid: 91638
Select query used in background migration
SELECT vulnerability_reads.vulnerability_id  FROM vulnerability_reads WHERE vulnerability_reads.vulnerability_id BETWEEN 30000 AND 40000 ORDER BY vulnerability_reads.vulnerability_id ASC LIMIT 10000
Limit  (cost=0.57..251.01 rows=8456 width=8) (actual time=1.421..134.010 rows=10000 loops=1)
   Buffers: shared hit=515 read=141 dirtied=29
   I/O Timings: read=126.251 write=0.000
   ->  Index Only Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads  (cost=0.57..251.01 rows=8456 width=8) (actual time=1.420..133.260 rows=10000 loops=1)
         Index Cond: ((vulnerability_reads.vulnerability_id >= 30000) AND (vulnerability_reads.vulnerability_id <= 40000))
         Heap Fetches: 114
         Buffers: shared hit=515 read=141 dirtied=29
         I/O Timings: read=126.251 write=0.000

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/29600/commands/91880

Update query used in each sub-batch
UPDATE
  vulnerability_reads vr
SET
  identifier_external_ids = selected_ids.external_ids
FROM
  (
    SELECT
      vr.id,
      array_agg(
        vi.external_id
        ORDER BY
          vi.external_id
      ) AS external_ids
    FROM
      vulnerability_reads vr
      JOIN vulnerability_occurrences vo ON vr.vulnerability_id = vo.vulnerability_id
      JOIN vulnerability_occurrence_identifiers voi ON vo.id = voi.occurrence_id
      JOIN vulnerability_identifiers vi ON voi.identifier_id = vi.id
    WHERE
      vr.id IN (
        SELECT
          vulnerability_reads.id
        FROM
          vulnerability_reads
        WHERE
          vulnerability_reads.vulnerability_id BETWEEN 104513160
          AND 104513180
          AND vulnerability_reads.vulnerability_id >= 104513160
          AND vulnerability_reads.vulnerability_id < 104513180
      )
    GROUP BY
      vr.id
  ) selected_ids
WHERE
  vr.id = selected_ids.id
 ModifyTable on public.vulnerability_reads vr  (cost=107.02..193.21 rows=0 width=0) (actual time=394.232..394.243 rows=0 loops=1)
   Buffers: shared hit=1688 read=357 dirtied=121
   I/O Timings: read=381.788 write=0.000
   ->  Nested Loop  (cost=107.02..193.21 rows=24 width=102) (actual time=103.682..104.585 rows=20 loops=1)
         Buffers: shared hit=476 read=88 dirtied=14
         I/O Timings: read=99.612 write=0.000
         ->  Subquery Scan on selected_ids  (cost=106.45..107.17 rows=24 width=104) (actual time=103.659..104.202 rows=20 loops=1)
               Buffers: shared hit=376 read=88 dirtied=14
               I/O Timings: read=99.612 write=0.000
               ->  Aggregate  (cost=106.45..106.93 rows=24 width=40) (actual time=103.642..104.134 rows=20 loops=1)
                     Group Key: vr_1.id
                     Buffers: shared hit=376 read=88 dirtied=14
                     I/O Timings: read=99.612 write=0.000
                     ->  Sort  (cost=106.45..106.51 rows=24 width=28) (actual time=103.475..103.517 rows=28 loops=1)
                           Sort Key: vr_1.id
                           Sort Method: quicksort  Memory: 27kB
                           Buffers: shared hit=366 read=88 dirtied=14
                           I/O Timings: read=99.612 write=0.000
                           ->  Nested Loop  (cost=2.71..105.90 rows=24 width=28) (actual time=26.389..103.350 rows=28 loops=1)
                                 Buffers: shared hit=363 read=88 dirtied=14
                                 I/O Timings: read=99.612 write=0.000
                                 ->  Nested Loop  (cost=2.27..94.94 rows=24 width=16) (actual time=20.630..78.348 rows=28 loops=1)
                                       Buffers: shared hit=277 read=62 dirtied=12
                                       I/O Timings: read=75.312 write=0.000
                                       ->  Nested Loop  (cost=1.70..84.05 rows=14 width=16) (actual time=12.544..48.018 rows=20 loops=1)
                                             Buffers: shared hit=193 read=37 dirtied=2
                                             I/O Timings: read=46.895 write=0.000
                                             ->  Nested Loop  (cost=1.14..71.67 rows=14 width=16) (actual time=7.124..16.868 rows=20 loops=1)
                                                   Buffers: shared hit=115 read=15 dirtied=2
                                                   I/O Timings: read=16.223 write=0.000
                                                   ->  Index Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads  (cost=0.57..21.48 rows=14 width=8) (actual time=4.373..13.745 rows=20 loops=1)
                                                         Index Cond: ((vulnerability_reads.vulnerability_id >= 104513160) AND (vulnerability_reads.vulnerability_id <= 104513180) AND (vulnerability_reads.vulnerability_id >= 104513160) AND (vulnerability_reads.vulnerability_id < 104513180))
                                                         Buffers: shared hit=18 read=12 dirtied=2
                                                         I/O Timings: read=13.528 write=0.000
                                                   ->  Index Scan using vulnerability_reads_pkey on public.vulnerability_reads vr_1  (cost=0.57..3.58 rows=1 width=16) (actual time=0.150..0.150 rows=1 loops=20)
                                                         Index Cond: (vr_1.id = vulnerability_reads.id)
                                                         Buffers: shared hit=97 read=3
                                                         I/O Timings: read=2.695 write=0.000
                                             ->  Index Scan using index_vulnerability_occurrences_on_vulnerability_id on public.vulnerability_occurrences vo  (cost=0.57..0.87 rows=1 width=16) (actual time=1.552..1.554 rows=1 loops=20)
                                                   Index Cond: (vo.vulnerability_id = vr_1.vulnerability_id)
                                                   Buffers: shared hit=78 read=22
                                                   I/O Timings: read=30.672 write=0.000
                                       ->  Index Only Scan using index_vulnerability_occurrence_identifiers_on_unique_keys on public.vulnerability_occurrence_identifiers voi  (cost=0.57..0.71 rows=7 width=16) (actual time=1.323..1.511 rows=1 loops=20)
                                             Index Cond: (voi.occurrence_id = vo.id)
                                             Heap Fetches: 23
                                             Buffers: shared hit=84 read=25 dirtied=10
                                             I/O Timings: read=28.417 write=0.000
                                 ->  Index Scan using vulnerability_identifiers_pkey on public.vulnerability_identifiers vi  (cost=0.43..0.46 rows=1 width=28) (actual time=0.887..0.887 rows=1 loops=28)
                                       Index Cond: (vi.id = voi.identifier_id)
                                       Buffers: shared hit=86 read=26 dirtied=2
                                       I/O Timings: read=24.300 write=0.000
         ->  Index Scan using vulnerability_reads_pkey on public.vulnerability_reads vr  (cost=0.57..3.58 rows=1 width=14) (actual time=0.013..0.013 rows=1 loops=20)
               Index Cond: (vr.id = selected_ids.id)
               Buffers: shared hit=100
               I/O Timings: read=0.000 write=0.000
Time: 402.922 ms
  - planning: 8.378 ms
  - execution: 394.544 ms
    - I/O read: 381.788 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1688 (~13.20 MiB) from the buffer pool
  - reads: 357 (~2.80 MiB) from the OS file cache, including disk I/O
  - dirtied: 121 (~968.00 KiB)
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/29600/commands/91901

Estimated Time to complete: 3 weeks, 2 days, 15 hours, 4 minutes, and 17 seconds
  • Interval: 120s
  • Total tuple count: 100807570
  • Max batch size: 0
  • Estimated seconds to complete: 2041457s
  • Estimated number of batches: 10080
  • Average batch time: 202.53s
  • Batch size: 10000
  • N. of batches sampled: 9
  • N. of failed batches: 0

Related to #460080

Edited by Bala Kumar

Merge request reports