Backfill vulnerability reads identifier_external_ids column
What does this MR do and why?
Backfills identifier_external_ids
column of vulnerability_reads
. This is indented to be used by the backend finder for the GraphQL API in #432419.
Database
The estimate from db testing pipeline job for this migration to complete is available in !157639 (comment 1977509967).
Estimated Time to complete: 3 weeks, 2 days, 15 hours, 4 minutes, and 17 seconds
On production this is expected to finish sooner since we have migration optimization strategy.
We have set the Batch size to be 10000
and sub-batch size as 20
and with that we are getting:
Mean query time: 80.7 ms
Max query time: 9343.0 ms.
The max query time seems to be happen when the DB is busy/congested and I am not sure if we can further optimise and it is worth the effort for this one time migration task.
Output of the post-deployment migration
bundle exec rake db:migrate:up:main VERSION=20240627151711
main: == [advisory_lock_connection] object_id: 128000, pg_backend_pid: 91138
main: == 20240627151711 QueueBackfillIdentifierExternalIdsOfVulnerabilityReads: migrating
main: == 20240627151711 QueueBackfillIdentifierExternalIdsOfVulnerabilityReads: migrated (0.0499s)
main: == [advisory_lock_connection] object_id: 128000, pg_backend_pid: 91138
bundle exec rake db:migrate:down:main VERSION=20240627151711
main: == [advisory_lock_connection] object_id: 128000, pg_backend_pid: 91638
main: == 20240627151711 QueueBackfillIdentifierExternalIdsOfVulnerabilityReads: reverting
main: == 20240627151711 QueueBackfillIdentifierExternalIdsOfVulnerabilityReads: reverted (0.0369s)
main: == [advisory_lock_connection] object_id: 128000, pg_backend_pid: 91638
Select query used in background migration
SELECT vulnerability_reads.vulnerability_id FROM vulnerability_reads WHERE vulnerability_reads.vulnerability_id BETWEEN 30000 AND 40000 ORDER BY vulnerability_reads.vulnerability_id ASC LIMIT 10000
Limit (cost=0.57..251.01 rows=8456 width=8) (actual time=1.421..134.010 rows=10000 loops=1)
Buffers: shared hit=515 read=141 dirtied=29
I/O Timings: read=126.251 write=0.000
-> Index Only Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads (cost=0.57..251.01 rows=8456 width=8) (actual time=1.420..133.260 rows=10000 loops=1)
Index Cond: ((vulnerability_reads.vulnerability_id >= 30000) AND (vulnerability_reads.vulnerability_id <= 40000))
Heap Fetches: 114
Buffers: shared hit=515 read=141 dirtied=29
I/O Timings: read=126.251 write=0.000
https://postgres.ai/console/gitlab/gitlab-production-main/sessions/29600/commands/91880
Update query used in each sub-batch
UPDATE
vulnerability_reads vr
SET
identifier_external_ids = selected_ids.external_ids
FROM
(
SELECT
vr.id,
array_agg(
vi.external_id
ORDER BY
vi.external_id
) AS external_ids
FROM
vulnerability_reads vr
JOIN vulnerability_occurrences vo ON vr.vulnerability_id = vo.vulnerability_id
JOIN vulnerability_occurrence_identifiers voi ON vo.id = voi.occurrence_id
JOIN vulnerability_identifiers vi ON voi.identifier_id = vi.id
WHERE
vr.id IN (
SELECT
vulnerability_reads.id
FROM
vulnerability_reads
WHERE
vulnerability_reads.vulnerability_id BETWEEN 104513160
AND 104513180
AND vulnerability_reads.vulnerability_id >= 104513160
AND vulnerability_reads.vulnerability_id < 104513180
)
GROUP BY
vr.id
) selected_ids
WHERE
vr.id = selected_ids.id
ModifyTable on public.vulnerability_reads vr (cost=107.02..193.21 rows=0 width=0) (actual time=394.232..394.243 rows=0 loops=1)
Buffers: shared hit=1688 read=357 dirtied=121
I/O Timings: read=381.788 write=0.000
-> Nested Loop (cost=107.02..193.21 rows=24 width=102) (actual time=103.682..104.585 rows=20 loops=1)
Buffers: shared hit=476 read=88 dirtied=14
I/O Timings: read=99.612 write=0.000
-> Subquery Scan on selected_ids (cost=106.45..107.17 rows=24 width=104) (actual time=103.659..104.202 rows=20 loops=1)
Buffers: shared hit=376 read=88 dirtied=14
I/O Timings: read=99.612 write=0.000
-> Aggregate (cost=106.45..106.93 rows=24 width=40) (actual time=103.642..104.134 rows=20 loops=1)
Group Key: vr_1.id
Buffers: shared hit=376 read=88 dirtied=14
I/O Timings: read=99.612 write=0.000
-> Sort (cost=106.45..106.51 rows=24 width=28) (actual time=103.475..103.517 rows=28 loops=1)
Sort Key: vr_1.id
Sort Method: quicksort Memory: 27kB
Buffers: shared hit=366 read=88 dirtied=14
I/O Timings: read=99.612 write=0.000
-> Nested Loop (cost=2.71..105.90 rows=24 width=28) (actual time=26.389..103.350 rows=28 loops=1)
Buffers: shared hit=363 read=88 dirtied=14
I/O Timings: read=99.612 write=0.000
-> Nested Loop (cost=2.27..94.94 rows=24 width=16) (actual time=20.630..78.348 rows=28 loops=1)
Buffers: shared hit=277 read=62 dirtied=12
I/O Timings: read=75.312 write=0.000
-> Nested Loop (cost=1.70..84.05 rows=14 width=16) (actual time=12.544..48.018 rows=20 loops=1)
Buffers: shared hit=193 read=37 dirtied=2
I/O Timings: read=46.895 write=0.000
-> Nested Loop (cost=1.14..71.67 rows=14 width=16) (actual time=7.124..16.868 rows=20 loops=1)
Buffers: shared hit=115 read=15 dirtied=2
I/O Timings: read=16.223 write=0.000
-> Index Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads (cost=0.57..21.48 rows=14 width=8) (actual time=4.373..13.745 rows=20 loops=1)
Index Cond: ((vulnerability_reads.vulnerability_id >= 104513160) AND (vulnerability_reads.vulnerability_id <= 104513180) AND (vulnerability_reads.vulnerability_id >= 104513160) AND (vulnerability_reads.vulnerability_id < 104513180))
Buffers: shared hit=18 read=12 dirtied=2
I/O Timings: read=13.528 write=0.000
-> Index Scan using vulnerability_reads_pkey on public.vulnerability_reads vr_1 (cost=0.57..3.58 rows=1 width=16) (actual time=0.150..0.150 rows=1 loops=20)
Index Cond: (vr_1.id = vulnerability_reads.id)
Buffers: shared hit=97 read=3
I/O Timings: read=2.695 write=0.000
-> Index Scan using index_vulnerability_occurrences_on_vulnerability_id on public.vulnerability_occurrences vo (cost=0.57..0.87 rows=1 width=16) (actual time=1.552..1.554 rows=1 loops=20)
Index Cond: (vo.vulnerability_id = vr_1.vulnerability_id)
Buffers: shared hit=78 read=22
I/O Timings: read=30.672 write=0.000
-> Index Only Scan using index_vulnerability_occurrence_identifiers_on_unique_keys on public.vulnerability_occurrence_identifiers voi (cost=0.57..0.71 rows=7 width=16) (actual time=1.323..1.511 rows=1 loops=20)
Index Cond: (voi.occurrence_id = vo.id)
Heap Fetches: 23
Buffers: shared hit=84 read=25 dirtied=10
I/O Timings: read=28.417 write=0.000
-> Index Scan using vulnerability_identifiers_pkey on public.vulnerability_identifiers vi (cost=0.43..0.46 rows=1 width=28) (actual time=0.887..0.887 rows=1 loops=28)
Index Cond: (vi.id = voi.identifier_id)
Buffers: shared hit=86 read=26 dirtied=2
I/O Timings: read=24.300 write=0.000
-> Index Scan using vulnerability_reads_pkey on public.vulnerability_reads vr (cost=0.57..3.58 rows=1 width=14) (actual time=0.013..0.013 rows=1 loops=20)
Index Cond: (vr.id = selected_ids.id)
Buffers: shared hit=100
I/O Timings: read=0.000 write=0.000
Time: 402.922 ms
- planning: 8.378 ms
- execution: 394.544 ms
- I/O read: 381.788 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 1688 (~13.20 MiB) from the buffer pool
- reads: 357 (~2.80 MiB) from the OS file cache, including disk I/O
- dirtied: 121 (~968.00 KiB)
- writes: 0
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/29600/commands/91901
Estimated Time to complete: 3 weeks, 2 days, 15 hours, 4 minutes, and 17 seconds
- Interval: 120s
- Total tuple count: 100807570
- Max batch size: 0
- Estimated seconds to complete: 2041457s
- Estimated number of batches: 10080
- Average batch time: 202.53s
- Batch size: 10000
- N. of batches sampled: 9
- N. of failed batches: 0
Related to #460080