Skip to content

Drop invalid Vulnerabilities

Michał Zając requested to merge 301141-error-getting-vulnerability-list into master

What does this MR do?

This MR schedules a background migration to remove invalid Vulnerabilities. By invalid we mean that there's a Vulnerability that has 0 associated Vulnerabilities::Finding objects. See #301141 (closed) for more details

Migration

This has been updated to match discussion in !60023 (comment 563844188)

Cold cache

Warm cache

Estimated execution time

Records in table: 7 599 563
Records to delete: 2054
Batch size: 10 000
Loops: 760
Delay: 120s

(4.728s + 0.693s + 120s) + 759 * (0.017s + 0.001s + 120s) ~= 91205,413 seconds
91205,413 / 3600 ~= 25,33 hours

gprd-db-archive-console

I also ran the SELECT query against gprd-db-archive-console twice to check cold and warm runtimes

gprd-db-archive-console
gitlabhq_production=> EXPLAIN (ANALYZE,BUFFERS,TIMING,COSTS) SELECT * FROM vulnerabilities LEFT OUTER JOIN vulnerability_occurrences ON vulnerability_occurrences.vulnerability_id = vulnerabilities.id WHERE vulnerabilities.id BETWEEN 1 AND 10000 AND vulnerability_occurrences.vulnerability_id IS NULL;
                                                                                        QUERY PLAN                                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=0.87..42458.18 rows=1 width=1998) (actual time=6.605..111.503 rows=832 loops=1)
   Buffers: shared hit=33300 read=8245
   I/O Timings: read=67.820
   ->  Index Scan using vulnerabilities_pkey on vulnerabilities  (cost=0.43..11575.73 rows=9228 width=284) (actual time=0.036..42.051 rows=9476 loops=1)
         Index Cond: ((id >= 1) AND (id <= 10000))
         Buffers: shared hit=644 read=3476
         I/O Timings: read=28.940
   ->  Index Scan using index_vulnerability_occurrences_on_vulnerability_id on vulnerability_occurrences  (cost=0.43..3.34 rows=1 width=1714) (actual time=0.007..0.007 rows=1 loops=9476)
         Index Cond: (vulnerability_id = vulnerabilities.id)
         Buffers: shared hit=32656 read=4769
         I/O Timings: read=38.879
 Planning Time: 9.614 ms
 Execution Time: 111.667 ms
(13 rows)

gitlabhq_production=> EXPLAIN (ANALYZE,BUFFERS,TIMING,COSTS) SELECT * FROM vulnerabilities LEFT OUTER JOIN vulnerability_occurrences ON vulnerability_occurrences.vulnerability_id = vulnerabilities.id WHERE vulnerabilities.id BETWEEN 1 AND 10000 AND vulnerability_occurrences.vulnerability_id IS NULL;
                                                                                        QUERY PLAN                                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=0.87..42458.18 rows=1 width=1998) (actual time=1.211..31.934 rows=832 loops=1)
   Buffers: shared hit=41542
   ->  Index Scan using vulnerabilities_pkey on vulnerabilities  (cost=0.43..11575.73 rows=9228 width=284) (actual time=0.013..8.584 rows=9476 loops=1)
         Index Cond: ((id >= 1) AND (id <= 10000))
         Buffers: shared hit=4117
   ->  Index Scan using index_vulnerability_occurrences_on_vulnerability_id on vulnerability_occurrences  (cost=0.43..3.34 rows=1 width=1714) (actual time=0.002..0.002 rows=1 loops=9476)
         Index Cond: (vulnerability_id = vulnerabilities.id)
         Buffers: shared hit=37425
 Planning Time: 0.494 ms
 Execution Time: 32.097 ms
(10 rows)

up

rails db:migrate
== 20210423160427 ScheduleDropInvalidVulnerabilities: migrating ===============
-- Scheduling DropInvalidVulnerabilities jobs
-- Scheduled 0 DropInvalidVulnerabilities jobs with a maximum of 2500 records per batch and an interval of 120 seconds.

The migration is expected to take at least 0 seconds. Expect all jobs to have completed after 2021-04-23 16:14:02 UTC."
== 20210423160427 ScheduleDropInvalidVulnerabilities: migrated (0.0160s) ======

down

rails db:rollback
== 20210423160427 ScheduleDropInvalidVulnerabilities: reverting ===============
== 20210423160427 ScheduleDropInvalidVulnerabilities: reverted (0.0000s) ======

Timings

Selecting a batch

Batch size of 2500

https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3633/commands/12164

Query
SELECT * FROM vulnerabilities LEFT OUTER JOIN vulnerability_occurrences ON vulnerability_occurrences.vulnerability_id = vulnerabilities.id WHERE vulnerabilities.id BETWEEN 1 AND 2500 AND vulnerability_occurrences.id IS NULL
Query plan
 Gather  (cost=1045.93..9973.01 rows=1 width=1998) (actual time=234.357..375.505 rows=496 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Buffers: shared hit=7758 read=2633 dirtied=230
   I/O Timings: read=674.683
   ->  Nested Loop Left Join  (cost=45.93..8972.91 rows=1 width=1998) (actual time=227.392..360.942 rows=248 loops=2)
         Filter: (vulnerability_occurrences.id IS NULL)
         Rows Removed by Filter: 936
         Buffers: shared hit=7758 read=2633 dirtied=230
         I/O Timings: read=674.683
         ->  Parallel Bitmap Heap Scan on public.vulnerabilities  (cost=45.50..4232.66 rows=1381 width=284) (actual time=15.504..157.073 rows=1184 loops=2)
               Buffers: shared hit=6 read=1146 dirtied=225
               I/O Timings: read=290.421
               ->  Bitmap Index Scan using vulnerabilities_pkey  (cost=0.00..44.91 rows=2348 width=0) (actual time=19.637..19.638 rows=2599 loops=1)
                     Index Cond: ((vulnerabilities.id >= 1) AND (vulnerabilities.id <= 2500))
                     Buffers: shared hit=6 read=34
                     I/O Timings: read=18.657
         ->  Index Scan using index_vulnerability_occurrences_on_vulnerability_id on public.vulnerability_occurrences  (cost=0.43..3.42 rows=1 width=1714) (actual time=0.170..0.171 rows=1 loops=2368)
               Index Cond: (vulnerability_occurrences.vulnerability_id = vulnerabilities.id)
               Buffers: shared hit=7752 read=1487 dirtied=5
               I/O Timings: read=384.262
Statistics
Time: 381.573 ms
  - planning: 5.930 ms
  - execution: 375.643 ms (estimated* for prod: 0.017...0.346 s)
    - I/O read: 674.683 ms
    - I/O write: N/A

Shared buffers:
  - hits: 7758 (~60.60 MiB) from the buffer pool
  - reads: 2633 (~20.60 MiB) from the OS file cache, including disk I/O
  - dirtied: 230 (~1.80 MiB)
  - writes: 0

Deleting 100 records from a batch

Query
DELETE FROM vulnerabilities WHERE vulnerabilities.id IN (6, 15, 18, 23, 57, 58, 67, 87, 104, 179, 183, 218, 257, 289, 341, 356, 360, 410, 437, 443, 451, 465, 534, 546, 577, 601, 606, 658, 659, 665, 747, 764, 770, 777, 833, 950, 970, 984, 1005, 1023, 1040, 1052, 1059, 1072, 1135, 1164, 1180, 1245, 1253, 1292, 1302, 1306, 1315, 1368, 1385, 1390, 1402, 1439, 1466, 1494, 1508, 1513, 1515, 1584, 1585, 1600, 1601, 1613, 1726, 1801, 1820, 1823, 1853, 1867, 1877, 1888, 1912, 1926, 1950, 1983, 1996, 1998, 2007, 2014, 2023, 2025, 2051, 2130, 2132, 2133, 2152, 2168, 2178, 2191, 2275, 2312, 2340, 2429, 2449, 2480);
Plan
 ModifyTable on public.vulnerabilities  (cost=0.43..320.90 rows=100 width=6) (actual time=142.176..142.177 rows=0 loops=1)
   Buffers: shared hit=532 read=120 dirtied=93
   I/O Timings: read=137.902
   ->  Index Scan using vulnerabilities_pkey on public.vulnerabilities  (cost=0.43..320.90 rows=100 width=6) (actual time=3.564..140.170 rows=94 loops=1)
         Index Cond: (vulnerabilities.id = ANY ('{6,15,18,23,57,58,67,87,104,179,183,218,257,289,341,356,360,410,437,443,451,465,534,546,577,601,606,658,659,665,747,764,770,777,833,950,970,984,1005,1023,1040,1052,1059,1072,1135,1164,1180,1245,1253,1292,1302,1306,1315,1368,1385,1390,1402,1439,1466,1494,1508,1513,1515,1584,1585,1600,1601,1613,1726,1801,1820,1823,1853,1867,1877,1888,1912,1926,1950,1983,1996,1998,2007,2014,2023,2025,2051,2130,2132,2133,2152,2168,2178,2191,2275,2312,2340,2429,2449,2480}'::bigint[]))
         Buffers: shared hit=278 read=115 dirtied=15
         I/O Timings: read=137.640
Statistics
Time: 1.704 s
  - planning: 2.108 ms
  - execution: 1.702 s (estimated* for prod: 0.088...1.562 s)
    - I/O read: 137.902 ms
    - I/O write: N/A

Shared buffers:
  - hits: 532 (~4.20 MiB) from the buffer pool
  - reads: 120 (~960.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 93 (~744.00 KiB)
  - writes: 0

Deleting a batch

The estimate here considers that we're dropping entire batch of 2500 records, that shouldn't happen really often.

https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3633/commands/12162

Query
DELETE FROM vulnerabilities WHERE vulnerabilities.id BETWEEN 1 AND 2500
Plan
ModifyTable on public.vulnerabilities  (cost=0.43..2991.55 rows=2348 width=6) (actual time=59.004..59.005 rows=0 loops=1)
   Buffers: shared hit=5552 read=1150 dirtied=1001
   I/O Timings: read=41.968
   ->  Index Scan using vulnerabilities_pkey on public.vulnerabilities  (cost=0.43..2991.55 rows=2348 width=6) (actual time=0.227..49.110 rows=2368 loops=1)
         Index Cond: ((vulnerabilities.id >= 1) AND (vulnerabilities.id <= 2500))
         Buffers: shared hit=107 read=1146 dirtied=236
         I/O Timings: read=41.860
Statistics
Time: 1.347 s
  - planning: 1.878 ms
  - execution: 1.345 s (estimated* for prod: 0.973...1.323 s)
    - I/O read: 41.968 ms
    - I/O write: N/A

Shared buffers:
  - hits: 5552 (~43.40 MiB) from the buffer pool
  - reads: 1150 (~9.00 MiB) from the OS file cache, including disk I/O
  - dirtied: 1001 (~7.80 MiB)
  - writes: 0

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Michał Zając

Merge request reports