Skip to content

Geo: Ensure Package File checksums on primary

Michael Kozono requested to merge mk/geo-primary-backfills-checksums into master

What does this MR do?

  • Runs background jobs that continuously backfill packages_package_files.verification_checksum on Geo primaries.
  • Concurrent verification jobs are run up to max_verification_capacity setting for that Geo node
    • Actually, up to 1/2 capacity since the setting includes project/wiki repo verification as well
  • Concurrent workers do not pick up the same records
  • Failures are retried
  • Not resolved in this MR, for reviewability: Lost or killed jobs can cause records to become stuck in the "started" state. The next MR handles that !48006 (merged)

Closes #13839 (closed)

Screenshots (strongly suggested)

DB queries

Packages::PackageFile.verification_pending_batch(batch_size: 10)

UPDATE packages_package_files 
SET "verification_state" = 1, "verification_started_at" = NOW() 
WHERE id IN (
  SELECT "packages_package_files"."id" 
  FROM "packages_package_files" 
  WHERE (
    "packages_package_files"."verification_state" IN (0)
  ) 
  ORDER BY verified_at ASC NULLS FIRST 
  LIMIT 10
) 
RETURNING id

Before new index added for this query:

https://explain.depesz.com/s/LZAw

                                                                                                                       QUERY PLAN                                                                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_package_files  (cost=52425.97..52450.14 rows=10 width=814) (actual time=173.424..179.375 rows=10 loops=1)
   ->  Nested Loop  (cost=52425.97..52450.14 rows=10 width=814) (actual time=170.930..173.547 rows=10 loops=1)
         ->  HashAggregate  (cost=52425.54..52425.64 rows=10 width=40) (actual time=170.112..170.123 rows=10 loops=1)
               Group Key: "ANY_subquery".id
               ->  Subquery Scan on "ANY_subquery"  (cost=52425.39..52425.52 rows=10 width=40) (actual time=170.091..170.096 rows=10 loops=1)
                     ->  Limit  (cost=52425.39..52425.42 rows=10 width=16) (actual time=170.013..170.015 rows=10 loops=1)
                           ->  Sort  (cost=52425.39..53916.22 rows=596329 width=16) (actual time=170.012..170.013 rows=10 loops=1)
                                 Sort Key: packages_package_files_1.verified_at NULLS FIRST
                                 Sort Method: top-N heapsort  Memory: 25kB
                                 ->  Index Scan using index_packages_package_files_on_verification_state on packages_package_files packages_package_files_1  (cost=0.43..39538.94 rows=596329 width=16) (actual time=0.014..114.827 rows=600000 loops=1)
                                       Index Cond: (verification_state = 0)
         ->  Index Scan using packages_package_files_pkey on packages_package_files  (cost=0.43..2.45 rows=1 width=740) (actual time=0.340..0.340 rows=1 loops=10)
               Index Cond: (id = "ANY_subquery".id)
 Planning Time: 0.583 ms
 Execution Time: 179.548 ms
(15 rows)

After new index added for this query:

                                                                                                           QUERY PLAN                                                                                                            
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_package_files  (cost=1.67..25.84 rows=10 width=814) (actual time=1.746..8.166 rows=10 loops=1)
   ->  Nested Loop  (cost=1.67..25.84 rows=10 width=814) (actual time=1.080..3.792 rows=10 loops=1)
         ->  HashAggregate  (cost=1.24..1.34 rows=10 width=40) (actual time=0.045..0.052 rows=10 loops=1)
               Group Key: "ANY_subquery".id
               ->  Subquery Scan on "ANY_subquery"  (cost=0.42..1.22 rows=10 width=40) (actual time=0.029..0.041 rows=10 loops=1)
                     ->  Limit  (cost=0.42..1.12 rows=10 width=16) (actual time=0.027..0.037 rows=10 loops=1)
                           ->  Index Scan using packages_packages_pending_verification on packages_package_files packages_package_files_1  (cost=0.42..41269.65 rows=596337 width=16) (actual time=0.026..0.035 rows=10 loops=1)
         ->  Index Scan using packages_package_files_pkey on packages_package_files  (cost=0.43..2.45 rows=1 width=740) (actual time=0.372..0.372 rows=1 loops=10)
               Index Cond: (id = "ANY_subquery".id)
 Planning Time: 0.209 ms
 Execution Time: 8.229 ms
(11 rows)

Packages::PackageFile.verification_failed_batch(batch_size: 10)

UPDATE packages_package_files 
SET "verification_state" = 1, "verification_started_at" = NOW() 
WHERE id IN (
  SELECT "packages_package_files"."id" 
  FROM "packages_package_files" 
  WHERE (
    "packages_package_files"."verification_state" IN (3)
  ) 
  ORDER BY verification_retry_at ASC NULLS FIRST 
  LIMIT 10
) 
RETURNING id

Before new index added for this query:

https://explain.depesz.com/s/og3T

                                                                                                                       QUERY PLAN                                                                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_package_files  (cost=52660.46..52684.63 rows=10 width=814) (actual time=201.895..210.828 rows=10 loops=1)
   ->  Nested Loop  (cost=52660.46..52684.63 rows=10 width=814) (actual time=200.978..204.428 rows=10 loops=1)
         ->  HashAggregate  (cost=52660.03..52660.13 rows=10 width=40) (actual time=199.984..199.998 rows=10 loops=1)
               Group Key: "ANY_subquery".id
               ->  Subquery Scan on "ANY_subquery"  (cost=52659.88..52660.01 rows=10 width=40) (actual time=199.971..199.976 rows=10 loops=1)
                     ->  Limit  (cost=52659.88..52659.91 rows=10 width=16) (actual time=199.967..199.969 rows=10 loops=1)
                           ->  Sort  (cost=52659.88..54163.66 rows=601509 width=16) (actual time=199.966..199.967 rows=10 loops=1)
                                 Sort Key: packages_package_files_1.verification_retry_at NULLS FIRST
                                 Sort Method: top-N heapsort  Memory: 25kB
                                 ->  Index Scan using index_packages_package_files_on_verification_state on packages_package_files packages_package_files_1  (cost=0.43..39661.49 rows=601509 width=16) (actual time=0.041..143.964 rows=600000 loops=1)
                                       Index Cond: (verification_state = 3)
         ->  Index Scan using packages_package_files_pkey on packages_package_files  (cost=0.43..2.45 rows=1 width=740) (actual time=0.440..0.440 rows=1 loops=10)
               Index Cond: (id = "ANY_subquery".id)
 Planning Time: 0.215 ms
 Execution Time: 210.886 ms
(15 rows)

After new index added for this query:

                                                                                                           QUERY PLAN                                                                                                           
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_package_files  (cost=1.67..25.84 rows=10 width=814) (actual time=2.404..7.620 rows=10 loops=1)
   ->  Nested Loop  (cost=1.67..25.84 rows=10 width=814) (actual time=1.381..3.501 rows=10 loops=1)
         ->  HashAggregate  (cost=1.24..1.34 rows=10 width=40) (actual time=0.040..0.049 rows=10 loops=1)
               Group Key: "ANY_subquery".id
               ->  Subquery Scan on "ANY_subquery"  (cost=0.42..1.21 rows=10 width=40) (actual time=0.017..0.034 rows=10 loops=1)
                     ->  Limit  (cost=0.42..1.11 rows=10 width=16) (actual time=0.013..0.027 rows=10 loops=1)
                           ->  Index Scan using packages_packages_failed_verification on packages_package_files packages_package_files_1  (cost=0.42..41360.49 rows=601517 width=16) (actual time=0.012..0.024 rows=10 loops=1)
         ->  Index Scan using packages_package_files_pkey on packages_package_files  (cost=0.43..2.45 rows=1 width=740) (actual time=0.343..0.343 rows=1 loops=10)
               Index Cond: (id = "ANY_subquery".id)
 Planning Time: 0.748 ms
 Execution Time: 7.721 ms
(11 rows)

Packages::PackageFile.needs_verification_count(limit: 1000)

SELECT COUNT(*) 
FROM (
  SELECT 1 AS one 
  FROM "packages_package_files" 
  WHERE (
    (
      "packages_package_files"."verification_state" IN (0)
    ) 
    OR (
      "packages_package_files"."verification_state" IN (3)
    )
  ) LIMIT 1000
) subquery_for_count

Before adding the partial index:

https://explain.depesz.com/s/Eats

                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=73.02..73.03 rows=1 width=8) (actual time=50.320..50.321 rows=1 loops=1)
   ->  Limit  (cost=0.00..60.52 rows=1000 width=4) (actual time=49.970..50.256 rows=1000 loops=1)
         ->  Seq Scan on packages_package_files  (cost=0.00..62159.93 rows=1027029 width=4) (actual time=49.969..50.158 rows=1000 loops=1)
               Filter: ((verification_state = 0) OR (verification_state = 3))
               Rows Removed by Filter: 300025
 Planning Time: 0.117 ms
 Execution Time: 50.348 ms
(7 rows)

After adding the partial index:

https://explain.depesz.com/s/Kz96

                                                                                        QUERY PLAN                                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=57.94..57.95 rows=1 width=8) (actual time=0.681..0.682 rows=1 loops=1)
   ->  Limit  (cost=0.43..45.44 rows=1000 width=4) (actual time=0.084..0.598 rows=1000 loops=1)
         ->  Index Only Scan using packages_packages_needs_verification on packages_package_files  (cost=0.43..46226.42 rows=1027043 width=4) (actual time=0.083..0.481 rows=1000 loops=1)
               Heap Fetches: 1000
 Planning Time: 3.037 ms
 Execution Time: 0.736 ms
(6 rows)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Edited by Michael Kozono

Merge request reports