Skip to content

Geo framework: Fix retry time of verification failures

What does this MR do?

Problems

staging.gitlab.com has package file checksum failures which surfaced the following problems with failure retry logic:

  1. When an error is raised in track_checksum_result! while attempting to transition to verification_succeeded, some fields have already been cleared in-memory during the before_transition block. This causes that kind of failure to always set verification_retry_count to 1, instead of incrementing it.
  2. verification_failed_batch_relation and needs_verification_relation are not excluding things which have a verification_retry_at in the future

To do

Resolves #300255 (closed)

Query plans

Packages::PackageFile.verification_failed_batch

Test data setup: https://gitlab.com/-/snippets/2042944

https://explain.depesz.com/s/lU5T

                                                                                                          QUERY PLAN                                                                                                           
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_package_files  (cost=1.17..8.12 rows=3 width=782) (actual time=4.667..6.325 rows=3 loops=1)
   ->  Nested Loop  (cost=1.17..8.12 rows=3 width=782) (actual time=1.541..1.992 rows=3 loops=1)
         ->  HashAggregate  (cost=0.74..0.77 rows=3 width=40) (actual time=0.731..0.733 rows=3 loops=1)
               Group Key: "ANY_subquery".id
               ->  Subquery Scan on "ANY_subquery"  (cost=0.42..0.74 rows=3 width=40) (actual time=0.661..0.712 rows=3 loops=1)
                     ->  Limit  (cost=0.42..0.71 rows=3 width=16) (actual time=0.583..0.632 rows=3 loops=1)
                           ->  Index Scan using packages_packages_failed_verification on packages_package_files packages_package_files_1  (cost=0.42..42921.84 rows=459104 width=16) (actual time=0.582..0.631 rows=3 loops=1)
                                 Filter: ((verification_retry_at IS NULL) OR (verification_retry_at < '2021-02-02 05:44:30.114031-08'::timestamp with time zone))
         ->  Index Scan using packages_package_files_pkey on packages_package_files  (cost=0.43..2.45 rows=1 width=740) (actual time=0.417..0.417 rows=1 loops=3)
               Index Cond: (id = "ANY_subquery".id)
 Planning Time: 2.559 ms
 Execution Time: 6.499 ms
(12 rows)
UPDATE packages_package_files
SET
  "verification_state" = 1,
  "verification_started_at" = NOW()
WHERE id IN (
  SELECT "packages_package_files"."id" 
  FROM "packages_package_files" 
  WHERE (
    "packages_package_files"."verification_state" IN (3)
  )
  AND (
    "packages_package_files"."verification_retry_at" IS NULL
    OR "packages_package_files"."verification_retry_at" < '2021-02-02 05:44:30.114031'
  )
  ORDER BY verification_retry_at ASC NULLS FIRST
  LIMIT 10
)
RETURNING id

Packages::PackageFile.needs_verification_count

Test data setup: https://gitlab.com/-/snippets/2042944

https://explain.depesz.com/s/XOOV

                                                                                                 QUERY PLAN                                                                                                  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=71.62..71.63 rows=1 width=8) (actual time=2.610..2.611 rows=1 loops=1)
   ->  Limit  (cost=0.43..59.12 rows=1000 width=4) (actual time=1.547..2.521 rows=1000 loops=1)
         ->  Index Scan using packages_packages_needs_verification on packages_package_files  (cost=0.43..54432.96 rows=927361 width=4) (actual time=1.546..2.386 rows=1000 loops=1)
               Filter: ((verification_state = 0) OR ((verification_state = 3) AND ((verification_retry_at IS NULL) OR (verification_retry_at < '2021-02-02 05:49:25.612157-08'::timestamp with time zone))))
 Planning Time: 0.605 ms
 Execution Time: 2.648 ms
(6 rows)
SELECT COUNT(*) 
FROM (
  SELECT 1 AS one 
  FROM "packages_package_files" 
  WHERE (
    (
      "packages_package_files"."verification_state" IN (0)
    ) OR (
      "packages_package_files"."verification_state" IN (3)
    ) AND (
      "packages_package_files"."verification_retry_at" IS NULL
      OR "packages_package_files"."verification_retry_at" < '2021-02-02 05:49:25.612157')
    ) LIMIT 1000
  )
subquery_for_count

PackageFileRegistry.verification_failed_batch

Test data setup: https://gitlab.com/-/snippets/2057444

UPDATE package_file_registry
SET 
  "verification_state" = 1,
  "verification_started_at" = NOW()
WHERE package_file_id IN (
  SELECT "package_file_registry"."package_file_id"
  FROM "package_file_registry" 
  WHERE ("package_file_registry"."verification_state" IN (3)) 
  AND (
    "package_file_registry"."verification_retry_at" IS NULL 
    OR "package_file_registry"."verification_retry_at" < '2021-02-02 06:23:50.006618'
  ) 
  AND (
    "package_file_registry"."state" IN (2)
  )
  ORDER BY verification_retry_at ASC NULLS FIRST
  LIMIT 10
) RETURNING package_file_id
                                                                                                            QUERY PLAN                                                                                                            
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on package_file_registry  (cost=17.70..74.56 rows=10 width=858) (actual time=2.184..11.697 rows=20 loops=1)
   ->  Nested Loop  (cost=17.70..74.56 rows=10 width=858) (actual time=1.633..4.013 rows=20 loops=1)
         ->  HashAggregate  (cost=17.27..17.33 rows=6 width=32) (actual time=0.402..0.413 rows=10 loops=1)
               Group Key: "ANY_subquery".package_file_id
               ->  Subquery Scan on "ANY_subquery"  (cost=0.42..17.24 rows=10 width=32) (actual time=0.115..0.389 rows=10 loops=1)
                     ->  Limit  (cost=0.42..17.14 rows=10 width=12) (actual time=0.110..0.377 rows=10 loops=1)
                           ->  Index Scan using package_file_registry_failed_verification on package_file_registry package_file_registry_1  (cost=0.42..112897.16 rows=67495 width=12) (actual time=0.109..0.374 rows=10 loops=1)
                                 Filter: ((verification_retry_at IS NULL) OR (verification_retry_at < '2021-02-02 06:23:50.006618-08'::timestamp with time zone))
         ->  Index Scan using index_package_file_registry_on_repository_id on package_file_registry  (cost=0.43..9.51 rows=2 width=820) (actual time=0.326..0.357 rows=2 loops=10)
               Index Cond: (package_file_id = "ANY_subquery".package_file_id)
 Planning Time: 0.420 ms
 Execution Time: 11.809 ms
(12 rows)

PackageFileRegistry.needs_verification_count

Test data setup: https://gitlab.com/-/snippets/2057444

SELECT COUNT(*) 
FROM (
  SELECT 1 AS one 
  FROM "package_file_registry" 
  WHERE (
    ("package_file_registry"."verification_state" IN (0)) 
    OR ("package_file_registry"."verification_state" IN (3)
  ) 
  AND (
    "package_file_registry"."verification_retry_at" IS NULL
    OR "package_file_registry"."verification_retry_at" < '2021-02-02 06:23:50.199724')
  ) 
  AND (
    "package_file_registry"."state" IN (2)
  ) 
  LIMIT 1000
) 
subquery_for_count
                                                                                                 QUERY PLAN                                                                                                  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=165.26..165.27 rows=1 width=8) (actual time=1.283..1.284 rows=1 loops=1)
   ->  Limit  (cost=0.43..152.76 rows=1000 width=4) (actual time=0.045..1.165 rows=1000 loops=1)
         ->  Index Scan using index_package_file_registry_on_state on package_file_registry  (cost=0.43..61739.31 rows=405295 width=4) (actual time=0.044..0.990 rows=1000 loops=1)
               Index Cond: (state = 2)
               Filter: ((verification_state = 0) OR ((verification_state = 3) AND ((verification_retry_at IS NULL) OR (verification_retry_at < '2021-02-02 06:23:50.199724-08'::timestamp with time zone))))
               Rows Removed by Filter: 524
 Planning Time: 0.456 ms
 Execution Time: 1.322 ms
(8 rows)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Edited by Michael Kozono

Merge request reports