Skip to content

Regularly reverify Packages on primary

What does this MR do?

We need to regularly re-verify data on the primary to protect against data corruption. In combination with https://gitlab.com/gitlab-org/gitlab-ee/issues/13842, this will ensure that packages are regularly re-verified on all secondary nodes as well.

SQL query example

UPDATE packages_package_files 
SET "verification_state" = 0
WHERE id IN (
              SELECT "packages_package_files"."id"
              FROM "packages_package_files" 
              WHERE 
                ("packages_package_files"."verification_state" IN (2)) AND 
                (verified_at < '2021-01-29 14:42:17.792192') 
              ORDER BY "packages_package_files"."verified_at" ASC 
              LIMIT 10
            )

Explain Analyze:

gitlabhq_development=# explain analyze UPDATE packages_package_files
SET "verification_state" = 0
WHERE id IN (
              SELECT "packages_package_files"."id"
              FROM "packages_package_files"
              WHERE
                ("packages_package_files"."verification_state" IN (2)) AND
                (verified_at < '2021-01-29 14:42:17.792192')
              ORDER BY "packages_package_files"."verified_at" ASC
              LIMIT 10
            )
;
                                                                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_package_files  (cost=21054.55..21078.65 rows=10 width=782) (actual time=41.632..41.633 rows=0 loops=1)
   ->  Nested Loop  (cost=21054.55..21078.65 rows=10 width=782) (actual time=41.631..41.632 rows=0 loops=1)
         ->  HashAggregate  (cost=21054.13..21054.23 rows=10 width=40) (actual time=41.631..41.632 rows=0 loops=1)
               Group Key: "ANY_subquery".id
               ->  Subquery Scan on "ANY_subquery"  (cost=21053.98..21054.10 rows=10 width=40) (actual time=41.630..41.631 rows=0 loops=1)
                     ->  Limit  (cost=21053.98..21054.00 rows=10 width=16) (actual time=41.630..41.630 rows=0 loops=1)
                           ->  Sort  (cost=21053.98..21054.06 rows=32 width=16) (actual time=41.629..41.629 rows=0 loops=1)
                                 Sort Key: packages_package_files_1.verified_at
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Index Scan using index_packages_package_files_on_verification_state on packages_package_files packages_package_files_1  (cost=0.42..21053.29 rows=32 width=16) (actual time=41.625..41.625 rows=0 loops=1)
                                       Index Cond: (verification_state = 2)
                                       Filter: (verified_at < '2021-01-29 14:42:17.792192+02'::timestamp with time zone)
                                       Rows Removed by Filter: 320000
         ->  Index Scan using packages_package_files_pkey on packages_package_files  (cost=0.42..2.44 rows=1 width=748) (never executed)
               Index Cond: (id = "ANY_subquery".id)
 Planning Time: 0.253 ms
 Execution Time: 41.743 ms
(17 rows)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Related to #13843 (closed)

Edited by Michael Kozono

Merge request reports