Geo: Exclude remote stored blobs from checksumming

Michael Kozono requested to merge mk/do-not-checksum-remote-files into master

What does this MR do?

Geo verification does not support remotely stored blobs (blobs in object storage). That gap should not block releasing verification of locally stored blobs. Note that syncing remotely stored blobs is itself a beta feature.

This MR excludes remotely stored blobs from Geo verification processing and from verification counts, treating them as if they don't exist.
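The exclusion can be pictured as a filter applied before anything is processed or counted. The sketch below is illustrative Python, not the code from this MR. `FileStore`, `PackageFile`, `verifiable`, and `verification_counts` are hypothetical names. `LOCAL = 1` matches the `file_store = 1` condition in the query plans, and `REMOTE = 2` is an assumed encoding.

```python
from dataclasses import dataclass
from enum import IntEnum


class FileStore(IntEnum):
    # Assumed encoding: the query plans filter on file_store = 1 for local files;
    # 2 stands in for remote (object storage) here.
    LOCAL = 1
    REMOTE = 2


@dataclass
class PackageFile:  # hypothetical stand-in for a replicable blob record
    id: int
    file_store: FileStore
    verified: bool = False


def verifiable(files):
    """Keep only locally stored blobs; remote ones are dropped before any work."""
    return [f for f in files if f.file_store == FileStore.LOCAL]


def verification_counts(files):
    """Return (total, verified), computed over the verifiable subset only."""
    candidates = verifiable(files)
    return len(candidates), sum(1 for f in candidates if f.verified)


files = [
    PackageFile(1, FileStore.LOCAL, verified=True),
    PackageFile(2, FileStore.REMOTE),  # invisible to both processing and counts
]
print([f.id for f in verifiable(files)])  # [1]
print(verification_counts(files))         # (1, 1)
```

Because both the work queue and the counts go through the same filter, a remote blob cannot show up as pending or failed in the UI.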

Resolves #297478 (closed)

Screenshots (strongly suggested)

staging.gitlab.com

Before: [screenshot]
Expected after: [screenshot]. The label says "synchronize" instead of "verify"; that is an existing bug. The popover will still be shown, but all of its numbers will be 0.

Local testing

I have 25 locally stored and 25 remotely stored package files. As expected, my UI shows 25 package files in the checksum progress bar, all of them (100%) successfully checksummed. The secondary's verification progress bar shows a similar result.
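The arithmetic behind that result can be reproduced with an in-memory SQLite table, used here only as a stand-in for PostgreSQL. The table keeps just the columns the sketch needs. `verification_state = 0` meaning pending matches the query plans, while `2` meaning "succeeded" and `file_store = 2` meaning remote are assumptions.

```python
import sqlite3

# Assumed encodings: file_store 1 = local (matches the query plans), 2 = remote;
# verification_state 0 = pending (matches the plans), 2 = succeeded (assumed).
LOCAL, REMOTE = 1, 2
PENDING, SUCCEEDED = 0, 2

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages_package_files ("
    " id INTEGER PRIMARY KEY, file_store INTEGER, verification_state INTEGER)"
)
# Mirror the local test: 25 local files, all checksummed, plus 25 remote files.
rows = [(LOCAL, SUCCEEDED)] * 25 + [(REMOTE, PENDING)] * 25
conn.executemany(
    "INSERT INTO packages_package_files (file_store, verification_state) VALUES (?, ?)",
    rows,
)

# Scoped to local storage, as the progress bar now is.
total, succeeded = conn.execute(
    "SELECT COUNT(*), SUM(verification_state = ?) FROM packages_package_files"
    " WHERE file_store = ?",
    (SUCCEEDED, LOCAL),
).fetchone()
print(total, succeeded, 100 * succeeded / total)  # 25 25 100.0

# Without the file_store filter, the remote rows would drag the bar down to 50%.
(all_rows,) = conn.execute("SELECT COUNT(*) FROM packages_package_files").fetchone()
print(all_rows, 100 * succeeded / all_rows)  # 50 50.0
```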

Query plans

All three queries executed by VerificationBatchWorker still look OK with 2.1M rows of test data; they still use the indexes that were created for them:

gitlabhq_development=# explain analyze UPDATE packages_package_files SET "verification_state" = 1, "verification_started_at" = NOW() WHERE id IN (SELECT "packages_package_files"."id" FROM "packages_package_files" WHERE "packages_package_files"."file_store" = 1 AND ("packages_package_files"."verification_state" IN (3)) AND ("packages_package_files"."verification_retry_at" IS NULL OR "packages_package_files"."verification_retry_at" < '2021-02-13 10:41:13.320839') ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9 FOR UPDATE SKIP LOCKED) RETURNING id;
                                                                                                             QUERY PLAN                                                                                                              
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_package_files  (cost=1.95..23.66 rows=9 width=792) (actual time=3.824..6.245 rows=9 loops=1)
   ->  Nested Loop  (cost=1.95..23.66 rows=9 width=792) (actual time=1.334..3.404 rows=9 loops=1)
         ->  HashAggregate  (cost=1.52..1.61 rows=9 width=40) (actual time=0.660..0.667 rows=9 loops=1)
               Group Key: "ANY_subquery".id
               ->  Subquery Scan on "ANY_subquery"  (cost=0.42..1.50 rows=9 width=40) (actual time=0.502..0.638 rows=9 loops=1)
                     ->  Limit  (cost=0.42..1.41 rows=9 width=22) (actual time=0.422..0.552 rows=9 loops=1)
                           ->  LockRows  (cost=0.42..50663.80 rows=463131 width=22) (actual time=0.421..0.549 rows=9 loops=1)
                                 ->  Index Scan using packages_packages_failed_verification on packages_package_files packages_package_files_1  (cost=0.42..46032.49 rows=463131 width=22) (actual time=0.388..0.459 rows=9 loops=1)
                                       Filter: (((verification_retry_at IS NULL) OR (verification_retry_at < '2021-02-13 10:41:13.320839-08'::timestamp with time zone)) AND (file_store = 1) AND (verification_state = 3))
         ->  Index Scan using packages_package_files_pkey on packages_package_files  (cost=0.43..2.45 rows=1 width=750) (actual time=0.302..0.302 rows=1 loops=9)
               Index Cond: (id = "ANY_subquery".id)
 Planning Time: 0.576 ms
 Execution Time: 6.389 ms
(13 rows)
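The first query claims a batch of failed rows whose retry is due, never-scheduled rows first, and marks them as started. Below is a minimal sketch of the same selection in SQLite, under stated assumptions. SQLite has no `FOR UPDATE SKIP LOCKED`, so row locking is omitted. `NULLS FIRST` is emulated with an `IS NOT NULL` sort key, and `RETURNING` is replaced by a separate `SELECT`. Timestamps are ISO strings, which compare correctly as text. `claim_failed_for_retry` is a hypothetical helper name.

```python
import sqlite3

LOCAL, REMOTE = 1, 2        # assumed file_store encoding (1 = local per the plans)
STARTED, FAILED = 1, 3      # states used by the query above


def claim_failed_for_retry(conn, now, batch_size):
    """Mark up to batch_size local, failed, retry-due rows as started; return their ids."""
    ids = [row[0] for row in conn.execute(
        """
        SELECT id FROM packages_package_files
        WHERE file_store = ? AND verification_state = ?
          AND (verification_retry_at IS NULL OR verification_retry_at < ?)
        ORDER BY verification_retry_at IS NOT NULL, verification_retry_at
        LIMIT ?
        """, (LOCAL, FAILED, now, batch_size))]
    conn.executemany(
        "UPDATE packages_package_files"
        " SET verification_state = ?, verification_started_at = ? WHERE id = ?",
        [(STARTED, now, i) for i in ids])
    return ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages_package_files ("
             " id INTEGER PRIMARY KEY, file_store INTEGER, verification_state INTEGER,"
             " verification_retry_at TEXT, verification_started_at TEXT)")
conn.executemany("INSERT INTO packages_package_files VALUES (?, ?, ?, ?, NULL)", [
    (1, LOCAL, FAILED, "2021-02-13 11:00:00"),   # retry not due yet
    (2, LOCAL, FAILED, None),                    # no retry time: sorts first
    (3, REMOTE, FAILED, None),                   # remote: excluded
    (4, LOCAL, FAILED, "2021-02-13 10:00:00"),   # retry due
])
claimed = claim_failed_for_retry(conn, "2021-02-13 10:41:13", 9)
print(claimed)  # [2, 4]
```

In PostgreSQL, `FOR UPDATE SKIP LOCKED` is what lets several workers run this concurrently and claim disjoint batches. This single-connection sketch does not need it.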

gitlabhq_development=# explain analyze UPDATE packages_package_files SET "verification_state" = 1, "verification_started_at" = NOW() WHERE id IN (SELECT "packages_package_files"."id" FROM "packages_package_files" WHERE "packages_package_files"."file_store" = 1 AND ("packages_package_files"."verification_state" IN (0)) ORDER BY verified_at ASC NULLS FIRST LIMIT 10 FOR UPDATE SKIP LOCKED) RETURNING id;
                                                                                                              QUERY PLAN                                                                                                               
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on packages_package_files  (cost=1.82..25.99 rows=10 width=792) (actual time=1.457..1.603 rows=10 loops=1)
   ->  Nested Loop  (cost=1.82..25.99 rows=10 width=792) (actual time=1.374..1.404 rows=10 loops=1)
         ->  HashAggregate  (cost=1.39..1.49 rows=10 width=40) (actual time=1.358..1.362 rows=10 loops=1)
               Group Key: "ANY_subquery".id
               ->  Subquery Scan on "ANY_subquery"  (cost=0.42..1.37 rows=10 width=40) (actual time=1.335..1.352 rows=10 loops=1)
                     ->  Limit  (cost=0.42..1.27 rows=10 width=22) (actual time=1.330..1.344 rows=10 loops=1)
                           ->  LockRows  (cost=0.42..50367.28 rows=598931 width=22) (actual time=1.329..1.342 rows=10 loops=1)
                                 ->  Index Scan using packages_packages_pending_verification on packages_package_files packages_package_files_1  (cost=0.42..44377.97 rows=598931 width=22) (actual time=1.314..1.318 rows=10 loops=1)
                                       Filter: ((file_store = 1) AND (verification_state = 0))
         ->  Index Scan using packages_package_files_pkey on packages_package_files  (cost=0.43..2.45 rows=1 width=750) (actual time=0.003..0.003 rows=1 loops=10)
               Index Cond: (id = "ANY_subquery".id)
 Planning Time: 0.295 ms
 Execution Time: 1.693 ms
(13 rows)
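The second query picks pending rows, putting never-verified rows first and then the oldest `verified_at`. The same selection is sketched below in SQLite, with the same caveats as before: no row locking, and `NULLS FIRST` emulated with an `IS NOT NULL` sort key. `next_pending_batch` is a hypothetical name.

```python
import sqlite3

LOCAL, REMOTE = 1, 2   # assumed file_store encoding (1 = local per the plans)
PENDING = 0            # state used by the query above


def next_pending_batch(conn, batch_size):
    """Return ids of local pending rows, never-verified rows first, then oldest verified_at."""
    return [row[0] for row in conn.execute(
        """
        SELECT id FROM packages_package_files
        WHERE file_store = ? AND verification_state = ?
        ORDER BY verified_at IS NOT NULL, verified_at
        LIMIT ?
        """, (LOCAL, PENDING, batch_size))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages_package_files ("
             " id INTEGER PRIMARY KEY, file_store INTEGER,"
             " verification_state INTEGER, verified_at TEXT)")
conn.executemany("INSERT INTO packages_package_files VALUES (?, ?, ?, ?)", [
    (1, LOCAL, PENDING, "2021-02-01 09:00:00"),   # verified before, pending again
    (2, LOCAL, PENDING, None),                    # never verified: sorts first
    (3, REMOTE, PENDING, None),                   # remote: excluded
    (4, LOCAL, PENDING, "2021-01-15 09:00:00"),   # oldest previous verification
])
print(next_pending_batch(conn, 10))  # [2, 4, 1]
```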

gitlabhq_development=# explain analyze SELECT COUNT(*) FROM (SELECT 1 AS one FROM "packages_package_files" WHERE "packages_package_files"."file_store" = 1 AND (("packages_package_files"."verification_state" IN (0)) OR ("packages_package_files"."verification_state" IN (3)) AND ("packages_package_files"."verification_retry_at" IS NULL OR "packages_package_files"."verification_retry_at" < '2021-02-13 10:41:13.339958')) LIMIT 500) subquery_for_count;
                                                                                                             QUERY PLAN                                                                                                             
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37.29..37.30 rows=1 width=8) (actual time=0.384..0.385 rows=1 loops=1)
   ->  Limit  (cost=0.43..31.04 rows=500 width=4) (actual time=0.031..0.327 rows=500 loops=1)
         ->  Index Scan using packages_packages_needs_verification on packages_package_files  (cost=0.43..56943.88 rows=929977 width=4) (actual time=0.030..0.242 rows=500 loops=1)
               Filter: ((file_store = 1) AND ((verification_state = 0) OR ((verification_state = 3) AND ((verification_retry_at IS NULL) OR (verification_retry_at < '2021-02-13 10:41:13.339958-08'::timestamp with time zone)))))
 Planning Time: 0.267 ms
 Execution Time: 0.426 ms
(6 rows)
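The third query counts rows that still need verification: pending rows, plus failed rows whose retry is due. The `LIMIT 500` inside the subquery caps the count, so the `Limit` node stops after 500 index entries instead of visiting the roughly 930K rows the planner estimates. A minimal SQLite sketch of that capped count follows, under the same assumed encodings. `needs_verification_count` is a hypothetical name.

```python
import sqlite3

LOCAL, REMOTE = 1, 2   # assumed file_store encoding (1 = local per the plans)
PENDING, FAILED = 0, 3


def needs_verification_count(conn, now, cap):
    """Count local rows that are pending, or failed with a due retry, stopping at cap."""
    (count,) = conn.execute(
        """
        SELECT COUNT(*) FROM (
          SELECT 1 AS one FROM packages_package_files
          WHERE file_store = ?
            AND (verification_state = ?
                 OR (verification_state = ?
                     AND (verification_retry_at IS NULL OR verification_retry_at < ?)))
          LIMIT ?
        ) AS subquery_for_count
        """, (LOCAL, PENDING, FAILED, now, cap)).fetchone()
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages_package_files ("
             " id INTEGER PRIMARY KEY, file_store INTEGER,"
             " verification_state INTEGER, verification_retry_at TEXT)")
rows = ([(LOCAL, PENDING, None)] * 550
        + [(LOCAL, FAILED, "2021-02-13 10:00:00")] * 50   # retry due
        + [(LOCAL, FAILED, "2021-02-13 12:00:00")] * 50   # retry not yet due
        + [(REMOTE, PENDING, None)] * 100)                 # remote: never counted
conn.executemany(
    "INSERT INTO packages_package_files"
    " (file_store, verification_state, verification_retry_at) VALUES (?, ?, ?)", rows)

now = "2021-02-13 10:41:13"
print(needs_verification_count(conn, now, 500))   # 500 (capped)
print(needs_verification_count(conn, now, 1000))  # 600
```

The explicit parentheses mirror the plan's `Filter` line. In the original SQL, `AND` binds tighter than `OR`, which gives the same grouping.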
