Geo: New worker to mark registries pending to verify
What does this MR do and why?
This MR adds a new worker that supports reverifying all registries of a particular Geo data type by marking them as pending, so that the periodic sync jobs can process them again.
How to set up and validate locally
- In the Rails console, trigger the worker with the following (the selected data type is just an example):
  Geo::BulkMarkVerificationPendingBatchWorker.perform_with_capacity('Geo::LfsObjectRegistry')
- Check that all the registries have a verification_state equal to 0 (which means they are in a verification_pending state):
  Geo::LfsObjectRegistry.all.pluck(:verification_state)
  Note: If there are more than 1,000 registries, this change will be limited by the worker capacity.
- Look at the replicables admin UI and check that the Last verified text shows Unknown.
- Wait for one minute to see Geo::VerificationBatchWorker start verifying the registries.
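The verification_state check described above can be condensed into a small helper. This is only a sketch: verification_summary is a hypothetical name that is not part of this MR, it works on the plain array returned by pluck(:verification_state), and it assumes that 0 is the verification_pending value, as stated in the steps.

```ruby
# Hypothetical helper, not part of this MR.
# Assumes 0 is the verification_pending value, as stated in the steps above.
VERIFICATION_PENDING = 0

# Summarizes an array of verification_state integers.
def verification_summary(states)
  pending = states.count(VERIFICATION_PENDING)
  {
    total: states.size,
    pending: pending,
    all_pending: pending == states.size
  }
end

# In the Rails console this could be fed with:
#   verification_summary(Geo::LfsObjectRegistry.pluck(:verification_state))
```

If all_pending is false on an instance with more than 1,000 registries, the remaining records may simply not have been reached yet because of the worker capacity limit.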
Database
Queries
NOTE: EXPLAIN ANALYZE was run against a Geo-enabled GDK instance with 2422 job_artifact_registries. Each batch update is issued by a worker with limited capacity and is limited to 1000 records per batch.
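To make the batch limit concrete, the following sketch splits a list of registry ids into batches of at most 1000. It is an illustration only: id_batches is a hypothetical helper, and the contiguous ids 1..2422 are an assumption chosen to match the record count in the note, not data taken from the GDK instance.

```ruby
BATCH_SIZE = 1000 # per-batch limit stated in the note above

# Hypothetical helper: groups sorted ids into consecutive batches
# of at most batch_size elements.
def id_batches(ids, batch_size: BATCH_SIZE)
  ids.sort.each_slice(batch_size).to_a
end

# Assumed ids: 1..2422, one per registry mentioned in the note.
batches = id_batches((1..2422).to_a)
puts batches.size             # number of batches needed
puts batches.map(&:size).inspect # records per batch
```

Under these assumptions, marking every registry takes three batches, with the last one only partially filled.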
Update the state to pending in batches
Explain analyze: https://explain.depesz.com/s/T74d
UPDATE job_artifact_registry
SET
  state = 0,
  last_synced_at = NULL
WHERE
  job_artifact_registry.id IN (
    SELECT
      job_artifact_registry.id
    FROM
      job_artifact_registry
    WHERE
      job_artifact_registry.state NOT IN ( 0 ) AND
      id > 0 AND
      id < 10000
    ORDER BY
      job_artifact_registry.id ASC
    LIMIT 1000
  );
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on job_artifact_registry (cost=8.59..16.63 rows=1 width=750) (actual time=0.007..0.008 rows=0 loops=1)
   -> Nested Loop (cost=8.59..16.63 rows=1 width=750) (actual time=0.006..0.007 rows=0 loops=1)
         -> HashAggregate (cost=8.31..8.32 rows=1 width=32) (actual time=0.006..0.006 rows=0 loops=1)
               Group Key: "ANY_subquery".id
               Batches: 1 Memory Usage: 24kB
               -> Subquery Scan on "ANY_subquery" (cost=0.28..8.31 rows=1 width=32) (actual time=0.004..0.004 rows=0 loops=1)
                     -> Limit (cost=0.28..8.30 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1)
                           -> Index Scan using job_artifact_registry_pkey on job_artifact_registry job_artifact_registry_1 (cost=0.28..8.30 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1)
                                 Index Cond: ((id > 0) AND (id < 10000))
                                 Filter: (state <> 0)
         -> Index Scan using job_artifact_registry_pkey on job_artifact_registry (cost=0.28..8.30 rows=1 width=712) (never executed)
               Index Cond: (id = "ANY_subquery".id)
Planning Time: 1.609 ms
Execution Time: 0.111 ms
(14 rows)
Time: 3.970 ms
Update the verification_state to pending in batches
Explain analyze: https://explain.depesz.com/s/ntoI
UPDATE job_artifact_registry
SET
  verification_state = 0
WHERE
  job_artifact_registry.id IN (
    SELECT
      job_artifact_registry.id
    FROM
      job_artifact_registry
    WHERE
      job_artifact_registry.state IN ( 2 ) AND
      job_artifact_registry.verification_state <> 0 AND
      id > 0 AND
      id < 10000
    ORDER BY
      job_artifact_registry.id ASC
    LIMIT 1000
  );
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on job_artifact_registry (cost=8.60..16.63 rows=1 width=750) (actual time=0.009..0.010 rows=0 loops=1)
   -> Nested Loop (cost=8.60..16.63 rows=1 width=750) (actual time=0.008..0.009 rows=0 loops=1)
         -> HashAggregate (cost=8.32..8.33 rows=1 width=32) (actual time=0.008..0.008 rows=0 loops=1)
               Group Key: "ANY_subquery".id
               Batches: 1 Memory Usage: 24kB
               -> Subquery Scan on "ANY_subquery" (cost=0.28..8.31 rows=1 width=32) (actual time=0.006..0.006 rows=0 loops=1)
                     -> Limit (cost=0.28..8.30 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=1)
                           -> Index Scan using job_artifact_registry_pkey on job_artifact_registry job_artifact_registry_1 (cost=0.28..8.30 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)
                                 Index Cond: ((id > 0) AND (id < 10000))
                                 Filter: ((verification_state <> 0) AND (state = 2))
         -> Index Scan using job_artifact_registry_pkey on job_artifact_registry (cost=0.28..8.30 rows=1 width=720) (never executed)
               Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.332 ms
Execution Time: 0.069 ms
(14 rows)
Time: 0.854 ms
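The selection logic of the verification_state query above can be modeled in memory to see which rows a single batch touches. This is a rough sketch, not the worker's implementation: the Registry struct and mark_verification_pending are hypothetical names, and the id bounds and limit are copied from the example query.

```ruby
# Hypothetical in-memory stand-in for a registry row.
Registry = Struct.new(:id, :state, :verification_state)

# Mirrors the query above: pick rows with state = 2, verification_state <> 0,
# and min_id < id < max_id, ordered by id, capped at limit, then set
# verification_state to 0. Returns the number of rows updated.
def mark_verification_pending(registries, min_id: 0, max_id: 10_000, limit: 1000)
  batch = registries
    .select { |r| r.state == 2 && r.verification_state != 0 }
    .select { |r| r.id > min_id && r.id < max_id }
    .sort_by(&:id)
    .first(limit)

  batch.each { |r| r.verification_state = 0 }
  batch.size
end

rows = [
  Registry.new(1, 2, 2),      # matches: state 2, not yet pending
  Registry.new(2, 0, 2),      # skipped: state is not 2
  Registry.new(3, 2, 0),      # skipped: already verification pending
  Registry.new(10_000, 2, 2)  # skipped: id outside the exclusive range
]

puts mark_verification_pending(rows) # rows updated by the first batch
puts mark_verification_pending(rows) # nothing left to update in this range
```

A batch that finds no matching rows updates nothing, which corresponds to the rows=0 figures in the plan above.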
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #417286 (closed)