Geo: New worker to mark registries pending to verify

What does this MR do and why?

This MR adds a new worker that marks all registries of a given Geo data type as pending verification, so that the periodic sync jobs can pick them up and reverify them.

How to set up and validate locally

  1. In the Rails console, trigger the worker with the following command (the selected data type is just an example):

    Geo::BulkMarkVerificationPendingBatchWorker.perform_with_capacity('Geo::LfsObjectRegistry')
  2. Check that all the registries have a verification_state equal to 0 (which means they are in the verification_pending state).

     Geo::LfsObjectRegistry.all.pluck(:verification_state)

    Note: If there are more than 1,000 registries, this change will be limited by the worker capacity.

  3. Look at the replicables admin UI and check that the Last verified text shows Unknown.

  4. Wait one minute for Geo::VerificationBatchWorker to start verifying the registries.
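
With many registries, the raw array from step 2 is hard to read. The following is a minimal sketch of a helper that summarizes it. The method names `verification_summary` and `all_pending?` are hypothetical and not part of GitLab; the only value taken from this MR is that 0 means verification pending.

```ruby
# Hypothetical helpers (not part of GitLab) for summarizing an array of
# verification_state integers, such as the one returned by
# Geo::LfsObjectRegistry.all.pluck(:verification_state).

PENDING = 0 # per this MR description, 0 means verification pending

# Counts how many states are pending and how many are not.
def verification_summary(states)
  pending = states.count(PENDING)
  { total: states.size, pending: pending, not_pending: states.size - pending }
end

# True only when every registry is in the pending state.
def all_pending?(states)
  states.all? { |state| state == PENDING }
end

# Example with made-up values; in the Rails console the array would come
# from the pluck call in step 2 instead.
sample = [0, 0, 2, 0]
summary = verification_summary(sample)
puts "pending: #{summary[:pending]} of #{summary[:total]}"
puts "all pending? #{all_pending?(sample)}"
```

If `all_pending?` returns false right after the worker runs, the note in step 2 about worker capacity is one possible explanation.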

Database

Queries

NOTE: EXPLAIN ANALYZE was run against a GDK instance with Geo enabled and 2422 job_artifact_registries. Each batch update is performed by a worker with limited capacity and is limited to 1000 records per batch.
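
To make the batch limit concrete, here is a small sketch of how a set of records splits into batches of at most 1000. The `batch_sizes` method is hypothetical, and the example assumes that every one of the 2422 registries qualifies for the update, which may not hold on a real instance.

```ruby
# Hypothetical illustration (not GitLab code) of how a record count splits
# into batches of at most BATCH_SIZE records.

BATCH_SIZE = 1000 # per-batch limit stated in this MR description

# Returns the size of each batch needed to cover record_count records.
def batch_sizes(record_count, batch_size = BATCH_SIZE)
  (1..record_count).each_slice(batch_size).map(&:size)
end

sizes = batch_sizes(2422)
puts sizes.inspect          # [1000, 1000, 422]
puts "batches: #{sizes.size}"
```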

Update the state to pending in batches

Explain analyze: https://explain.depesz.com/s/T74d

UPDATE job_artifact_registry
SET
    state = 0,
    last_synced_at = NULL
WHERE
    job_artifact_registry.id IN (
        SELECT
            job_artifact_registry.id
        FROM
            job_artifact_registry
        WHERE
            job_artifact_registry.state <> 0 AND
            id > 0 AND
            id < 10000
        ORDER BY
            job_artifact_registry.id ASC
        LIMIT 1000
    );
                                                                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on job_artifact_registry  (cost=8.59..16.63 rows=1 width=750) (actual time=0.007..0.008 rows=0 loops=1)
   ->  Nested Loop  (cost=8.59..16.63 rows=1 width=750) (actual time=0.006..0.007 rows=0 loops=1)
         ->  HashAggregate  (cost=8.31..8.32 rows=1 width=32) (actual time=0.006..0.006 rows=0 loops=1)
               Group Key: "ANY_subquery".id
               Batches: 1  Memory Usage: 24kB
               ->  Subquery Scan on "ANY_subquery"  (cost=0.28..8.31 rows=1 width=32) (actual time=0.004..0.004 rows=0 loops=1)
                     ->  Limit  (cost=0.28..8.30 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1)
                           ->  Index Scan using job_artifact_registry_pkey on job_artifact_registry job_artifact_registry_1  (cost=0.28..8.30 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1)
                                 Index Cond: ((id > 0) AND (id < 10000))
                                 Filter: (state <> 0)
         ->  Index Scan using job_artifact_registry_pkey on job_artifact_registry  (cost=0.28..8.30 rows=1 width=712) (never executed)
               Index Cond: (id = "ANY_subquery".id)
 Planning Time: 1.609 ms
 Execution Time: 0.111 ms
(14 rows)

Time: 3.970 ms

Update the verification_state to pending in batches

Explain analyze: https://explain.depesz.com/s/ntoI

UPDATE job_artifact_registry
SET
    verification_state = 0
WHERE
    job_artifact_registry.id IN (
        SELECT
            job_artifact_registry.id
        FROM
            job_artifact_registry
        WHERE
            job_artifact_registry.state IN ( 2 ) AND
            job_artifact_registry.verification_state <> 0 AND
            id > 0 AND
            id < 10000
        ORDER BY
            job_artifact_registry.id ASC
        LIMIT 1000
    );
                                                                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on job_artifact_registry  (cost=8.60..16.63 rows=1 width=750) (actual time=0.009..0.010 rows=0 loops=1)
   ->  Nested Loop  (cost=8.60..16.63 rows=1 width=750) (actual time=0.008..0.009 rows=0 loops=1)
         ->  HashAggregate  (cost=8.32..8.33 rows=1 width=32) (actual time=0.008..0.008 rows=0 loops=1)
               Group Key: "ANY_subquery".id
               Batches: 1  Memory Usage: 24kB
               ->  Subquery Scan on "ANY_subquery"  (cost=0.28..8.31 rows=1 width=32) (actual time=0.006..0.006 rows=0 loops=1)
                     ->  Limit  (cost=0.28..8.30 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=1)
                           ->  Index Scan using job_artifact_registry_pkey on job_artifact_registry job_artifact_registry_1  (cost=0.28..8.30 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)
                                 Index Cond: ((id > 0) AND (id < 10000))
                                 Filter: ((verification_state <> 0) AND (state = 2))
         ->  Index Scan using job_artifact_registry_pkey on job_artifact_registry  (cost=0.28..8.30 rows=1 width=720) (never executed)
               Index Cond: (id = "ANY_subquery".id)
 Planning Time: 0.332 ms
 Execution Time: 0.069 ms
(14 rows)

Time: 0.854 ms
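
The selection logic of the query above can be sketched in plain Ruby over in-memory records. This is an illustration under stated assumptions, not the worker's implementation: the `Registry` struct and the `mark_batch_pending` method are hypothetical, and the filter values (state 2, verification_state 0) are copied from the SQL.

```ruby
# Hypothetical in-memory model (not GitLab code) of one batch of the
# verification_state update shown in the SQL above.

Registry = Struct.new(:id, :state, :verification_state)

FILTER_STATE = 2         # value the SQL filters on: state IN ( 2 )
VERIFICATION_PENDING = 0 # value the SQL sets: verification_state = 0

# Mirrors the subquery: pick rows with the filtered state that are not yet
# pending, inside the exclusive id range, ordered by id, capped at limit.
# Then mirrors the UPDATE by setting verification_state to pending.
# Returns the number of rows changed.
def mark_batch_pending(registries, min_id:, max_id:, limit: 1000)
  batch = registries
    .select do |r|
      r.state == FILTER_STATE &&
        r.verification_state != VERIFICATION_PENDING &&
        r.id > min_id && r.id < max_id
    end
    .sort_by(&:id)
    .first(limit)

  batch.each { |r| r.verification_state = VERIFICATION_PENDING }
  batch.size
end

# Made-up sample data for illustration only.
registries = [
  Registry.new(1, 2, 2),
  Registry.new(2, 0, 2), # skipped: state is not 2
  Registry.new(3, 2, 0), # skipped: already pending
  Registry.new(4, 2, 3),
  Registry.new(5, 2, 2)
]

puts mark_batch_pending(registries, min_id: 0, max_id: 10_000, limit: 2)
```

Calling the method again with the same arguments picks up the remaining qualifying rows, and it returns 0 once nothing is left to mark. That matches the `rows=0` result in the plan above, where no row in the id range needed updating.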

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #417286 (closed)

Edited by Javiera Tapia
