Skip to content

Check for plans in update all mirrors worker

What does this MR do?

When we disable the free_period_for_pull_mirroring feature flag, we try to process a lot of mirrors on free private projects, that can no longer be processed. If we do the check in Ruby, we put a lot of strain on the worker, and make things worse for paying and public projects.

We could mark the mirrors as failed if they don't have a license, or not as mirrors at all, but then you'd need to re-enable your mirror after starting a subscription, which isn't ideal.

This change makes the query only select projects that are either public or on paid plans. This prevents us from needing to iterate through a bunch of mirrors that were set up on free private projects when pull mirrors were free, and still allows those projects to just work if they do start using a paid plan.

Query plans

Old:

gitlabhq_production=> EXPLAIN ANALYZE SELECT "projects".* FROM "projects" INNER JOIN project_mirror_data import_state ON import_state.project_id = projects.id WHERE "projects"."archived" = FALSE AND "projects"."mirror" = TRUE AND "import_state"."status" NOT IN ('scheduled', 'started') AND (import_state.next_execution_timestamp <= '2020-03-23 13:43:55.245995') AND (import_state.retry_count <= 14) ORDER BY import_state.next_execution_timestamp LIMIT 500;
                                                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.99..144891.78 rows=500 width=761) (actual time=167.322..185.315 rows=4 loops=1)
   ->  Nested Loop  (cost=0.99..357301.67 rows=1233 width=761) (actual time=167.321..185.313 rows=4 loops=1)
         ->  Index Scan using index_mirror_data_on_next_execution_and_retry_count on project_mirror_data import_state  (cost=0.56..102274.28 rows=95083 width=12) (actual time=0.498..81.842 rows=19303 loops=1)
               Index Cond: ((next_execution_timestamp <= '2020-03-23 13:43:55.245995+00'::timestamp with time zone) AND (retry_count <= 14))
               Filter: ((status)::text <> ALL ('{scheduled,started}'::text[]))
               Rows Removed by Filter: 180
         ->  Index Scan using projects_pkey on projects  (cost=0.43..2.67 rows=1 width=753) (actual time=0.005..0.005 rows=0 loops=19303)
               Index Cond: (id = import_state.project_id)
               Filter: ((NOT archived) AND mirror)
               Rows Removed by Filter: 1
 Planning time: 0.709 ms
 Execution time: 185.500 ms
(12 rows)

New:

gitlabhq_production=> EXPLAIN ANALYZE SELECT "projects".* FROM "projects" INNER JOIN project_mirror_data import_state ON import_state.project_id = projects.id INNER JOIN namespaces AS root_namespaces ON root_namespaces.id = (WITH RECURSIVE "base_and_ancestors" AS ((SELECT "namespaces".* FROM "namespaces" WHERE (id = projects.namespace_id))
gitlabhq_production(> UNION
gitlabhq_production(> (SELECT "namespaces".* FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT "id" FROM "base_and_ancestors" AS "namespaces" WHERE "namespaces"."parent_id" IS NULL) LEFT JOIN gitlab_subscriptions ON gitlab_subscriptions.namespace_id = root_namespaces.id LEFT JOIN plans ON plans.id = gitlab_subscriptions.hosted_plan_id WHERE "projects"."archived" = FALSE AND "projects"."mirror" = TRUE AND "import_state"."status" NOT IN ('scheduled', 'started') AND (import_state.next_execution_timestamp <= '2020-03-23 13:44:00.840198') AND (import_state.retry_count <= 14) AND (plans.name IN ('bronze','silver','gold','early_adopter') OR projects.visibility_level = 20) ORDER BY import_state.next_execution_timestamp LIMIT 500;
                                                                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=357.34..455398.41 rows=500 width=761) (actual time=158.396..176.673 rows=2 loops=1)
   ->  Nested Loop Left Join  (cost=357.34..800319.53 rows=879 width=761) (actual time=158.395..176.671 rows=2 loops=1)
         Join Filter: (plans.id = gitlab_subscriptions.hosted_plan_id)
         Rows Removed by Join Filter: 30
         Filter: (((plans.name)::text = ANY ('{bronze,silver,gold,early_adopter}'::text[])) OR (projects.visibility_level = 20))
         Rows Removed by Filter: 4
         ->  Nested Loop Left Join  (cost=357.34..800148.33 rows=1238 width=765) (actual time=158.228..176.613 rows=6 loops=1)
               ->  Nested Loop  (cost=356.91..799566.69 rows=1238 width=765) (actual time=158.212..176.539 rows=6 loops=1)
                     ->  Nested Loop  (cost=0.99..358474.72 rows=1238 width=761) (actual time=158.127..176.354 rows=6 loops=1)
                           ->  Index Scan using index_mirror_data_on_next_execution_and_retry_count on project_mirror_data import_state  (cost=0.56..102625.89 rows=95458 width=12) (actual time=0.503..76.648 rows=19305 loops=1)
                                 Index Cond: ((next_execution_timestamp <= '2020-03-23 13:44:00.840198+00'::timestamp with time zone) AND (retry_count <= 14))
                                 Filter: ((status)::text <> ALL ('{scheduled,started}'::text[]))
                                 Rows Removed by Filter: 174
                           ->  Index Scan using projects_pkey on projects  (cost=0.43..2.67 rows=1 width=753) (actual time=0.005..0.005 rows=0 loops=19305)
                                 Index Cond: (id = import_state.project_id)
                                 Filter: ((NOT archived) AND mirror)
                                 Rows Removed by Filter: 1
                     ->  Index Only Scan using namespaces_pkey on namespaces root_namespaces  (cost=355.92..356.28 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=6)
                           Index Cond: (id = (SubPlan 2))
                           Heap Fetches: 0
                           SubPlan 2
                             ->  CTE Scan on base_and_ancestors namespaces_2  (cost=353.47..355.49 rows=1 width=4) (actual time=0.023..0.025 rows=1 loops=6)
                                   Filter: (parent_id IS NULL)
                                   CTE base_and_ancestors
                                     ->  Recursive Union  (cost=0.43..353.47 rows=101 width=326) (actual time=0.019..0.021 rows=1 loops=6)
                                           ->  Index Scan using namespaces_pkey on namespaces  (cost=0.43..3.45 rows=1 width=326) (actual time=0.010..0.010 rows=1 loops=6)
                                                 Index Cond: (id = projects.namespace_id)
                                           ->  Nested Loop  (cost=0.43..34.80 rows=10 width=326) (actual time=0.001..0.002 rows=0 loops=6)
                                                 ->  WorkTable Scan on base_and_ancestors  (cost=0.00..0.20 rows=10 width=4) (actual time=0.001..0.001 rows=1 loops=6)
                                                 ->  Index Scan using namespaces_pkey on namespaces namespaces_1  (cost=0.43..3.45 rows=1 width=326) (actual time=0.000..0.001 rows=0 loops=6)
                                                       Index Cond: (id = base_and_ancestors.parent_id)
               ->  Index Scan using index_gitlab_subscriptions_on_namespace_id on gitlab_subscriptions  (cost=0.43..0.46 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=6)
                     Index Cond: (namespace_id = root_namespaces.id)
         ->  Materialize  (cost=0.00..4.09 rows=6 width=11) (actual time=0.003..0.005 rows=6 loops=6)
               ->  Seq Scan on plans  (cost=0.00..4.06 rows=6 width=11) (actual time=0.009..0.009 rows=6 loops=1)
 Planning time: 9.320 ms
 Execution time: 177.614 ms
(37 rows)

Note that these can change depending on the mirrors scheduled, but it looks pretty good so far.

For #212074 (closed).

Edited by Sean McGivern

Merge request reports