Skip to content

Introduce linear root method in UpdateAllMirrorsWorker

What does this MR do and why?

In this MR we're introducing the linear version of the roots scope for the UpdateAllMirrosWorker. This changes is behind the feature flag linear_mirrors_worker_roots.

How to setup and validate locally (strongly suggested)

  1. Enable the new method behavior feature flag

    Feature.enable(:linear_mirrors_worker_roots)
  2. In rails console enable the traversal id feature flag and the linear root version feature flag:

    Feature.enable(:use_traversal_ids_for_root_ancestor)
    Feature.enable(:use_traversal_ids)
  3. The setting should_check_namespace_plan has to be set to true so ::Gitlab::CurrentSettings.should_check_namespace_plan? will return true.

  4. Run the following command in the Rails console to trigger the query:

    UpdateAllMirrorsWorker.new.perform

SQL queries

The former SQL was:

SELECT "projects".*
FROM "projects"
INNER JOIN project_mirror_data import_state ON import_state.project_id = projects.id
INNER JOIN namespaces AS root_namespaces ON root_namespaces.id =
  (WITH RECURSIVE "base_and_ancestors" AS (
                                             (SELECT "namespaces".*
                                              FROM "namespaces"
                                              WHERE (id = projects.namespace_id))
                                           UNION
                                             (SELECT "namespaces".*
                                              FROM "namespaces",
                                                   "base_and_ancestors"
                                              WHERE "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT "id"
   FROM "base_and_ancestors" AS "namespaces"
   WHERE "namespaces"."parent_id" IS NULL)
LEFT JOIN gitlab_subscriptions ON gitlab_subscriptions.namespace_id = root_namespaces.id
LEFT JOIN plans ON plans.id = gitlab_subscriptions.hosted_plan_id
WHERE "projects"."archived" = FALSE
  AND "projects"."pending_delete" = FALSE
  AND "projects"."mirror" = TRUE
  AND "import_state"."status" NOT IN ('scheduled',
                                      'started')
  AND (import_state.next_execution_timestamp <= '2021-12-14 10:14:38.687048')
  AND (import_state.retry_count <= 14)
  AND (plans.name IN ('bronze',
                      'silver',
                      'premium',
                      'gold',
                      'ultimate',
                      'ultimate_trial',
                      'premium_trial')
       OR projects.visibility_level = 20)
ORDER BY import_state.next_execution_timestamp
LIMIT 60

This is the query plan for the query and the times with warm caches are:

Time: 245.320 ms
  - planning: 15.323 ms
  - execution: 229.997 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 169248 (~1.30 GiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 252 (~2.00 MiB)
  - writes: 0

The new SQL is:

SELECT "projects".*
FROM "projects"
INNER JOIN project_mirror_data import_state ON import_state.project_id = projects.id
INNER JOIN namespaces AS root_namespaces ON root_namespaces.id =
  (SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
   FROM "namespaces"
   WHERE "namespaces"."id" IN
       (SELECT DISTINCT "namespaces".traversal_ids[1]
        FROM "namespaces"
        WHERE (id = projects.namespace_id)))
LEFT JOIN gitlab_subscriptions ON gitlab_subscriptions.namespace_id = root_namespaces.id
LEFT JOIN plans ON plans.id = gitlab_subscriptions.hosted_plan_id
WHERE "projects"."archived" = FALSE
  AND "projects"."pending_delete" = FALSE
  AND "projects"."mirror" = TRUE
  AND "import_state"."status" NOT IN ('scheduled',
                                      'started')
  AND (import_state.next_execution_timestamp <= '2021-12-14 10:22:18.636116')
  AND (import_state.retry_count <= 14)
  AND (plans.name IN ('bronze',
                      'silver',
                      'premium',
                      'gold',
                      'ultimate',
                      'ultimate_trial',
                      'premium_trial')
       OR projects.visibility_level = 20)
ORDER BY import_state.next_execution_timestamp
LIMIT 60

This is the query plan and the times with warm caches are:

Time: 230.102 ms
  - planning: 13.781 ms
  - execution: 216.321 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 191935 (~1.50 GiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #339457 (closed)

Merge request reports