Fix integration propagation timeout on large group hierarchies

Problem

When you configure an integration (like Datadog or Jira) on a top-level group, GitLab propagates it to every project and subgroup underneath by looping through them in batches of 10,000.

To find each batch boundary, each_batch runs a query with expensive filters: a NOT EXISTS subquery checking whether the integration already exists, and a nested namespace subquery filtering out archived groups. PostgreSQL re-evaluates these filters on every row it scans.
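In Ruby terms, the old scope looked roughly like this (a sketch for orientation only; scope names such as without_integration and the exact namespace filtering are assumptions based on GitLab conventions, not quoted from this MR):

# Sketch of the legacy batching scope; names are assumptions.
scope = Project
  .without_integration(integration)
  .where(archived: false, pending_delete: false)
  .where(namespace_id: group.self_and_descendants.select(:id))

# each_batch re-runs every filter above just to locate each batch
# boundary; that boundary query is the one shown under "Previous Query".
scope.each_batch(of: 10_000) do |relation|
  # enqueue a propagation worker for this range (elided)
end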

For large group hierarchies (~1,364 subgroups, ~6,200 projects), this query takes over 15 seconds and times out. The integration never gets propagated.

Closes #591182

Solution

Simplify the each_batch scope by removing the expensive filters. Instead:

  1. Pluck namespace IDs into a Ruby array first (fast, single query)
  2. Batch on a simple WHERE namespace_id IN (ids) scope — cheap index scan
  3. Let the worker do the detailed filtering (it already re-applies without_integration and other filters within each min_id..max_id range)

The batch boundaries become cheap to find, and the timeout goes away.
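A minimal sketch of the new flow, assuming conventional GitLab names (self_and_descendant_ids, PropagateIntegrationProjectWorker) rather than quoting the MR's code:

# 1. Materialize the namespace IDs up front with a single fast query.
namespace_ids = group.self_and_descendant_ids

# 2. Batch on the plain namespace_id IN (...) scope; the boundary query
#    becomes the cheap index scan shown under "New Query" below.
Project.where(namespace_id: namespace_ids).each_batch(of: 10_000) do |batch|
  min_id = batch.minimum(:id)
  max_id = batch.maximum(:id)

  # 3. The worker re-applies without_integration and the archived /
  #    pending_delete filters within each min_id..max_id range.
  PropagateIntegrationProjectWorker.perform_async(integration.id, min_id, max_id)
end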

Behind feature flag integration_propagation_simplified_batching.
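The flag gates the two code paths in the usual way; a sketch using GitLab's standard Feature API (the two method names here are placeholders, not the MR's):

if Feature.enabled?(:integration_propagation_simplified_batching, group)
  propagate_with_simplified_batching  # new: plain namespace_id IN (...) batching
else
  propagate_with_filtered_batching    # legacy: filtered each_batch scope
end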

Query plans

Tested against the gitlab-org namespace (ID 9970, ~9,000 subgroups, ~7,452 projects) on a production database clone via postgres.ai.

Previous Query
SELECT "projects"."id"
FROM "projects"
WHERE NOT EXISTS (
    SELECT 1 FROM "integrations"
    WHERE "integrations"."project_id" = "projects"."id"
      AND "integrations"."type_new" = 'Integrations::JiraCloudApp'
  )
  AND "projects"."pending_delete" = FALSE
  AND "projects"."archived" = FALSE
  AND "projects"."namespace_id" IN (
    SELECT "namespaces"."id"
    FROM "namespaces"
    WHERE "namespaces"."traversal_ids" @> ARRAY[9970]
      AND NOT (
        "namespaces"."traversal_ids"::bigint[] && ARRAY(
          SELECT "namespace_settings"."namespace_id"
          FROM "namespace_settings"
          WHERE "namespace_settings"."archived" = TRUE
        )::bigint[]
      )
  )
ORDER BY "projects"."id" ASC
LIMIT 1;

Limit  (cost=749.98..2110.89 rows=1 width=4) (actual time=7606.060..7606.064 rows=1 loops=1)
   Buffers: shared hit=25144 read=9924 dirtied=42
   WAL: records=42 fpi=42 bytes=335194
   InitPlan 1
     ->  Index Only Scan using index_namespace_settings_on_namespace_id_where_archived_true on public.namespace_settings  (cost=0.29..748.28 rows=14205 width=4) (actual time=2.476..749.358 rows=15401 loops=1)
           Heap Fetches: 1047
           Buffers: shared hit=9895 read=1008
   ->  Nested Loop Anti Join  (cost=1.70..8226660.38 rows=6045 width=4) (actual time=7606.058..7606.061 rows=1 loops=1)
         Buffers: shared hit=25144 read=9924 dirtied=42
         WAL: records=42 fpi=42 bytes=335194
         ->  Nested Loop  (cost=1.14..8222975.49 rows=6315 width=4) (actual time=7605.506..7605.508 rows=1 loops=1)
               Buffers: shared hit=25141 read=9923 dirtied=42
               WAL: records=42 fpi=42 bytes=335194
               ->  Index Scan using index_projects_on_id_and_archived_and_pending_delete on public.projects  (cost=0.56..6048052.70 rows=47841564 width=8) (actual time=4.031..4241.180 rows=6440 loops=1)
                     Buffers: shared hit=807 read=5382
               ->  Memoize  (cost=0.58..0.70 rows=1 width=4) (actual time=0.520..0.520 rows=0 loops=6440)
                     Buffers: shared hit=24334 read=4541 dirtied=42
                     WAL: records=42 fpi=42 bytes=335194
                     ->  Index Scan using namespaces_pkey on public.namespaces  (cost=0.57..0.69 rows=1 width=4) (actual time=0.923..0.923 rows=0 loops=3587)
                           Index Cond: (namespaces.id = projects.namespace_id)
                           Filter: ((namespaces.traversal_ids @> '{9970}'::integer[]) AND (NOT ((namespaces.traversal_ids)::bigint[] && ((InitPlan 1).col1)::bigint[])))
                           Rows Removed by Filter: 1
                           Buffers: shared hit=24334 read=4541 dirtied=42
                           WAL: records=42 fpi=42 bytes=335194
         ->  Index Only Scan using index_integrations_on_project_id_and_type_new_unique on public.integrations  (cost=0.56..0.58 rows=1 width=4) (actual time=0.548..0.548 rows=0 loops=1)
               Index Cond: ((integrations.project_id = projects.id) AND (integrations.type_new = 'Integrations::JiraCloudApp'::text))
               Heap Fetches: 0
               Buffers: shared hit=3 read=1
Settings: random_page_cost = '1.5', work_mem = '230MB', seq_page_cost = '4', effective_cache_size = '472585MB', jit = 'off'
Statistics:
Time: 7.740 s
  - planning: 130.439 ms
  - execution: 7.610 s
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 25144 (~196.40 MiB) from the buffer pool
  - reads: 9924 (~77.50 MiB) from the OS file cache, including disk I/O
  - dirtied: 42 (~336.00 KiB)
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/51420/commands/152010

New Query
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN (/* 9000+ project IDs */)
ORDER BY "projects"."id" ASC
LIMIT 1;

Limit  (cost=22.77..44.71 rows=1 width=4) (actual time=454.267..454.268 rows=1 loops=1)
   Buffers: shared hit=6222 read=463
   ->  Index Scan using projects_pkey on public.projects  (cost=22.77..6437177.31 rows=293378 width=4) (actual time=454.265..454.265 rows=1 loops=1)
         Filter: (projects.namespace_id = ANY ('{/* 9000+ namespace IDs */}'::integer[]))
         Rows Removed by Filter: 6938
         Buffers: shared hit=6222 read=463
Settings: effective_cache_size = '472585MB', jit = 'off', random_page_cost = '1.5', work_mem = '230MB', seq_page_cost = '4'
Statistics:
Time: 510.488 ms
  - planning: 56.163 ms
  - execution: 454.325 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 6222 (~48.60 MiB) from the buffer pool
  - reads: 463 (~3.60 MiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/51420/commands/152011

New Query with Offset

This is the boundary query each_batch issues for each subsequent batch: it seeks past the previous boundary (projects.id >= ...) and uses OFFSET to jump a full batch ahead.
EXPLAIN SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN (/* ...all the IDs... */)
  AND "projects"."id" >= /* project_id */
ORDER BY "projects"."id" ASC
LIMIT 1 OFFSET 5000;

 Limit  (cost=22886.72..22886.72 rows=1 width=4) (actual time=3112.383..3112.385 rows=0 loops=1)
   Buffers: shared hit=16210 read=5405 dirtied=26
   WAL: records=29 fpi=26 bytes=196748
   ->  Sort  (cost=22874.22..23002.28 rows=51223 width=4) (actual time=3112.188..3112.315 rows=2091 loops=1)
         Sort Key: projects.id
         Sort Method: quicksort  Memory: 97kB
         Buffers: shared hit=16210 read=5405 dirtied=26
         WAL: records=29 fpi=26 bytes=196748
         ->  Index Only Scan using index_projects_on_namespace_id_and_id on public.projects  (cost=0.56..19470.96 rows=51223 width=4) (actual time=5.197..3111.511 rows=2091 loops=1)
                Index Cond: ((projects.namespace_id = ANY ('{/* 9000+ namespace_ids */}'::integer[])) AND (projects.id >= 69738105))
               Heap Fetches: 68
               Buffers: shared hit=16207 read=5405 dirtied=26
               WAL: records=29 fpi=26 bytes=196748
Settings: effective_cache_size = '472585MB', jit = 'off', random_page_cost = '1.5', work_mem = '230MB', seq_page_cost = '4'

Statistics:
Time: 60.098 ms
  - planning: 44.002 ms
  - execution: 16.096 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 21615 (~168.90 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/51537/commands/152254

Comparison

                 OLD (subquery + filters)              NEW (simple IN list)
Execution time   7,606 ms                              454 ms
Buffer hits      25,144                                6,222
Buffer reads     9,924                                 463
Join strategy    Nested Loop Anti Join (6,440 loops)   Index Scan with filter
Speedup                                                ~17x faster

Feature flag rollout issue

#599108
