Fix integration propagation timeout on large group hierarchies
## Problem
When you configure an integration (like Datadog or Jira) on a top-level group, GitLab propagates it to every project and subgroup underneath by looping through them in batches of 10,000.

To find each batch boundary, `each_batch` runs a query with expensive filters: a `NOT EXISTS` subquery checking whether the integration already exists, and a nested namespace subquery filtering out archived groups. PostgreSQL re-evaluates these filters on every row it scans.

For large group hierarchies (~1,364 subgroups, ~6,200 projects), this query takes over 15 seconds and times out. The integration never gets propagated.
Closes #591182
## Solution
Simplify the `each_batch` scope by removing the expensive filters. Instead:

- Pluck the namespace IDs into a Ruby array first (fast, single query)
- Batch on a simple `WHERE namespace_id IN (ids)` scope — cheap index scan
- Let the worker do the detailed filtering (it already re-applies `without_integration` and other filters within each `min_id..max_id` range)

The batch boundaries become cheap to find, and the timeout goes away.

Behind feature flag `integration_propagation_simplified_batching`.
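The shape of the change can be sketched in plain Ruby. This is a simulation of the batching logic, not the actual worker code: `namespace_ids`, the project rows, and the per-batch filter are illustrative stand-ins for the plucked namespace IDs, the `Project` relation, and `without_integration`.

```ruby
BATCH_SIZE = 3

# Stand-in for `Namespace.where(...).pluck(:id)` — a plain Ruby array.
namespace_ids = [10, 20, 30]

# Stand-in for the projects table.
projects = [
  { id: 1, namespace_id: 10, has_integration: false },
  { id: 2, namespace_id: 20, has_integration: true  },
  { id: 3, namespace_id: 99, has_integration: false }, # outside the hierarchy
  { id: 4, namespace_id: 30, has_integration: false },
  { id: 5, namespace_id: 10, has_integration: false },
]

# Cheap batching scope: only `namespace_id IN (ids)`, ordered by id.
# Finding each batch boundary needs no expensive filters.
in_scope = projects.select { |p| namespace_ids.include?(p[:namespace_id]) }
                   .sort_by { |p| p[:id] }

propagated = []
in_scope.each_slice(BATCH_SIZE) do |batch|
  min_id, max_id = batch.first[:id], batch.last[:id]

  # The worker re-applies the detailed filters (e.g. "no integration yet")
  # inside each min_id..max_id range, so the batching scope can stay simple.
  batch.select { |p| (min_id..max_id).cover?(p[:id]) && !p[:has_integration] }
       .each { |p| propagated << p[:id] }
end

propagated # => [1, 4, 5]
```

Project 2 is skipped by the per-batch filter (it already has the integration) and project 3 never enters a batch (its namespace is outside the hierarchy), which mirrors how the real worker keeps correctness while the batch-boundary query stays cheap.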
## Query plans
Tested against the gitlab-org namespace (ID 9970, ~9,000 subgroups, ~7,452 projects) on a production database clone via postgres.ai.
### Previous Query
```sql
SELECT "projects"."id"
FROM "projects"
WHERE NOT EXISTS (
  SELECT 1 FROM "integrations"
  WHERE "integrations"."project_id" = "projects"."id"
    AND "integrations"."type_new" = 'Integrations::JiraCloudApp'
)
AND "projects"."pending_delete" = FALSE
AND "projects"."archived" = FALSE
AND "projects"."namespace_id" IN (
  SELECT "namespaces"."id"
  FROM "namespaces"
  WHERE "namespaces"."traversal_ids" @> ARRAY[9970]
    AND NOT (
      "namespaces"."traversal_ids"::bigint[] && ARRAY(
        SELECT "namespace_settings"."namespace_id"
        FROM "namespace_settings"
        WHERE "namespace_settings"."archived" = TRUE
      )::bigint[]
    )
)
ORDER BY "projects"."id" ASC
LIMIT 1;
```
```
Limit (cost=749.98..2110.89 rows=1 width=4) (actual time=7606.060..7606.064 rows=1 loops=1)
  Buffers: shared hit=25144 read=9924 dirtied=42
  WAL: records=42 fpi=42 bytes=335194
  InitPlan 1
    -> Index Only Scan using index_namespace_settings_on_namespace_id_where_archived_true on public.namespace_settings (cost=0.29..748.28 rows=14205 width=4) (actual time=2.476..749.358 rows=15401 loops=1)
         Heap Fetches: 1047
         Buffers: shared hit=9895 read=1008
  -> Nested Loop Anti Join (cost=1.70..8226660.38 rows=6045 width=4) (actual time=7606.058..7606.061 rows=1 loops=1)
       Buffers: shared hit=25144 read=9924 dirtied=42
       WAL: records=42 fpi=42 bytes=335194
       -> Nested Loop (cost=1.14..8222975.49 rows=6315 width=4) (actual time=7605.506..7605.508 rows=1 loops=1)
            Buffers: shared hit=25141 read=9923 dirtied=42
            WAL: records=42 fpi=42 bytes=335194
            -> Index Scan using index_projects_on_id_and_archived_and_pending_delete on public.projects (cost=0.56..6048052.70 rows=47841564 width=8) (actual time=4.031..4241.180 rows=6440 loops=1)
                 Buffers: shared hit=807 read=5382
            -> Memoize (cost=0.58..0.70 rows=1 width=4) (actual time=0.520..0.520 rows=0 loops=6440)
                 Buffers: shared hit=24334 read=4541 dirtied=42
                 WAL: records=42 fpi=42 bytes=335194
                 -> Index Scan using namespaces_pkey on public.namespaces (cost=0.57..0.69 rows=1 width=4) (actual time=0.923..0.923 rows=0 loops=3587)
                      Index Cond: (namespaces.id = projects.namespace_id)
                      Filter: ((namespaces.traversal_ids @> '{9970}'::integer[]) AND (NOT ((namespaces.traversal_ids)::bigint[] && ((InitPlan 1).col1)::bigint[])))
                      Rows Removed by Filter: 1
                      Buffers: shared hit=24334 read=4541 dirtied=42
                      WAL: records=42 fpi=42 bytes=335194
       -> Index Only Scan using index_integrations_on_project_id_and_type_new_unique on public.integrations (cost=0.56..0.58 rows=1 width=4) (actual time=0.548..0.548 rows=0 loops=1)
            Index Cond: ((integrations.project_id = projects.id) AND (integrations.type_new = 'Integrations::JiraCloudApp'::text))
            Heap Fetches: 0
            Buffers: shared hit=3 read=1
Settings: random_page_cost = '1.5', work_mem = '230MB', seq_page_cost = '4', effective_cache_size = '472585MB', jit = 'off'
```
Statistics:

Time: 7.740 s
- planning: 130.439 ms
- execution: 7.610 s
- I/O read: N/A
- I/O write: N/A

Shared buffers:
- hits: 25144 (~196.40 MiB) from the buffer pool
- reads: 9924 (~77.50 MiB) from the OS file cache, including disk I/O
- dirtied: 42 (~336.00 KiB)
- writes: 0

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/51420/commands/152010
### New Query
```sql
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN (/* 9000+ namespace IDs */)
ORDER BY "projects"."id" ASC
LIMIT 1;
```
```
Limit (cost=22.77..44.71 rows=1 width=4) (actual time=454.267..454.268 rows=1 loops=1)
  Buffers: shared hit=6222 read=463
  -> Index Scan using projects_pkey on public.projects (cost=22.77..6437177.31 rows=293378 width=4) (actual time=454.265..454.265 rows=1 loops=1)
       Filter: (projects.namespace_id = ANY ('{/* 9000+ namespace IDs */}'::integer[]))
       Rows Removed by Filter: 6938
       Buffers: shared hit=6222 read=463
Settings: effective_cache_size = '472585MB', jit = 'off', random_page_cost = '1.5', work_mem = '230MB', seq_page_cost = '4'
```
Statistics:

Time: 510.488 ms
- planning: 56.163 ms
- execution: 454.325 ms
- I/O read: N/A
- I/O write: N/A

Shared buffers:
- hits: 6222 (~48.60 MiB) from the buffer pool
- reads: 463 (~3.60 MiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/51420/commands/152011
### New Query with Offset
```sql
EXPLAIN SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN (/* ...all the IDs... */)
  AND "projects"."id" >= /* project_id */
ORDER BY "projects"."id" ASC
LIMIT 1 OFFSET 5000;
```
```
Limit (cost=22886.72..22886.72 rows=1 width=4) (actual time=3112.383..3112.385 rows=0 loops=1)
  Buffers: shared hit=16210 read=5405 dirtied=26
  WAL: records=29 fpi=26 bytes=196748
  -> Sort (cost=22874.22..23002.28 rows=51223 width=4) (actual time=3112.188..3112.315 rows=2091 loops=1)
       Sort Key: projects.id
       Sort Method: quicksort Memory: 97kB
       Buffers: shared hit=16210 read=5405 dirtied=26
       WAL: records=29 fpi=26 bytes=196748
       -> Index Only Scan using index_projects_on_namespace_id_and_id on public.projects (cost=0.56..19470.96 rows=51223 width=4) (actual time=5.197..3111.511 rows=2091 loops=1)
            Index Cond: ((projects.namespace_id = ANY ('{/* 9000+ namespace IDs */}'::integer[])) AND (projects.id >= 69738105))
            Heap Fetches: 68
            Buffers: shared hit=16207 read=5405 dirtied=26
            WAL: records=29 fpi=26 bytes=196748
Settings: effective_cache_size = '472585MB', jit = 'off', random_page_cost = '1.5', work_mem = '230MB', seq_page_cost = '4'
```
Statistics:

Time: 60.098 ms
- planning: 44.002 ms
- execution: 16.096 ms
- I/O read: N/A
- I/O write: N/A

Shared buffers:
- hits: 21615 (~168.90 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/51537/commands/152254
## Comparison
| | OLD (subquery + filters) | NEW (simple `IN` list) |
|---|---|---|
| Execution time | 7,606 ms | 454 ms |
| Buffer hits | 25,144 | 6,222 |
| Buffer reads | 9,924 | 463 |
| Join strategy | Nested Loop Anti Join (6,440 loops) | Index Scan with filter |
| Speedup | — | ~17x faster |