Fix N+1 queries in UpdateIndexUsedStorageBytesEventWorker (!224267) · Merge requests · GitLab.org / GitLab

What does this MR do and why?

Fixes N+1 queries in UpdateIndexUsedStorageBytesEventWorker that caused 2000+ second execution times (33+ minutes), violating Sidekiq queueing SLOs on the elasticsearch shard.

Related incident: gitlab-com/gl-infra/production#21344

Root cause

The worker processes up to 1000 indices per batch. For each index, it executed ~5 individual database queries:

SUM(size_bytes) from zoekt_repositories per index (1000x)
SELECT * from zoekt_nodes — lazy loaded per index (1000x)
SUM(reserved_storage_bytes) from zoekt_indices per node — called twice per index because Node#reserved_storage_bytes is not memoized (2000x)
UPDATE zoekt_indices per index (1000x)

Total: ~5000 queries per batch execution.

Fix

Replace per-index queries with batch operations, gated behind zoekt_batch_update_index_storage_bytes feature flag:

Eager load node associations via .preload_node
Batch fetch repository size sums with a single grouped query (Repository.sum_size_bytes_by_index)
Batch fetch node reserved bytes with a single grouped query (Index.sum_reserved_storage_bytes_by_node), then maintain correctness via running delta tracking after each index save!

The delta tracking is important: when multiple indices share the same node, updating one index's reserved_storage_bytes changes the node's unclaimed_storage_bytes. Without tracking the delta, subsequent indices on the same node would see stale values and could over-allocate storage.

Total after fix: ~4 read queries + N update queries (~1004 total).

Database queries

Query count summary

Query	Before	After
Fetch stale indices	1	1
Load nodes	1000 (lazy)	1 (eager)
SUM repos per index	1000	1 (grouped)
SUM reserved per node	2000	1 (grouped + cached)
UPDATE per index	1000	1000
Total	~5001	~1004

Queries for postgres.ai validation

These queries are self-contained and can be run directly in postgres.ai.

Query 1 — Fetch stale indices (unchanged, 1x):

SELECT "zoekt_indices".*
FROM "zoekt_indices"
WHERE (last_indexed_at >= used_storage_bytes_updated_at)
ORDER BY "zoekt_indices"."used_storage_bytes_updated_at" ASC
LIMIT 1000

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/49026/commands/146589

Query 2 — Eager load nodes (new, 1x — replaces 1000 lazy loads):

SELECT "zoekt_nodes".*
FROM "zoekt_nodes"
WHERE "zoekt_nodes"."id" IN (
  SELECT DISTINCT zoekt_node_id FROM (
    SELECT zoekt_node_id FROM zoekt_indices
    WHERE last_indexed_at >= used_storage_bytes_updated_at
    ORDER BY used_storage_bytes_updated_at ASC
    LIMIT 1000
  ) sub
)

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/49026/commands/146590

Query 3 — Batch SUM repository size_bytes (new, 1x — replaces 1000 per-index SUMs):

SELECT SUM("zoekt_repositories"."size_bytes") AS sum_size_bytes,
  "zoekt_repositories"."zoekt_index_id"
FROM "zoekt_repositories"
WHERE "zoekt_repositories"."zoekt_index_id" IN (
  SELECT id FROM zoekt_indices
  WHERE last_indexed_at >= used_storage_bytes_updated_at
  ORDER BY used_storage_bytes_updated_at ASC
  LIMIT 1000
)
GROUP BY "zoekt_repositories"."zoekt_index_id"

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/49026/commands/146593

Query 4 — Batch SUM reserved_storage_bytes per node (new, 1x — replaces 2000 per-node SUMs):

SELECT SUM("zoekt_indices"."reserved_storage_bytes") AS sum_reserved_storage_bytes,
  "zoekt_indices"."zoekt_node_id"
FROM "zoekt_indices"
WHERE "zoekt_indices"."zoekt_node_id" IN (
  SELECT DISTINCT zoekt_node_id FROM (
    SELECT zoekt_node_id FROM zoekt_indices
    WHERE last_indexed_at >= used_storage_bytes_updated_at
    ORDER BY used_storage_bytes_updated_at ASC
    LIMIT 1000
  ) sub
)
GROUP BY "zoekt_indices"."zoekt_node_id"

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/49026/commands/146594

Old query A — Per-index SUM repos (was called 1000x):

SELECT SUM("zoekt_repositories"."size_bytes")
FROM "zoekt_repositories"
WHERE "zoekt_repositories"."zoekt_index_id" = (
  SELECT id FROM zoekt_indices
  WHERE last_indexed_at >= used_storage_bytes_updated_at
  ORDER BY used_storage_bytes_updated_at ASC
  LIMIT 1
)

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/49026/commands/146595

Old query B — Per-node SUM reserved (was called 2000x):

SELECT SUM("zoekt_indices"."reserved_storage_bytes")
FROM "zoekt_indices"
WHERE "zoekt_indices"."zoekt_node_id" = (
  SELECT zoekt_node_id FROM zoekt_indices
  WHERE last_indexed_at >= used_storage_bytes_updated_at
  ORDER BY used_storage_bytes_updated_at ASC
  LIMIT 1
)

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/49026/commands/146596

Production benchmark (rails console, read-only)

Ran a read-only benchmark on the production rails console with 441 stale indices across 36 nodes:

Query	OLD	NEW	Improvement
SUM repos per index	38.41s (441×)	1.54s (1 batch)	25×
Load nodes	14.42s (441× lazy)	63.5ms (1 eager)	227×
SUM reserved per node	35.11s (882×)	172.7ms (1 batch)	203×
Total (read queries)	87.94s	1.78s	~49× faster

Note: this was with only 441 stale indices. At the full 1000-index batch size, the old path would exceed ~200s of read queries alone — confirming the root cause of the 2000+ second production incidents.

Production replica validation (postgres.ai)

Validated against production clone: 2,938 stale indices across 36 nodes, several M repos.

New batch queries:

Query	Execution Time	Plan
Fetch stale indices (LIMIT 1000)	1.0ms	Index Scan on `idx_zoekt_last_indexed_at_gt_used_storage_bytes_updated_at`
Eager load 36 nodes	1.4ms	Hash Join on 36-row `zoekt_nodes`
Batch SUM repos (1000 indices, 353K rows)	123ms	Index-Only Scan on `idx_zoekt_repositories_on_zoekt_index_id_and_size_bytes`
Batch SUM reserved (36 nodes, 59K rows)	46ms	Index Scan on `index_zoekt_indices_on_zoekt_node_id_and_id`

Old per-index queries (for comparison):

Query	Per-call time	Calls	Total estimated time
SUM repos per index	~0.15ms	1000	~150ms
Load node (lazy)	~0.5ms	1000	~500ms
SUM reserved per node	~3.8ms (warm) / ~255ms (cold)	2000	~7.6s (warm) / ~510s (cold)

References

Closes #590932
Related incident: gitlab-com/gl-infra/production#21344
Feature flag: zoekt_batch_update_index_storage_bytes
Rollout issue: #590966

How to set up and validate locally

Enable the feature flag:

Feature.enable(:zoekt_batch_update_index_storage_bytes)

Make some indices stale:

Search::Zoekt::Index.limit(5).update_all(
  used_storage_bytes_updated_at: 1.day.ago,
  last_indexed_at: Time.current
)

Trigger the worker and observe the query log:

ActiveRecord::Base.logger = Logger.new(STDOUT)
event = Search::Zoekt::UpdateIndexUsedStorageBytesEvent.new(data: {})
Search::Zoekt::UpdateIndexUsedStorageBytesEventWorker.new.perform(event.class.name, event.data)

Verify batch queries instead of per-index queries in the log.

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited Feb 20, 2026 by Dmitry Gruzd

Fix N+1 queries in UpdateIndexUsedStorageBytesEventWorker