Fix Zoekt indexing for namespaces with missing indices
What does this MR do and why?
The RolloutWorker was only selecting namespaces with mismatched replica counts, causing namespaces with correct replicas but missing indices to never be processed for indexing.
This fix updates SelectionService to also select namespaces with missing indices, ensuring all enabled namespaces eventually get their indices created.
Changes:
- Add each_batch_with_mismatched_replicas_or_missing_indices method
- Update SelectionService to use the new selection method
- Add comprehensive test coverage for the new method
- Optimize performance by batching on base scope before filtering
Performance optimization: The method applies each_batch on the base scope first, then filters within each batch. This is more efficient than batching through already-filtered scopes with complex GROUP BY and HAVING clauses, as it allows the query optimizer to use simple primary key ordering for batching while only applying aggregation queries to small batches.
References
- Fixes #588267 (closed)
- Request for Help: https://gitlab.com/gitlab-com/request-for-help/-/issues/4126
- Zendesk Ticket: https://gitlab.zendesk.com/agent/tickets/689154
Implementation Details
Method: each_batch_with_mismatched_replicas_or_missing_indices
The method processes namespaces in batches of 5000 (configurable):
def self.each_batch_with_mismatched_replicas_or_missing_indices(batch_size: 5000)
  processed_ids = Set.new
  each_batch(of: batch_size) do |batch|
    # Process namespaces with mismatched replicas in this batch
    batch.with_mismatched_replicas.each do |ns|
      processed_ids << ns.id
      yield(ns)
    end
    # Process namespaces with missing indices in this batch (skip duplicates)
    batch.with_missing_indices.each do |ns|
      next if processed_ids.include?(ns.id)
      yield(ns)
    end
  end
end
Why this approach?
- Performance: Batching happens on simple primary key ordering, not on aggregated results
- Efficiency: Complex GROUP BY queries only run on small batches (5000 records)
- Deduplication: Uses Set to track processed IDs and avoid duplicates
- Consistency: Follows the same pattern as other batch methods in the codebase
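The batch-then-filter flow with deduplication can be illustrated without a database. The following is a minimal plain-Ruby sketch, not the MR's code: the Namespace struct, its boolean flags, and each_mismatched_or_missing are hypothetical stand-ins for the ActiveRecord model, its scopes, and the new method.

```ruby
require 'set'

# Hypothetical in-memory stand-in for an enabled namespace record.
Namespace = Struct.new(:id, :mismatched_replicas, :missing_indices)

# Walk all records in primary-key order, batch_size at a time, and apply the
# two filters only inside each batch. Each matching record is yielded once.
def each_mismatched_or_missing(records, batch_size: 5000)
  processed_ids = Set.new

  records.sort_by(&:id).each_slice(batch_size) do |batch|
    # First filter: records whose replica count is off.
    batch.select(&:mismatched_replicas).each do |ns|
      processed_ids << ns.id
      yield ns
    end

    # Second filter: records without indices, skipping ones already yielded.
    batch.select(&:missing_indices).each do |ns|
      next if processed_ids.include?(ns.id)

      yield ns
    end
  end
end

records = [
  Namespace.new(1, true,  false),
  Namespace.new(2, false, true),
  Namespace.new(3, true,  true),  # matches both filters, yielded once
  Namespace.new(4, false, false)  # healthy, never yielded
]

selected_ids = []
each_mismatched_or_missing(records, batch_size: 2) { |ns| selected_ids << ns.id }
puts selected_ids.inspect # => [1, 2, 3]
```

Record 3 matches both filters but appears only once in the output, which is the behavior the Set is there to guarantee.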
Alternative approaches considered:
- Using with_mismatched_replicas.or(with_missing_indices) scope: Failed due to query structure mismatch between the two scopes
- Calling each_batch on filtered scopes: Poor performance due to batching through aggregated results
SelectionService Changes
The SelectionService now calls the new method:
def fetch_enabled_namespace_for_indexing
  [].tap do |batch|
    ::Search::Zoekt::EnabledNamespace
      .with_rollout_allowed
      .each_batch_with_mismatched_replicas_or_missing_indices do |ns|
        batch << ns
        break if batch.count >= max_batch_size
      end
  end
end
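The break inside the block is what lets the service stop early: it exits the whole batched iteration, so batches after the cap is reached are never loaded. Below is a small plain-Ruby sketch of that control flow. The names each_candidate, fetch_candidates, and loaded_batches are hypothetical and exist only to make the early exit observable.

```ruby
# Hypothetical generator standing in for the batched selection method:
# it yields IDs batch by batch and records every batch it loads.
def each_candidate(ids, batch_size:, loaded_batches:)
  ids.each_slice(batch_size) do |batch|
    loaded_batches << batch
    batch.each { |id| yield id }
  end
end

# Collect at most max_batch_size candidates, mirroring the [].tap / break shape.
def fetch_candidates(ids, max_batch_size:, batch_size:, loaded_batches:)
  [].tap do |selected|
    each_candidate(ids, batch_size: batch_size, loaded_batches: loaded_batches) do |id|
      selected << id
      # break leaves each_candidate entirely, so later batches are never loaded.
      break if selected.size >= max_batch_size
    end
  end
end

loaded_batches = []
result = fetch_candidates(
  (1..20).to_a,
  max_batch_size: 3,
  batch_size: 5,
  loaded_batches: loaded_batches
)
puts result.inspect      # => [1, 2, 3]
puts loaded_batches.size # => 1
```

With 20 IDs in batches of 5 and a cap of 3, only the first batch is touched; the remaining three are skipped.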
Database Impact
The query uses existing indexes on foreign keys and is limited by:
- with_rollout_allowed scope (filters by rollout status)
- each_batch processing (default 5000 records per batch on base scope)
- SelectionService max_batch_size (default 128 namespaces)
No new indexes required. The performance characteristics are:
- Base scope batching: O(n/5000) iterations through primary key
- Per-batch filtering: Small aggregation queries on 5000 records max
- Memory usage: Minimal (one batch at a time, plus a Set of processed IDs that grows only with the number of mismatched-replica namespaces yielded)
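As a rough worked example of the O(n/5000) figure, the sketch below computes an upper bound for a full pass over the base scope. It assumes two filtered queries per batch (one per scope) and does not count the queries each_batch itself issues to find batch boundaries. The batching_cost helper and the row count of 1,000,000 are hypothetical, chosen only for illustration.

```ruby
# Upper bound on work for a full pass over the base scope.
# filters_per_batch assumes one query per scope applied inside each batch.
def batching_cost(total_rows:, batch_size: 5000, filters_per_batch: 2)
  batches = (total_rows.to_f / batch_size).ceil
  { batches: batches, filter_queries: batches * filters_per_batch }
end

cost = batching_cost(total_rows: 1_000_000) # hypothetical table size
puts cost[:batches]        # => 200
puts cost[:filter_queries] # => 400
```

In practice the pass usually ends sooner, because SelectionService stops iterating once max_batch_size namespaces have been collected.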
How to set up and validate locally
- In Rails console, create a namespace with replicas but no indices:
  # Create an enabled namespace with replicas but no indices
  namespace = Group.first
  enabled_ns = Search::Zoekt::EnabledNamespace.create!(
    namespace: namespace,
    root_namespace_id: namespace.id
  )
  # Create a replica to match expected count
  node = Search::Zoekt::Node.first
  Search::Zoekt::Replica.create!(
    zoekt_enabled_namespace: enabled_ns,
    zoekt_node: node
  )
- Verify the namespace is selected by the new method:
  # Count how many namespaces would be selected
  count = 0
  Search::Zoekt::EnabledNamespace
    .with_rollout_allowed
    .each_batch_with_mismatched_replicas_or_missing_indices do |ns|
      count += 1
    end
  count # => Should be at least 1, since enabled_ns is among those selected
- Run SelectionService and verify it selects the namespace:
  pool = Search::Zoekt::SelectionService.execute
  pool.enabled_namespaces.include?(enabled_ns) # => true
- Verify performance with a larger dataset:
  # Create multiple namespaces with missing indices
  10.times do |i|
    group = Group.create!(name: "test-group-#{i}", path: "test-group-#{i}")
    enabled_ns = Search::Zoekt::EnabledNamespace.create!(
      namespace: group,
      root_namespace_id: group.id
    )
    Search::Zoekt::Replica.create!(
      zoekt_enabled_namespace: enabled_ns,
      zoekt_node: Search::Zoekt::Node.first
    )
  end
  # Measure performance
  require 'benchmark'
  Benchmark.measure do
    Search::Zoekt::SelectionService.execute
  end
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.