Fix Zoekt indexing by cleaning up replicas without indices
What does this MR do and why?
This MR takes a simpler approach than the reverted MR !221451 (merged). Instead of trying to select namespaces with missing indices, we directly clean up ALL replicas without indices before processing. These replicas are broken and useless. Deleting them forces a replica count mismatch, which triggers the existing PlanningService flow to recreate them properly with indices.
Why the original fix (MR !221451 (merged)) didn't fully work:
The original MR successfully fixed SelectionService to select namespaces with missing indices (confirmed working on staging - 2304/2340 namespaces affected). However, indices were still not being created due to a deeper issue in the rollout flow:
- SelectionService ✅ - Correctly selected namespaces with replicas but no indices
- PlanningService ❌ - Only checked `replicas.count == expected_replicas` and returned `:unchanged` action, ignoring whether replicas actually have indices
- ProvisioningService ❌ - Only processes `:create` and `:destroy` actions, completely skips `:unchanged`
Result: Namespaces with correct replica count but 0 indices were selected but never processed.
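The gap described above can be illustrated with a minimal sketch. It does not use the real service code: `Replica`, `plan_action`, and `provision` are hypothetical stand-ins that only mirror the count-only check and the action filtering described in the list.

```ruby
# Hypothetical, simplified stand-ins for the real services.
Replica = Struct.new(:id, :index_count)

# Mirrors the count-only check: indices are never inspected.
def plan_action(replicas, expected_replicas)
  return :create if replicas.size < expected_replicas
  return :destroy if replicas.size > expected_replicas

  :unchanged
end

# Only :create and :destroy lead to work; :unchanged is skipped.
def provision(action)
  %i[create destroy].include?(action) ? :processed : :skipped
end

broken = [Replica.new(1, 0)] # one replica, zero indices
action = plan_action(broken, 1)
puts "#{action} -> #{provision(action)}" # unchanged -> skipped
```

Because the replica count already matches, the broken replica never reaches a code path that would create its indices.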
Why this new approach is better:
Instead of trying to fix PlanningService's logic to detect replicas without indices, we take a simpler approach: delete all broken replicas upfront. This forces a replica count mismatch, which the existing PlanningService flow already handles correctly with :create actions.
All Scenarios Handled
| Current State | Desired | After Cleanup | PlanningService Action | Result |
|---|---|---|---|---|
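| 1 replica, 0 indices | 1 | 0 replicas | :create | Replica recreated with indices |
| 2 replicas, 1 with indices, 1 without | 2 | 1 replica | :create | Missing replica recreated with indices |
| 3 replicas, 1 without indices | 3 | 2 replicas | :create | Missing replica recreated with indices |
| 2 replicas, both with indices | 2 | 2 replicas | :unchanged | No change needed |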
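The scenarios in the table can be reproduced with a small in-memory sketch. This is an illustration under assumptions, not the actual implementation: `cleanup`, `plan_action`, and `plan_after_cleanup` are hypothetical helpers, and each replica is represented only by its index count (the value 4 is arbitrary).

```ruby
# Hypothetical model: each integer is the index count of one replica.
def cleanup(index_counts)
  index_counts.reject(&:zero?) # drop replicas that have no indices
end

# Count-only planning, as the existing flow is described above.
def plan_action(replica_count, desired)
  return :create if replica_count < desired
  return :destroy if replica_count > desired

  :unchanged
end

# Returns [replicas left after cleanup, planned action].
def plan_after_cleanup(index_counts, desired)
  remaining = cleanup(index_counts).size
  [remaining, plan_action(remaining, desired)]
end

[
  [[0], 1],
  [[4, 0], 2],
  [[4, 4, 0], 3],
  [[4, 4], 2]
].each do |index_counts, desired|
  remaining, action = plan_after_cleanup(index_counts, desired)
  puts "#{remaining} replicas -> #{action}"
end
```

Only the last scenario, where every replica already has indices, stays `:unchanged`.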
Key changes:
- Revert MR !221451 (merged) (complex selection logic no longer needed)
- Add `without_indices` scope to Replica model
- Add cleanup in RolloutService that runs before SelectionService
- Use batched deletion (`each_batch`) for efficiency and safety
Benefits:
- Much simpler - one scope, one cleanup operation, reuses existing flow
- More efficient - no nested iteration through namespaces
- More comprehensive - cleans ALL broken replicas everywhere, not just those in the selected batch
- Self-healing - runs on every RolloutWorker execution
- No changes to PlanningService/ProvisioningService needed
Database queries
New scope: Replica.without_indices
SELECT "zoekt_replicas".*
FROM "zoekt_replicas"
LEFT OUTER JOIN "zoekt_indices"
ON "zoekt_indices"."zoekt_replica_id" = "zoekt_replicas"."id"
WHERE "zoekt_indices"."zoekt_replica_id" IS NULL
Query plan:
- Uses existing foreign key index on `zoekt_indices.zoekt_replica_id`
- LEFT JOIN is efficient for finding missing associations
- Batched with `each_batch(of: 1000)` to limit transaction size
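For readers less familiar with the `LEFT OUTER JOIN ... IS NULL` pattern, the following plain-Ruby sketch shows the same anti-join semantics on in-memory data. The hashes and the `replicas_without_indices` helper are hypothetical illustrations, not the scope's actual definition.

```ruby
require "set"

# Keeps exactly the replicas that no index row points at, which is what
# LEFT OUTER JOIN ... WHERE zoekt_indices.zoekt_replica_id IS NULL selects.
def replicas_without_indices(replicas, indices)
  referenced = indices.map { |index| index[:zoekt_replica_id] }.to_set
  replicas.reject { |replica| referenced.include?(replica[:id]) }
end

# Hypothetical rows standing in for zoekt_replicas and zoekt_indices.
replicas = [{ id: 1 }, { id: 2 }, { id: 3 }]
indices  = [
  { id: 10, zoekt_replica_id: 1 },
  { id: 11, zoekt_replica_id: 1 }
]

p replicas_without_indices(replicas, indices).map { |r| r[:id] } # [2, 3]
```

Replica 1 has two indices and is kept out of the result; replicas 2 and 3 have none and are selected for cleanup.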
Cleanup operation
Search::Zoekt::Replica.without_indices.each_batch(of: 1000) do |batch|
  batch.delete_all
end
Performance characteristics:
- Batch size: 1000 replicas per batch
- On staging: ~2304 replicas to delete (3 batches)
- Estimated time: < 5 seconds
- Each batch is a separate transaction
- No cascading deletes (indices don't exist)
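The batch count quoted above follows from the batch size. A rough in-memory sketch, assuming a hypothetical `delete_in_batches` helper that slices a list of IDs instead of querying the database the way `each_batch` does:

```ruby
# Hypothetical stand-in for each_batch(of: 1000) followed by delete_all.
# Returns the size of every batch that would be deleted.
def delete_in_batches(ids, batch_size: 1000)
  batch_sizes = []
  ids.each_slice(batch_size) do |batch|
    # In the real cleanup, each batch would be one DELETE statement.
    batch_sizes << batch.size
  end
  batch_sizes
end

p delete_in_batches((1..2304).to_a) # [1000, 1000, 304]
```

With roughly 2304 replicas on staging, this yields two full batches and one partial batch.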
References
- Fixes #588267 (closed)
- Reverts !221451 (merged)
- Request for Help: https://gitlab.com/gitlab-com/request-for-help/-/issues/4126
How to set up and validate locally
- Create a namespace with a replica but no indices:
namespace = Group.first
enabled_ns = Search::Zoekt::EnabledNamespace.create!(
  namespace: namespace,
  root_namespace_id: namespace.id,
  number_of_replicas_override: 1
)
# Create replica without indices
Search::Zoekt::Replica.create!(
  zoekt_enabled_namespace: enabled_ns,
  namespace_id: namespace.id
)
- Verify the replica is selected by the scope:
Search::Zoekt::Replica.without_indices.count
# => Should be > 0
- Run RolloutService (dry_run to verify, then actual):
# Dry run (no changes)
Search::Zoekt::RolloutService.execute(dry_run: true)
# Actual run (will delete replicas without indices)
Search::Zoekt::RolloutService.execute(dry_run: false)
- Verify cleanup worked:
Search::Zoekt::Replica.without_indices.count
# => Should be 0
# Verify the namespace is now selected for recreation
pool = Search::Zoekt::SelectionService.execute
pool.enabled_namespaces.map(&:id).include?(enabled_ns.id)
# => true (because replica count is now 0, needs to create 1)
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Database Review
Database review requested for the new `without_indices` scope and batched deletion approach.