Fix Zoekt indexing by cleaning up replicas without indices

What does this MR do and why?

This MR takes a simpler approach than the reverted MR !221451 (merged). Instead of trying to select namespaces with missing indices, we directly clean up ALL replicas without indices before processing. These replicas are broken and useless, so deleting them forces a replica count mismatch, which triggers the existing PlanningService flow to recreate them properly with indices.

Why the original fix (MR !221451 (merged)) didn't fully work:

The original MR successfully fixed SelectionService to select namespaces with missing indices (confirmed working on staging - 2304/2340 namespaces affected). However, indices were still not being created due to a deeper issue in the rollout flow:

  1. SelectionService - Correctly selected namespaces that have replicas but no indices
  2. PlanningService - Only checked replicas.count == expected_replicas and returned an :unchanged action, ignoring whether the replicas actually have indices
  3. ProvisioningService - Only processes :create and :destroy actions and skips :unchanged entirely

Result: Namespaces with correct replica count but 0 indices were selected but never processed.
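
The failure mode can be reproduced with a minimal plain-Ruby sketch. The `Replica` struct and the `plan` and `provision` helpers are hypothetical stand-ins, not the real service code; they only mirror the behaviour described in the three steps above.

```ruby
# Hypothetical stand-ins for the rollout flow; not the real GitLab classes.
Replica = Struct.new(:id, :index_count)

# Mirrors the described PlanningService check: only the replica count matters,
# the number of indices per replica is never inspected.
def plan(replicas, expected_replicas)
  return :unchanged if replicas.count == expected_replicas

  replicas.count < expected_replicas ? :create : :destroy
end

# Mirrors the described ProvisioningService: :unchanged is skipped.
def provision(action)
  %i[create destroy].include?(action) ? :processed : :skipped
end

broken = [Replica.new(1, 0)] # correct replica count, but no indices
action = plan(broken, 1)
puts "action=#{action} outcome=#{provision(action)}"
```

Because the replica count already matches, the broken namespace falls through as :unchanged and nothing recreates its indices.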

Why this new approach is better:

Instead of trying to fix PlanningService's logic to detect replicas without indices, we take a simpler approach: delete all broken replicas upfront. This forces a replica count mismatch, which the existing PlanningService flow already handles correctly with :create actions.

All Scenarios Handled

| Current State | Desired | After Cleanup | PlanningService Action | Result |
|---|---|---|---|---|
| 1 replica, 0 indices | 1 | 0 replicas | :create | Fixed |
| 2 replicas, 1 with indices, 1 without | 2 | 1 replica | :create | Fixed |
| 3 replicas, 1 without indices | 3 | 2 replicas | :create | Fixed |
| 2 replicas, both with indices | 2 | 2 replicas | :unchanged | OK |
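
As a rough check of the scenarios above, the sketch below models each replica by its index count, applies the cleanup, and then runs a count-only planning check. The `cleanup`, `plan`, and `simulate` helpers and the concrete index counts are illustrative assumptions, not code from this MR.

```ruby
# Hypothetical model: each replica is represented only by its index count.
def cleanup(index_counts)
  index_counts.reject(&:zero?) # drop replicas that have no indices
end

# Count-only planning check, as described for PlanningService.
def plan(replica_count, desired)
  return :unchanged if replica_count == desired

  replica_count < desired ? :create : :destroy
end

# One entry per scenario; index counts above zero are made-up values.
SCENARIOS = [
  { replicas: [0],       desired: 1 },
  { replicas: [1, 0],    desired: 2 },
  { replicas: [2, 1, 0], desired: 3 },
  { replicas: [1, 1],    desired: 2 }
].freeze

# Returns [replicas left after cleanup, planned action].
def simulate(scenario)
  remaining = cleanup(scenario[:replicas])
  [remaining.size, plan(remaining.size, scenario[:desired])]
end

SCENARIOS.each do |scenario|
  remaining, action = simulate(scenario)
  puts "#{scenario[:replicas].inspect} -> #{remaining} replicas, #{action}"
end
```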

Key changes:

  1. Revert MR !221451 (merged) (complex selection logic no longer needed)
  2. Add without_indices scope to Replica model
  3. Add cleanup in RolloutService that runs before SelectionService
  4. Use batched deletion (each_batch) for efficiency and safety

Benefits:

  • Much simpler - one scope, one cleanup operation, reuses existing flow
  • More efficient - no nested iteration through namespaces
  • More comprehensive - cleans ALL broken replicas everywhere, not just those in the selected batch
  • Self-healing - runs on every RolloutWorker execution
  • No changes to PlanningService/ProvisioningService needed

Database queries

New scope: Replica.without_indices

SELECT "zoekt_replicas".* 
FROM "zoekt_replicas" 
LEFT OUTER JOIN "zoekt_indices" 
  ON "zoekt_indices"."zoekt_replica_id" = "zoekt_replicas"."id" 
WHERE "zoekt_indices"."zoekt_replica_id" IS NULL

Query plan notes:

  • Uses existing foreign key index on zoekt_indices.zoekt_replica_id
  • LEFT JOIN is efficient for finding missing associations
  • Batched with each_batch(of: 1000) to limit transaction size
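
To make the anti-join semantics concrete, here is a small in-memory sketch of what the LEFT OUTER JOIN with the IS NULL filter selects. The sample rows and the `without_indices` helper are hypothetical; the real scope runs in the database against zoekt_replicas and zoekt_indices.

```ruby
# In-memory stand-ins for the two tables; the rows are made-up sample data.
replicas = [{ id: 1 }, { id: 2 }, { id: 3 }]
indices  = [
  { id: 10, zoekt_replica_id: 1 },
  { id: 11, zoekt_replica_id: 1 }
]

# Equivalent of LEFT OUTER JOIN ... WHERE zoekt_indices.zoekt_replica_id IS NULL:
# keep only the replicas that no index row points at.
def without_indices(replicas, indices)
  indexed_ids = indices.map { |index| index[:zoekt_replica_id] }.uniq
  replicas.reject { |replica| indexed_ids.include?(replica[:id]) }
end

orphaned_ids = without_indices(replicas, indices).map { |replica| replica[:id] }
puts orphaned_ids.inspect
```

Replica 1 has two index rows and is kept out of the result; replicas 2 and 3 have none and are the ones the cleanup would delete.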

Cleanup operation

Search::Zoekt::Replica.without_indices.each_batch(of: 1000) do |batch|
  batch.delete_all
end

Performance characteristics:

  • Batch size: 1000 replicas per batch
  • On staging: ~2304 replicas to delete (3 batches)
  • Estimated time: < 5 seconds
  • Each batch is a separate transaction
  • No cascading deletes (indices don't exist)
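
The batch count quoted above follows from simple slicing. The sketch below uses plain Ruby `each_slice` as a stand-in for `each_batch`; `delete_in_batches` is a hypothetical helper, and the real method batches by primary key in the database rather than over an in-memory array.

```ruby
BATCH_SIZE = 1000

# Walks the ids batch by batch and returns the size of each batch.
def delete_in_batches(ids, batch_size: BATCH_SIZE)
  ids.each_slice(batch_size).map do |batch|
    # Stand-in for `batch.delete_all`: one DELETE statement per batch.
    batch.size
  end
end

batch_sizes = delete_in_batches((1..2304).to_a)
puts "batches=#{batch_sizes.size} sizes=#{batch_sizes.inspect}"
```

With 2304 rows and a batch size of 1000, this yields two full batches and one partial batch of 304.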

How to set up and validate locally

  1. Create a namespace with a replica but no indices:
namespace = Group.first
enabled_ns = Search::Zoekt::EnabledNamespace.create!(
  namespace: namespace,
  root_namespace_id: namespace.id,
  number_of_replicas_override: 1
)

# Create replica without indices
Search::Zoekt::Replica.create!(
  zoekt_enabled_namespace: enabled_ns,
  namespace_id: namespace.id
)
  2. Verify the replica is selected by the scope:
Search::Zoekt::Replica.without_indices.count
# => Should be > 0
  3. Run RolloutService (dry_run to verify, then actual):
# Dry run (no changes)
Search::Zoekt::RolloutService.execute(dry_run: true)

# Actual run (will delete replicas without indices)
Search::Zoekt::RolloutService.execute(dry_run: false)
  4. Verify cleanup worked:
Search::Zoekt::Replica.without_indices.count
# => Should be 0

# Verify the namespace is now selected for recreation
pool = Search::Zoekt::SelectionService.execute
pool.enabled_namespaces.map(&:id).include?(enabled_ns.id)
# => true (because replica count is now 0, needs to create 1)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Database Review

Database review requested for the new without_indices scope and the batched deletion approach.

Edited by Dmitry Gruzd
