Fix zoekt reindexing logic to prevent excessive task creation

What does this MR do and why?

Previously, the zoekt reindexing system could create too many tasks because the logic for initial indexing and reindexing was intermixed. This change separates the two workflows by:

  • Modifying should_be_indexed scope to only return pending repositories
  • Adding should_be_reindexed scope for repositories with schema mismatches
  • Adding with_pending_or_processing_tasks scope to check task status
  • Implementing RepoToReindexEventWorker with proper task limiting logic
  • Adding comprehensive test coverage for new functionality

The new worker prevents task overflow by checking for existing pending/processing tasks before creating new reindexing tasks, while maintaining separation between initial indexing and reindexing workflows.
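
To make the intended behavior concrete, here is a minimal plain-Ruby sketch of the selection logic described above. It is not the actual implementation: the `Repo` and `Task` structs, the state symbols, `TARGET_SCHEMA_VERSION`, the `create_reindex_tasks` helper, and the per-run `limit` are all hypothetical stand-ins for the ActiveRecord scopes and the worker in this MR.

```ruby
require "set"

# Hypothetical in-memory stand-ins; field names and state values are assumptions.
Repo = Struct.new(:id, :state, :schema_version, keyword_init: true)
Task = Struct.new(:repo_id, :state, keyword_init: true)

TARGET_SCHEMA_VERSION = 2 # assumed current schema version

# Intent of should_be_indexed: initial indexing covers pending repositories only.
def should_be_indexed(repos)
  repos.select { |repo| repo.state == :pending }
end

# Intent of should_be_reindexed: schema version differs from the target.
def should_be_reindexed(repos, target: TARGET_SCHEMA_VERSION)
  repos.select { |repo| repo.schema_version != target }
end

# Intent of with_pending_or_processing_tasks: ids of repositories that
# already have unfinished work queued.
def repo_ids_with_active_tasks(tasks)
  tasks.select { |task| %i[pending processing].include?(task.state) }
       .map(&:repo_id)
       .to_set
end

# Sketch of the worker step: create reindexing tasks only for repositories
# without an active task, up to a hypothetical per-run limit.
def create_reindex_tasks(repos, tasks, limit: 2)
  busy = repo_ids_with_active_tasks(tasks)
  candidates = should_be_reindexed(repos).reject { |repo| busy.include?(repo.id) }
  new_tasks = candidates.first(limit).map do |repo|
    Task.new(repo_id: repo.id, state: :pending)
  end
  tasks.concat(new_tasks)
  new_tasks
end
```

In this sketch, calling `create_reindex_tasks` twice with the same inputs creates tasks only on the first call, because the second call sees the tasks from the first as pending. That is the duplicate-prevention property the worker is meant to provide.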

References

Related to zoekt search indexing performance and task management improvements.

How to set up and validate locally

  1. Set up zoekt search with repositories that need reindexing due to schema version mismatches
  2. Trigger the reindexing process via RepoToReindexEvent
  3. Verify that:
    • Only repositories with schema version mismatches are processed
    • No duplicate tasks are created when repositories already have pending/processing tasks
    • The should_be_indexed scope only returns pending repositories
    • The should_be_reindexed scope correctly identifies repositories needing reindexing
  4. Run the test suite to validate all functionality:
    bin/rspec ee/spec/workers/search/zoekt/repo_to_reindex_event_worker_spec.rb
    bin/rspec ee/spec/models/search/zoekt/repository_spec.rb

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Dmitry Gruzd
