Fix zoekt reindexing logic to prevent excessive task creation
What does this MR do and why?
Previously, the zoekt reindexing system could create too many tasks due to mixed logic between initial indexing and reindexing workflows. This change separates the concerns by:
- Modifying the `should_be_indexed` scope to only return `pending` repositories
- Adding a `should_be_reindexed` scope for repositories with schema mismatches
- Adding a `with_pending_or_processing_tasks` scope to check task status
- Implementing RepoToReindexEventWorker with proper task limiting logic
- Adding comprehensive test coverage for new functionality
The new worker prevents task overflow by checking for existing pending/processing tasks before creating new reindexing tasks, while maintaining separation between initial indexing and reindexing workflows.
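The check-before-create behaviour described above can be sketched in plain Ruby, without Rails. This is an illustration only, not the code in this MR: the `Task` struct, `create_reindex_tasks`, and the `limit` cap are hypothetical names, and capping the number of tasks created per run is an assumption about what "task limiting" means here.

```ruby
require 'set'

# Hypothetical stand-in for a zoekt task record (not the real model).
Task = Struct.new(:repo_id, :state, keyword_init: true)

ACTIVE_STATES = %i[pending processing].freeze

# IDs of repositories that already have a pending or processing task.
def repo_ids_with_active_tasks(tasks)
  tasks.select { |t| ACTIVE_STATES.include?(t.state) }.map(&:repo_id).to_set
end

# Creates reindexing tasks only for candidates without an active task,
# and never more than `limit` per run (the limit is an assumed parameter).
def create_reindex_tasks(candidate_repo_ids, tasks, limit: 2)
  busy = repo_ids_with_active_tasks(tasks)
  new_tasks = candidate_repo_ids
              .reject { |id| busy.include?(id) }
              .first(limit)
              .map { |id| Task.new(repo_id: id, state: :pending) }
  tasks.concat(new_tasks)
  new_tasks
end

tasks = [
  Task.new(repo_id: 1, state: :processing),
  Task.new(repo_id: 2, state: :done)
]
candidates = [1, 2, 3, 4]

p create_reindex_tasks(candidates, tasks).map(&:repo_id) # [2, 3]
p create_reindex_tasks(candidates, tasks).map(&:repo_id) # [4]
p create_reindex_tasks(candidates, tasks).map(&:repo_id) # []
```

Running the sketch repeatedly against the same candidates creates no duplicates: once every candidate has a pending or processing task, further runs create nothing.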
References
Related to zoekt search indexing performance and task management improvements.
Screenshots or screen recordings
How to set up and validate locally
- Set up zoekt search with repositories that need reindexing due to schema version mismatches
- Trigger the reindexing process via `RepoToReindexEvent`
- Verify that:
  - Only repositories with schema version mismatches are processed
  - No duplicate tasks are created when repositories already have pending/processing tasks
  - The `should_be_indexed` scope only returns pending repositories
  - The `should_be_reindexed` scope correctly identifies repositories needing reindexing
- Run the test suite to validate all functionality:
  bin/rspec ee/spec/workers/search/zoekt/repo_to_reindex_event_worker_spec.rb
  bin/rspec ee/spec/models/search/zoekt/repository_spec.rb
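For a quick mental model of what the scope checks above are expected to return, here is a plain-Ruby sketch that mimics the three scopes as filters over an in-memory list. It is not the ActiveRecord code from this MR: the `Repo` struct, its fields, and `TARGET_SCHEMA_VERSION` are hypothetical, and treating "schema mismatch" as a simple inequality is an assumption.

```ruby
# Hypothetical in-memory stand-in for a zoekt repository record.
Repo = Struct.new(:id, :state, :schema_version, :task_states, keyword_init: true)

TARGET_SCHEMA_VERSION = 2 # assumed "current" schema version

# Initial indexing: only repositories still in the pending state.
def should_be_indexed(repos)
  repos.select { |r| r.state == :pending }
end

# Reindexing: repositories whose schema version differs from the target.
def should_be_reindexed(repos)
  repos.reject { |r| r.schema_version == TARGET_SCHEMA_VERSION }
end

# Repositories that already have a pending or processing task.
def with_pending_or_processing_tasks(repos)
  repos.select { |r| r.task_states.any? { |s| %i[pending processing].include?(s) } }
end

SAMPLE_REPOS = [
  Repo.new(id: 1, state: :pending, schema_version: 2, task_states: []),
  Repo.new(id: 2, state: :ready,   schema_version: 1, task_states: []),
  Repo.new(id: 3, state: :ready,   schema_version: 1, task_states: [:processing]),
  Repo.new(id: 4, state: :ready,   schema_version: 2, task_states: [:done])
].freeze

p should_be_indexed(SAMPLE_REPOS).map(&:id)                # [1]
p should_be_reindexed(SAMPLE_REPOS).map(&:id)              # [2, 3]
p with_pending_or_processing_tasks(SAMPLE_REPOS).map(&:id) # [3]

# Reindex candidates that would not produce a duplicate task:
eligible = should_be_reindexed(SAMPLE_REPOS) - with_pending_or_processing_tasks(SAMPLE_REPOS)
p eligible.map(&:id)                                       # [2]
```

In this sample, repository 3 has a schema mismatch but is skipped because it already has a processing task, which is the duplicate-prevention case from the verification list.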
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Edited by Dmitry Gruzd