
Speed up zero-downtime reindexing with per-node parallel processing

What does this MR do and why?


This implements per-node event emission for Zoekt reindexing to enable parallel processing across nodes instead of global sequential processing.

Changes:

  • Add zoekt_node_id parameter to RepoToReindexEvent schema
  • Replace global event dispatch with per-node event emission in SchedulingService
  • Update RepoToReindexEventWorker to process only specific node repositories
  • Add Node.with_repositories_to_reindex scope to encapsulate database logic
  • Increase worker BATCH_SIZE to 1000 for better throughput
  • Add comprehensive test coverage for new node-scoped behavior
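A minimal sketch of the node-scoped worker loop follows. It uses plain Ruby instead of the real ActiveRecord models. Only BATCH_SIZE = 1000 comes from this MR. SketchRepository, the needs_reindex flag, and the reindex_batches helper are hypothetical stand-ins for the worker's actual query.

```ruby
# Hypothetical stand-in for a Zoekt repository record (not the GitLab model).
SketchRepository = Struct.new(:id, :zoekt_node_id, :needs_reindex, keyword_init: true)

# Batch size taken from this MR; the rest of this sketch is illustrative.
SKETCH_BATCH_SIZE = 1000

# Returns the ids of one node's repositories that need reindexing,
# grouped into batches of at most batch_size ids.
def reindex_batches(repositories, zoekt_node_id, batch_size: SKETCH_BATCH_SIZE)
  repositories
    .select { |repo| repo.zoekt_node_id == zoekt_node_id && repo.needs_reindex }
    .map(&:id)
    .each_slice(batch_size)
    .to_a
end

repos = (1..2500).map { |i| SketchRepository.new(id: i, zoekt_node_id: 1, needs_reindex: true) }
repos << SketchRepository.new(id: 9999, zoekt_node_id: 2, needs_reindex: true)

reindex_batches(repos, 1).map(&:size) # => [1000, 1000, 500]
reindex_batches(repos, 2)             # => [[9999]]
```

Each event carries a single zoekt_node_id, so a worker only touches that node's repositories. The batch size limits how much work one iteration does.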

The new approach identifies nodes with repositories that need reindexing and emits one event per node. Workers can then process each node's repositories in parallel rather than in a single global pass, which should significantly reduce reindexing time during schema version updates.
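The scheduling side can be sketched in the same way. For illustration only, this sketch assumes that a repository needs reindexing when its schema_version is below a target version. ReindexRepo, RepoToReindexEventSketch, and both helper methods are hypothetical. They only mirror the roles of Node.with_repositories_to_reindex and RepoToReindexEvent, and the events are returned rather than published.

```ruby
# Hypothetical stand-ins for the real models and event class.
ReindexRepo = Struct.new(:id, :zoekt_node_id, :schema_version, keyword_init: true)
RepoToReindexEventSketch = Struct.new(:zoekt_node_id, keyword_init: true)

# Assumption: a repository needs reindexing when its schema_version is below the target.
def nodes_with_repositories_to_reindex(repositories, target_schema_version)
  repositories
    .select { |repo| repo.schema_version < target_schema_version }
    .map(&:zoekt_node_id)
    .uniq
    .sort
end

# Builds one event per node instead of a single global event.
def build_reindex_events(repositories, target_schema_version)
  nodes_with_repositories_to_reindex(repositories, target_schema_version).map do |node_id|
    RepoToReindexEventSketch.new(zoekt_node_id: node_id)
  end
end

sched_repos = [
  ReindexRepo.new(id: 1, zoekt_node_id: 10, schema_version: 1),
  ReindexRepo.new(id: 2, zoekt_node_id: 10, schema_version: 1),
  ReindexRepo.new(id: 3, zoekt_node_id: 20, schema_version: 2),
  ReindexRepo.new(id: 4, zoekt_node_id: 30, schema_version: 1)
]

build_reindex_events(sched_repos, 2).map(&:zoekt_node_id) # => [10, 30]
```

Node 20 is already on the target version, so it gets no event. Nodes 10 and 30 each get their own event, and separate workers can pick them up.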

References

Closes #563422

Screenshots or screen recordings

This is a backend performance optimization with no UI changes.

Performance Impact

Before: Single global event processed sequentially across all repositories
After: Parallel per-node events allowing concurrent processing

Expected improvements:

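  • Reindexing time: Wall-clock time scales with the largest node's repository count instead of the total, i.e. from O(total_repositories) to O(max_repositories_per_node), assuming per-node events are processed concurrently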
  • Resource utilization: Better CPU and I/O utilization across multiple Zoekt nodes
  • System responsiveness: Shorter blocking periods during schema updates
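The reindexing-time estimate is a simple model. With one global event, the work is proportional to the sum of all repositories. With concurrent per-node events, it is bounded by the busiest node. The sketch below uses made-up repository counts. It assumes a constant per-repository cost and enough worker capacity to run every node's event at once.

```ruby
# Illustrative repository counts per node; not measurements from this MR.
repos_per_node = { 1 => 4_000, 2 => 2_500, 3 => 3_500 }

# Work units under the simplified model: the sum when processed globally and
# sequentially, the maximum when every node is processed concurrently.
def estimated_work_units(repos_per_node, parallel:)
  counts = repos_per_node.values
  parallel ? counts.max.to_i : counts.sum
end

estimated_work_units(repos_per_node, parallel: false) # => 10000
estimated_work_units(repos_per_node, parallel: true)  # => 4000
```

In this example, the ideal speedup is 10,000 / 4,000 = 2.5x. A cluster whose repositories are concentrated on one node would gain less.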

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Specific considerations for this MR:

  • Performance: Significantly improves reindexing performance through parallelization
  • Reliability: Maintains existing error handling while improving throughput
  • Testing: Comprehensive test coverage for new node-scoped behavior
  • Database: Added proper ActiveRecord scope to encapsulate database queries
  • Code quality: Resolved all RuboCop violations
  • Backward compatibility: Worker gracefully handles both node-scoped and legacy global events
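The backward-compatibility point can be illustrated with a small dispatch. If the event carries a zoekt_node_id, the worker restricts itself to that node. If it does not, the worker falls back to the previous global behavior. In this sketch, the event payload and repositories are plain hashes, and repositories_for_event is a hypothetical helper rather than the worker's real method.

```ruby
# Hypothetical helper: event_data and repositories are plain Hashes here.
# Both symbol and string keys are accepted because the payload's key type
# is an assumption of this sketch.
def repositories_for_event(repositories, event_data)
  node_id = event_data[:zoekt_node_id] || event_data["zoekt_node_id"]

  # Legacy global event: no node id, so keep every repository.
  return repositories if node_id.nil?

  # Node-scoped event: only this node's repositories.
  repositories.select { |repo| repo[:zoekt_node_id] == node_id }
end

repos = [{ id: 1, zoekt_node_id: 1 }, { id: 2, zoekt_node_id: 2 }]

repositories_for_event(repos, {}).map { |r| r[:id] }                      # => [1, 2]
repositories_for_event(repos, { "zoekt_node_id" => 2 }).map { |r| r[:id] } # => [2]
```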
