Make Zoekt force re-indexing probability configurable

Summary

Currently, Zoekt randomly converts a small fraction of indexing tasks into force re-indexing tasks to prevent running out of mmap handlers (see #435765). However, this causes significant extra work for very large projects. We should make the force re-indexing probability configurable so that it can be tuned quickly and so that Self-Managed and Dedicated customers can adjust it.

Problem

  • Random force re-indexing creates excessive work for very large repositories
  • Large repositories may actually end up being re-indexed more frequently than intended
  • The current fixed probability doesn't allow for quick tuning based on observed mmap usage
  • Self-Managed and Dedicated customers have no flexibility to adjust this behavior

Current Implementation

The force re-indexing probability is currently hardcoded as a constant in ee/app/services/search/zoekt/indexing_task_service.rb:

REINDEXING_CHANCE_PERCENTAGE = 0.5

The logic that uses this constant:

def random_force_reindexing?
  return true if task_type == :force_index_repo

  task_type == :index_repo && (rand * 100 <= REINDEXING_CHANCE_PERCENTAGE)
end

This means there's a 0.5% chance that any index_repo task will be converted to a force_index_repo task.
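
To put the 0.5% figure in perspective, the arithmetic below is a rough sketch that assumes each index_repo task is an independent draw. The helper names are hypothetical and only serve the illustration; they do not exist in the codebase.

```ruby
# Illustrative arithmetic only; these helpers are hypothetical.
REINDEXING_CHANCE_PERCENTAGE = 0.5

# Average number of index_repo tasks between two forced re-indexes,
# assuming every task is an independent draw (geometric distribution).
def expected_tasks_between_forced_reindexes(percentage)
  100.0 / percentage
end

# Probability that at least one of `task_count` index_repo tasks
# is converted into a force_index_repo task.
def chance_of_forced_reindex(percentage, task_count)
  1.0 - (1.0 - percentage / 100.0)**task_count
end

puts expected_tasks_between_forced_reindexes(REINDEXING_CHANCE_PERCENTAGE)
puts chance_of_forced_reindex(REINDEXING_CHANCE_PERCENTAGE, 1_000)
```

Under this assumption, a repository that generates many index_repo tasks reaches a forced re-index sooner in wall-clock time than a quiet one, which is one plausible reading of why large, busy repositories are hit hardest.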

Proposal

  1. Convert the force re-indexing probability into a configurable setting - Add a new setting to ee/app/models/search/zoekt/settings.rb (similar to existing settings like zoekt_cpu_to_tasks_ratio). This allows quick tuning if needed and provides flexibility for Self-Managed and Dedicated customers.

  2. Consider basing it on the number of index files (similar to the Zoekt library itself) instead of making it purely random, though random is likely a good approximation already

  3. Monitor mmap usage carefully if we reduce the default probability
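
A minimal, standalone sketch of the first item might look like the following. It does not use the real settings API: the class name ForceReindexDecider, its keyword arguments, and the 0..100 validation range are all assumptions made for illustration. In the actual change, the value would presumably be read from the new Zoekt setting rather than passed in.

```ruby
# Hypothetical sketch: ForceReindexDecider and its arguments are
# illustrative names, not part of the existing codebase.
class ForceReindexDecider
  # Assumed default, matching the current hardcoded constant.
  DEFAULT_PERCENTAGE = 0.5

  def initialize(percentage: DEFAULT_PERCENTAGE, rng: Random.new)
    unless percentage.is_a?(Numeric) && percentage.between?(0, 100)
      raise ArgumentError, "percentage must be between 0 and 100"
    end

    @percentage = percentage
    @rng = rng
  end

  def force_reindex?(task_type)
    return true if task_type == :force_index_repo
    return false unless task_type == :index_repo

    # rng.rand returns a float in [0, 1). The strict comparison makes
    # 0 mean "never" and 100 mean "always".
    @rng.rand * 100 < @percentage
  end
end

decider = ForceReindexDecider.new(percentage: 0.1)
puts decider.force_reindex?(:force_index_repo)
```

Note that this sketch uses `<` where the current implementation uses `<=`. The difference is negligible in practice, but the strict form lets a value of 0 disable random force re-indexing entirely, which may be a useful escape hatch if the setting is exposed to customers.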

Files to Modify

  • ee/app/services/search/zoekt/indexing_task_service.rb - Replace constant with configurable setting
  • ee/app/models/search/zoekt/settings.rb - Add new setting definition
  • ee/app/models/ee/application_setting.rb - Add setting to jsonb_accessor :zoekt_settings
  • ee/app/validators/json_schemas/application_setting_zoekt_settings.json - Add schema for new setting
  • doc/integration/zoekt/_index.md - Document the new setting

Additional Context

  • The watermark implementation is now stable, which may reduce the need for aggressive force re-indexing
  • Related discussion: Slack thread in #f_code_search