Make Zoekt force re-indexing probability configurable
Summary
Zoekt currently uses random force re-indexing to avoid running out of mmap handlers (see #435765). However, this creates significant extra work for very large projects. We should make the force re-indexing probability configurable so that it can be tuned quickly, and to give Self-Managed and Dedicated customers flexibility.
Problem
- Random force re-indexing creates excessive work for very large repositories
- Large repositories may actually end up being re-indexed more frequently than intended
- The current fixed probability doesn't allow for quick tuning based on observed mmap usage
- Self-Managed and Dedicated customers have no flexibility to adjust this behavior
Current Implementation
The force re-indexing probability is currently hardcoded as a constant in ee/app/services/search/zoekt/indexing_task_service.rb:
REINDEXING_CHANCE_PERCENTAGE = 0.5
The logic that uses this constant:
def random_force_reindexing?
  return true if task_type == :force_index_repo

  task_type == :index_repo && (rand * 100 <= REINDEXING_CHANCE_PERCENTAGE)
end
This means there's a 0.5% chance that any index_repo task will be converted to a force_index_repo task.
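To see why busy repositories can end up force re-indexed more often than intended, it may help to look at how the per-task chance compounds. The sketch below assumes each index_repo task draws independently (which is what a plain rand call per task would imply); the helper name chance_of_force_reindex is made up for illustration and does not exist in the codebase.

```ruby
# Probability that at least one of `task_count` independent index_repo tasks
# is converted to a force re-index, given a per-task chance in percent.
# Hypothetical helper, for illustration only.
def chance_of_force_reindex(task_count, chance_percentage)
  per_task = chance_percentage / 100.0
  1.0 - ((1.0 - per_task)**task_count)
end

# Print the compounded chance for a few task counts at the current 0.5% value.
[1, 100, 1_000].each do |n|
  puts format('%5d tasks: %.1f%%', n, chance_of_force_reindex(n, 0.5) * 100)
end
```

Under that independence assumption, a repository that generates a few hundred index_repo tasks is more likely than not to hit at least one force re-index, which is the costly case for very large repositories.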
Proposal
- Convert the force re-indexing probability into a configurable setting: add a new setting to ee/app/models/search/zoekt/settings.rb (similar to existing settings like zoekt_cpu_to_tasks_ratio). This allows quick tuning if needed and provides flexibility for SM/Dedicated customers.
- Consider making it based on the number of index files (similar to the Zoekt library itself) instead of being purely random, though random is likely a good approximation already.
- Monitor mmap usage carefully if we reduce the default probability.
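A minimal sketch of what the first proposal item could look like, written as standalone Ruby rather than against the real service. Everything here is an assumption for illustration: the class IndexingTaskSketch, the setting name force_reindexing_percentage, and the plain hash standing in for the real settings lookup are all hypothetical. The comparison is also changed from <= to < so that a value of 0 disables random force re-indexing entirely.

```ruby
# Hypothetical stand-in for the indexing task service. The setting name
# `force_reindexing_percentage` and the hash-based settings are assumptions,
# not the actual GitLab settings API.
class IndexingTaskSketch
  # Mirrors the value of the current hardcoded constant.
  DEFAULT_FORCE_REINDEXING_PERCENTAGE = 0.5

  attr_reader :task_type

  def initialize(task_type, settings: {}, random: Random.new)
    @task_type = task_type
    @settings = settings
    @random = random # injectable so the behaviour can be tested
  end

  def random_force_reindexing?
    return true if task_type == :force_index_repo

    # Random#rand returns a float in [0, 1), so with a strict comparison
    # 0 never triggers and 100 always triggers.
    task_type == :index_repo && (@random.rand * 100 < force_reindexing_percentage)
  end

  private

  # Reads the configured percentage, falling back to the current default
  # and clamping to the valid 0..100 range.
  def force_reindexing_percentage
    value = @settings.fetch(:force_reindexing_percentage, DEFAULT_FORCE_REINDEXING_PERCENTAGE)
    value.to_f.clamp(0.0, 100.0)
  end
end
```

For example, IndexingTaskSketch.new(:index_repo, settings: { force_reindexing_percentage: 0.1 }) would lower the chance to 0.1% per task, while omitting the setting keeps today's 0.5%.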
Files to Modify
- ee/app/services/search/zoekt/indexing_task_service.rb: Replace constant with configurable setting
- ee/app/models/search/zoekt/settings.rb: Add new setting definition
- ee/app/models/ee/application_setting.rb: Add setting to jsonb_accessor :zoekt_settings
- ee/app/validators/json_schemas/application_setting_zoekt_settings.json: Add schema for new setting
- doc/integration/zoekt/_index.md: Document the new setting
Additional Context
- The watermark implementation is now stable, which may reduce the need for aggressive force re-indexing
- Related discussion: Slack thread in #f_code_search