Zoekt: Cap max_file_match_results in standard search to reduce response size

Background

For standard (non-multi-match) search, Search::Zoekt::Params#max_file_match_results returns UNLIMITED (0). This means the coordinating node returns all matching files in its JSON response (up to ~5,000, bounded by the line match window), even though Rails only iterates through at most 10 pages × 20 results = 200 results before stopping (Cache::MAX_PAGES).

The 5,000 line match window (max_line_match_window) is intentional — it lets the coordinating node collect a large candidate set for relevance ranking. The problem is that the full ranked set is returned in the JSON response, even though Rails only needs the top N.

Note: Multi-match search also sends max_file_match_results = 5,000 (not per_page as originally assumed) — it has the same oversized payload problem.

Element count math

A file with a single line match contributes ~52 JSON elements to the safe_parse counter (~22 per file plus ~30 per line match). The 100k element limit is reached at roughly:

Scenario                      Files     Matches/file   Total elements
Many files, 1 match           ~1,900    1              ~100k
Moderate files, 5 matches     ~580      5              ~100k
Fewer files, 10 matches       ~310      10             ~100k

This is more aggressive than originally estimated (~20 elements per result was too low).
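The figures in the table can be reproduced with a few lines of Ruby. This is only a back-of-the-envelope sketch: the constant and method names are made up for illustration, and the per-file and per-match costs are the approximate values quoted above, not numbers read from safe_parse itself.

```ruby
# Approximate element costs from the estimate above (assumed, not measured here).
ELEMENTS_PER_FILE = 22
ELEMENTS_PER_LINE_MATCH = 30
ELEMENT_LIMIT = 100_000

# Hypothetical helper: how many files fit under the element limit when every
# file carries the same number of line matches.
def files_until_limit(matches_per_file)
  elements_per_file = ELEMENTS_PER_FILE + (ELEMENTS_PER_LINE_MATCH * matches_per_file)
  ELEMENT_LIMIT / elements_per_file # integer division rounds down
end

[1, 5, 10].each do |matches|
  puts "#{matches} match(es) per file: ~#{files_until_limit(matches)} files"
end
```

The results (1,923, 581 and 310 files) round to the ~1,900, ~580 and ~310 shown in the table.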

Proposal

Cap max_file_match_results to max(Cache::MAX_PAGES, current_page) * per_page behind a feature flag. This mirrors the existing Cache#page_limit logic and ensures Rails always requests exactly as many files as it will actually consume:

  • Page 1: max(10, 1) * 20 = 200 files
  • Page 5: max(10, 5) * 20 = 200 files (cache covers pages 1-10)
  • Page 15: max(10, 15) * 20 = 300 files
  • Page 50: max(10, 50) * 20 = 1,000 files

The coordinating node still collects and ranks across the full 5,000 window — the cap only affects the post-sort trim of the returned JSON. Deep pagination is preserved.
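The cap formula and the page examples above can be checked with a minimal sketch. The method name is hypothetical, MAX_PAGES stands in for Cache::MAX_PAGES, and per_page defaults to the 20 used in the examples; none of this is the actual Params code.

```ruby
# Stand-in for Cache::MAX_PAGES (10 in the examples above).
MAX_PAGES = 10

# Hypothetical helper: number of file matches to request for a given page.
# Never drops below MAX_PAGES pages, and grows with deep pagination.
def capped_file_match_results(current_page:, per_page: 20)
  [MAX_PAGES, current_page].max * per_page
end

[1, 5, 15, 50].each do |page|
  puts "Page #{page}: #{capped_file_match_results(current_page: page)} files"
end
```

Pages 1 through 10 all request 200 files, so a single cached response can serve them; the cap only grows once the requested page exceeds MAX_PAGES.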

Implementation notes

Params doesn't currently have access to page or per_page — these need to be threaded through from SearchResults#zoekt_search → Client.search → Params. The page_limit value from Cache (which already computes [current_page, MAX_PAGES].max) is a natural fit.
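One possible shape for the Params side, once page information reaches it, is sketched below. ParamsSketch is a hypothetical stand-in, not the real Search::Zoekt::Params: the constructor arguments are assumptions, and cap_enabled is a plain boolean standing in for the feature-flag check.

```ruby
# Hypothetical stand-in for Search::Zoekt::Params. The real class has a
# different constructor and many more responsibilities.
class ParamsSketch
  UNLIMITED = 0 # current behaviour for standard search

  # page_limit is assumed to arrive already computed by the caller,
  # i.e. [current_page, MAX_PAGES].max as Cache does today.
  def initialize(page_limit:, per_page:, cap_enabled:)
    @page_limit = page_limit
    @per_page = per_page
    @cap_enabled = cap_enabled
  end

  def max_file_match_results
    # With the flag off, keep today's behaviour unchanged.
    return UNLIMITED unless @cap_enabled

    @page_limit * @per_page
  end
end

page_limit = [15, 10].max # current_page = 15, MAX_PAGES = 10
capped = ParamsSketch.new(page_limit: page_limit, per_page: 20, cap_enabled: true)
puts capped.max_file_match_results
```

Passing the precomputed page_limit rather than the raw page number keeps the max(...) logic in one place, which is the reuse this section suggests.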

Key files

  • ee/lib/search/zoekt/params.rb: max_file_match_results computation
  • ee/lib/search/zoekt/search_results.rb: zoekt_search (needs to pass page info)
  • ee/lib/gitlab/search/zoekt/client.rb: search (needs to forward page info to Params)
  • ee/lib/search/zoekt/cache.rb: page_limit (existing logic to reuse)

Rollout

This change should be rolled out behind a feature flag (e.g., zoekt_cap_file_match_results) to allow gradual rollout and quick rollback.

Note: This change requires #591911 (fix count fields) to land first; otherwise displayed counts would drop from "5,000+" to the cap value.
