Stop gRPC streams when thresholds are met
What does this MR do and why?
This MR fixes gRPC message size errors by enforcing result limits at the stream level, preventing individual Zoekt nodes from sending responses that exceed gRPC message size limits.
Problem
Search requests were failing because gRPC messages were too large, for example: grpc: received message larger than max (26588986 vs. 16777216).
The issue occurred because the limits (max_line_match_results, max_line_match_results_per_file) were only enforced in multiNodeSearch(), after receiving and aggregating results from all streams. A single Zoekt node could accumulate 25,000+ line matches and send a 26MB+ gRPC message to the webserver before any limits were applied, exceeding the default 16MB gRPC limit.
Solution
Enforce max_line_match_window and max_line_match_results_per_file while receiving the gRPC stream in handleGrpcSearchStream():
- Truncate each file to max_line_match_results_per_file matches as files are processed
- Stop reading the stream immediately when max_line_match_window line matches are collected
- Use proto.Clone() to safely truncate protobuf messages
This prevents any single stream from exceeding gRPC message limits, while maintaining correct behavior: final sorting by relevance score still occurs after combining results from all nodes.
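The stream-level limits described above can be illustrated with a minimal, standard-library-only sketch. It is not the code in this MR: FileMatch, fileStream, sliceStream, truncateFile, and collectStream are hypothetical names, plain structs stand in for the protobuf messages, and a struct copy stands in for proto.Clone(). In this sketch the last file read may push the total slightly past the window, because the check happens between files.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// FileMatch is a simplified, hypothetical stand-in for the protobuf file-match message.
type FileMatch struct {
	FileName    string
	LineMatches []string
}

// fileStream is a hypothetical stand-in for the gRPC client stream.
type fileStream interface {
	Recv() (*FileMatch, error)
}

// sliceStream replays a fixed list of files, then returns io.EOF.
type sliceStream struct {
	files []*FileMatch
	next  int
}

func (s *sliceStream) Recv() (*FileMatch, error) {
	if s.next >= len(s.files) {
		return nil, io.EOF
	}
	f := s.files[s.next]
	s.next++
	return f, nil
}

// truncateFile returns f unchanged when it is within the per-file limit.
// Otherwise it returns a copy holding only the first perFile line matches,
// so the message received from the stream is never mutated.
// (The MR uses proto.Clone() for this; a struct copy stands in here.)
func truncateFile(f *FileMatch, perFile int) *FileMatch {
	if len(f.LineMatches) <= perFile {
		return f
	}
	c := *f
	c.LineMatches = append([]string(nil), f.LineMatches[:perFile]...)
	return &c
}

// collectStream reads files until EOF or until at least `window` line
// matches have been collected, truncating each file as it arrives.
func collectStream(s fileStream, perFile, window int) ([]*FileMatch, int, error) {
	var out []*FileMatch
	total := 0
	for total < window {
		f, err := s.Recv()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return out, total, err
		}
		f = truncateFile(f, perFile)
		out = append(out, f)
		total += len(f.LineMatches)
	}
	return out, total, nil
}

func main() {
	s := &sliceStream{files: []*FileMatch{
		{FileName: "a.go", LineMatches: []string{"1", "2", "3", "4"}},
		{FileName: "b.go", LineMatches: []string{"1", "2"}},
		{FileName: "c.go", LineMatches: []string{"1"}},
	}}
	// Per-file limit 3, window 5: a.go is truncated, c.go is never read.
	files, total, err := collectStream(s, 3, 5)
	fmt.Println(len(files), total, err)
}
```

Sorting by relevance score is deliberately left out of the sketch, since that step still happens after results from all nodes are combined.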
Example: With max_line_match_window: 5000, each stream now stops at 5000 line matches instead of potentially sending 25,000+, keeping messages well under the gRPC limit.
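As a rough sanity check on that claim, the sketch below scales the size from the error message down to a 5000-match window. It rests on two assumptions that are not in the MR: that message size grows roughly linearly with the number of line matches, and that the 26588986-byte message held exactly 25,000 line matches (the description only says "25,000+", so the true per-match size may be smaller). The function name estimateBytes is hypothetical.

```go
package main

import "fmt"

const (
	observedBytes   int64 = 26588986 // message size from the error message
	grpcLimitBytes  int64 = 16777216 // limit from the error message
	observedMatches int64 = 25000    // assumption: "25,000+" taken as exactly 25,000
	window          int64 = 5000     // max_line_match_window from the example
)

// estimateBytes scales the observed message size linearly to n line matches.
func estimateBytes(n int64) int64 {
	return observedBytes * n / observedMatches
}

func main() {
	est := estimateBytes(window)
	fmt.Printf("estimated size at %d matches: %d bytes (limit %d)\n", window, est, grpcLimitBytes)
	fmt.Println("under limit:", est < grpcLimitBytes)
}
```

Under these assumptions the estimate comes to roughly 5.3 MB per stream, about a third of the limit. Real sizes depend on line length and file metadata.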
How to set up and validate locally
Stop the GDK webservers, then start webservers built with these changes, each in its own terminal pane.
In the GDK directory:
gdk stop gitlab-zoekt-webserver-development-1 gitlab-zoekt-webserver-development-2
In this repo's directory:
make gdk-webserver-1
make gdk-webserver-2