# Tune tonic gRPC server HTTP/2 settings for high-concurrency load
## Summary

Load testing at 10K RPS (in-cluster, constant arrival rate) revealed that the tonic gRPC server runs with entirely default HTTP/2 settings, which limits throughput under high concurrency. Connection resets occur when concurrent bidi streaming queries exceed tonic's default limits.

## Current State

`crates/gkg-server/src/grpc/server.rs:43-48`:

```rust
TonicServer::builder()
    .layer(labkit::grpc::GrpcMetricsLayer::new())
    .layer(labkit::grpc::GrpcTraceLayer::new())
    .layer(labkit::grpc::GrpcCorrelationLayer::new())
    .add_service(self.service)
    .serve(self.addr)
```

No HTTP/2 or concurrency tuning is applied. Tonic defaults include `max_concurrent_streams = 200` per connection, small window sizes, and no keepalive configuration.

## Observed Behavior

- At ~200 concurrent bidi gRPC streams per pod, new connections get reset (`connection reset by peer`)
- Webserver pods show no errors in logs and no OOM/restarts; the resets happen at the HTTP/2 layer
- Scaling from 3 → 10 webserver pods improved throughput from ~140 → ~400 RPS, confirming per-pod connection limits
- The k8s Service (L4 kube-proxy) doesn't distribute gRPC streams evenly across pods, since multiple streams multiplex over a single TCP connection

## Recommended Settings

```rust
TonicServer::builder()
    .initial_connection_window_size(1024 * 1024)  // 1 MB (default 64 KB)
    .initial_stream_window_size(512 * 1024)       // 512 KB (default 64 KB)
    .concurrency_limit_per_connection(512)        // allow more concurrent streams
    .http2_keepalive_interval(Some(Duration::from_secs(10)))
    .http2_keepalive_timeout(Some(Duration::from_secs(20)))
    .tcp_keepalive(Some(Duration::from_secs(60)))
```

These should be configurable via the server config YAML rather than hardcoded (a config sketch follows the Key Files list below).

## Additional Considerations

- gRPC over a k8s Service is a known poor combination for load-balancing bidi streams. Consider documenting headless service + client-side load balancing as a future optimization path (a minimal client-side sketch is included at the end of this issue).
- The `ExecuteQuery` RPC is a bidi stream (query → redaction exchange → result). Each stream holds a connection slot for the full query lifecycle, making concurrency limits more impactful than for unary RPCs.

## Load Test Results

| Webserver Pods | Actual RPS | Error Rate | Median Latency |
|---|---|---|---|
| 3 | ~140 | 6.9% (conn resets) | 2.9 s |
| 10 | ~400 | 0.2% | 1.6 s |

## Key Files

- `crates/gkg-server/src/grpc/server.rs`: tonic server builder (where the change goes)
- `crates/gkg-server/src/config.rs`: server config (add gRPC tuning fields)
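As a starting point for the "configurable via the server config YAML" part, here is a minimal sketch. It assumes the config in `crates/gkg-server/src/config.rs` is deserialized with serde; the `GrpcTuningConfig` struct, its field names, the seconds-based duration fields, the defaults, and the `tuned_builder` helper are all illustrative placeholders, not the final schema.

```rust
use std::time::Duration;

use serde::Deserialize;
use tonic::transport::Server as TonicServer;

/// Candidate fields for the server config YAML. Names and defaults are
/// illustrative; durations are plain seconds to keep the YAML simple.
#[derive(Debug, Clone, Deserialize)]
#[serde(default)]
pub struct GrpcTuningConfig {
    /// HTTP/2 connection-level flow-control window, in bytes.
    pub initial_connection_window_size: u32,
    /// HTTP/2 per-stream flow-control window, in bytes.
    pub initial_stream_window_size: u32,
    /// Maximum in-flight requests allowed per connection.
    pub concurrency_limit_per_connection: usize,
    /// Interval between HTTP/2 keepalive pings, in seconds.
    pub http2_keepalive_interval_secs: u64,
    /// Time to wait for a keepalive ping ack before closing the connection, in seconds.
    pub http2_keepalive_timeout_secs: u64,
    /// TCP-level keepalive interval, in seconds.
    pub tcp_keepalive_secs: u64,
}

impl Default for GrpcTuningConfig {
    fn default() -> Self {
        // Defaults mirror the recommended settings above.
        Self {
            initial_connection_window_size: 1024 * 1024,
            initial_stream_window_size: 512 * 1024,
            concurrency_limit_per_connection: 512,
            http2_keepalive_interval_secs: 10,
            http2_keepalive_timeout_secs: 20,
            tcp_keepalive_secs: 60,
        }
    }
}

/// Returns a tonic server builder with the tuning applied; the existing
/// layers and `add_service`/`serve` calls would follow as they do today.
pub fn tuned_builder(cfg: &GrpcTuningConfig) -> TonicServer {
    TonicServer::builder()
        .initial_connection_window_size(cfg.initial_connection_window_size)
        .initial_stream_window_size(cfg.initial_stream_window_size)
        .concurrency_limit_per_connection(cfg.concurrency_limit_per_connection)
        .http2_keepalive_interval(Some(Duration::from_secs(cfg.http2_keepalive_interval_secs)))
        .http2_keepalive_timeout(Some(Duration::from_secs(cfg.http2_keepalive_timeout_secs)))
        .tcp_keepalive(Some(Duration::from_secs(cfg.tcp_keepalive_secs)))
}
```

With the container-level `#[serde(default)]`, any field omitted from the YAML falls back to the struct's defaults, so the tuning can be adjusted per environment without code changes.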
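On the headless service + client-side load balancing note, the following is a minimal client-side sketch only, assuming the webserver pod addresses have already been resolved (for example from the headless service's DNS A records); the `balanced_channel` helper and the port are placeholders.

```rust
use tonic::transport::{Channel, Endpoint};

/// Builds a single channel that balances requests across the given pod
/// addresses instead of pinning every stream to one kube-proxy-chosen pod.
/// `pod_addrs` would come from resolving the headless service; the port is
/// a placeholder.
fn balanced_channel(pod_addrs: &[String]) -> Result<Channel, tonic::transport::Error> {
    let endpoints: Vec<Endpoint> = pod_addrs
        .iter()
        .map(|addr| Endpoint::from_shared(format!("http://{addr}:50051")))
        .collect::<Result<_, _>>()?;

    // balance_list spreads requests (and therefore bidi streams) across the
    // endpoint set rather than multiplexing everything over one connection.
    Ok(Channel::balance_list(endpoints.into_iter()))
}
```

This only helps if the client re-resolves the pod set when it scales; a static list pinned at startup would reintroduce the same imbalance after a rollout.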