Tune tonic gRPC server HTTP/2 settings for high-concurrency load
## Summary
Load testing at 10K RPS (in-cluster, constant arrival rate) revealed that the tonic gRPC server uses entirely default HTTP/2 settings, which limits throughput under high concurrency. Connection resets occur when concurrent bidi streaming queries exceed tonic's default limits.
## Current State
`crates/gkg-server/src/grpc/server.rs:43-48`:
```rust
TonicServer::builder()
.layer(labkit::grpc::GrpcMetricsLayer::new())
.layer(labkit::grpc::GrpcTraceLayer::new())
.layer(labkit::grpc::GrpcCorrelationLayer::new())
.add_service(self.service)
.serve(self.addr)
```
No HTTP/2 or concurrency tuning is applied: tonic's defaults include `max_concurrent_streams = 200` per connection, ~64 KiB initial window sizes, and no HTTP/2 keepalive.
## Observed Behavior
- At ~200 concurrent bidi gRPC streams per pod, new connections get reset (`connection reset by peer`)
- Webserver pods show no errors in logs and no OOM/restarts — the resets happen at the HTTP/2 layer
- Scaling from 3 → 10 webserver pods improved throughput from ~140 → ~400 RPS, confirming per-pod connection limits
- k8s Service (L4 kube-proxy) doesn't distribute gRPC streams evenly across pods since multiple streams multiplex over a single TCP connection
## Recommended Settings
```rust
TonicServer::builder()
    .initial_connection_window_size(1024 * 1024) // 1 MiB (default ~64 KiB)
    .initial_stream_window_size(512 * 1024)      // 512 KiB (default ~64 KiB)
    .max_concurrent_streams(512)                 // raise the HTTP/2 SETTINGS limit (default 200)
    .concurrency_limit_per_connection(512)       // matching in-process limit per connection
    .http2_keepalive_interval(Some(Duration::from_secs(10)))
    .http2_keepalive_timeout(Some(Duration::from_secs(20)))
    .tcp_keepalive(Some(Duration::from_secs(60)))
```
These should be configurable via the server config YAML rather than hardcoded.
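For example, a config section along these lines (key names are hypothetical) could feed the builder:

```yaml
# Hypothetical gkg-server config section; key names are illustrative.
grpc:
  initial_connection_window_size: 1048576  # 1 MiB
  initial_stream_window_size: 524288       # 512 KiB
  concurrency_limit_per_connection: 512
  http2_keepalive_interval_secs: 10
  http2_keepalive_timeout_secs: 20
  tcp_keepalive_secs: 60
```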
## Additional Considerations
- gRPC + k8s Service is a known poor combination for load balancing bidi streams. Consider documenting headless service + client-side load balancing as a future optimization path.
- The `ExecuteQuery` RPC is a bidi stream (query → redaction exchange → result). Each stream holds a connection slot for the full query lifecycle, making concurrency limits more impactful than for unary RPCs.
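If the headless-service path is pursued, the k8s-side change is small. A sketch (service name, selector, and port are assumptions):

```yaml
# Headless Service: clusterIP: None makes DNS return individual pod IPs,
# letting gRPC clients resolve and balance across pods per-connection.
apiVersion: v1
kind: Service
metadata:
  name: gkg-webserver-headless
spec:
  clusterIP: None
  selector:
    app: gkg-webserver
  ports:
    - name: grpc
      port: 50051
      targetPort: 50051
```

Clients would then need a DNS resolver with `round_robin` (or equivalent client-side load balancing) to spread streams across the resolved pod IPs.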
## Load Test Results
| Webserver Pods | Actual RPS | Errors | Median Latency |
|---|---|---|---|
| 3 | ~140 | 6.9% (conn resets) | 2.9s |
| 10 | ~400 | 0.2% | 1.6s |
## Key Files
- `crates/gkg-server/src/grpc/server.rs` — tonic server builder (the change)
- `crates/gkg-server/src/config.rs` — server config (add grpc tuning fields)
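As a sketch of what the new `config.rs` fields could look like (struct and field names are assumptions, with defaults mirroring the recommended settings above):

```rust
use std::time::Duration;

// Hypothetical tuning fields for the gRPC server config; names are
// illustrative, defaults mirror the recommended settings above.
#[derive(Debug, Clone)]
pub struct GrpcTuning {
    pub initial_connection_window_size: u32, // bytes
    pub initial_stream_window_size: u32,     // bytes
    pub max_concurrent_streams: u32,
    pub http2_keepalive_interval: Duration,
    pub http2_keepalive_timeout: Duration,
    pub tcp_keepalive: Duration,
}

impl Default for GrpcTuning {
    fn default() -> Self {
        Self {
            initial_connection_window_size: 1024 * 1024, // 1 MiB
            initial_stream_window_size: 512 * 1024,      // 512 KiB
            max_concurrent_streams: 512,
            http2_keepalive_interval: Duration::from_secs(10),
            http2_keepalive_timeout: Duration::from_secs(20),
            tcp_keepalive: Duration::from_secs(60),
        }
    }
}
```

Deserializing this from the YAML (e.g. via serde) and threading it into the tonic builder keeps the load-test-derived values out of the code.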