Investigate tweaks to improve load balancing to reduce overloaded API workers
The upgrade of [GKE clusters to Google Dataplane V2](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/1606) in late August 2025 introduced a significant problem: API and Git transfers are prematurely terminated because readiness probes are flapping. You can see this in Cloud Logging by searching for `Connection reset by peer` errors in the NGINX Ingress Controller logs: https://cloudlogging.app.goo.gl/ibnWxBoYU7zuh5vg9
This Cilium issue (Dataplane V2 is built on Cilium) might be fixed upstream in https://github.com/cilium/cilium/pull/42170 and backported to v1.18.5, but we don't yet know when the fix will land in Dataplane V2.
One thing that might help is improving load balancing. Our metrics show that `puma_queued_connections` rises above 0 for a number of pods, which could indicate that the load balancer is not directing traffic to free workers.
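To illustrate the hypothesis, here is a toy model, not a description of how NGINX Ingress or Puma actually work. Each pod has a fixed number of request slots (a simplified stand-in for Puma workers), and an arrival that finds every slot busy counts as "queued". It compares a naive round-robin balancer, which ignores how busy a pod is, with a hypothetical balancer that can see each pod's free slots. All names (`Pod`, `simulate`, `round_robin`, `most_free_workers`) and the workload are made up for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Toy model of one API pod with a fixed number of request slots (hypothetical)."""
    workers: int
    # Min-heap of the times at which each busy slot becomes free again.
    free_at: list[float] = field(default_factory=list)
    queued: int = 0  # arrivals that found every slot busy

    def free_workers(self, now: float) -> int:
        # Release slots whose request has finished by `now`.
        while self.free_at and self.free_at[0] <= now:
            heapq.heappop(self.free_at)
        return self.workers - len(self.free_at)

    def accept(self, now: float, duration: float) -> None:
        if self.free_workers(now) > 0:
            heapq.heappush(self.free_at, now + duration)
        else:
            # Every slot is busy: the request waits (FIFO) for the earliest slot to free up.
            self.queued += 1
            start = heapq.heappop(self.free_at)
            heapq.heappush(self.free_at, start + duration)


def simulate(requests, num_pods, workers, pick):
    """Replay (arrival_time, duration) requests in order; `pick` chooses a pod index."""
    pods = [Pod(workers) for _ in range(num_pods)]
    for i, (now, duration) in enumerate(requests):
        pods[pick(i, now, pods)].accept(now, duration)
    return sum(p.queued for p in pods)


def round_robin(i, now, pods):
    # Ignores how busy each pod is.
    return i % len(pods)


def most_free_workers(i, now, pods):
    # Assumes the balancer can see each pod's free slots; ties go to the lowest index.
    return max(range(len(pods)), key=lambda k: pods[k].free_workers(now))


if __name__ == "__main__":
    # One request per time unit, alternating a slow (5) and a fast (1) request.
    requests = [(t, 5 if t % 2 == 0 else 1) for t in range(12)]
    print(simulate(requests, num_pods=2, workers=2, pick=round_robin))        # 4
    print(simulate(requests, num_pods=2, workers=2, pick=most_free_workers))  # 0
```

In this contrived workload the average demand is 3 busy slots out of 4, yet round-robin sends every slow request to the same pod, so that pod queues while the other has idle slots. A balancer aware of free slots avoids queueing entirely. This is the pattern that pods with non-zero `puma_queued_connections` might point to, under the (unverified) assumption that other pods had spare capacity at the same time.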