Set MaxConnectionAge* configuration for Praefect server

Quang-Minh Nguyen requested to merge qmnguyen0711/add-client-loadbalancing into master

Praefects are deployed behind a TCP load balancer, which routes each incoming TCP connection to a random Praefect. All requests that a client sends over that connection then land on the same Praefect. This is a problem for clients such as Workhorse, which keep a persistent gRPC connection open to a Praefect and reuse it for all requests. When a new Praefect node is added, Workhorse never becomes aware of it, and incoming requests keep being routed to the existing node.

We also considered adding support for DNS service discovery. However, gRPC's built-in DNS resolver only re-resolves when the client starts or when the connectivity state changes. As a result, this approach suffers from the same stickiness problem as the TCP load balancer.

This commit sets MaxConnectionAge to force clients to reconnect after a fixed duration (15 minutes in this version), which should spread the workload more evenly across Praefect nodes. Switching connections is graceful: in-flight requests continue on the established connection until they finish, while new requests are routed over the new connection. Once the MaxConnectionAgeGrace period expires, the old connection is forcibly closed.

MaxConnectionAge is set to 15 minutes. We could go lower, but this is a reasonable starting point. With this value, each long-lived client reconnects four times per hour. Each reconnection adds anywhere from a few hundred milliseconds to a few seconds of latency, which I think is an acceptable overhead.

MaxConnectionAgeGrace is the maximum grace period before a draining connection is forcibly closed. It should exceed the longest gRPC request latency observed in production. Because Praefect is not yet widely used in production, I also took Gitaly latencies into account. A 5-minute grace period is quite generous for most use cases.

| Percentile | Gitaly latency (ms) | Praefect latency (ms) |
|---|---|---|
| 50th | 4.284 | 6.227 |
| 95th | 138.235 | 15.098 |
| 99th | 643.921 | 164.093 |
| 99.99th | 8,901.715 | 49,914.675 |

Source: Gitaly logs, Praefect logs
