Use PgBouncer with Gitaly Clusters
Problem to solve
Gitaly Cluster uses PostgreSQL for routing reads and for write transactions. On a busy GitLab instance, this could mean a few thousand database requests per second. At that scale, pooling connections is likely to work better than having each client open its own connections.
Proposal
Add PgBouncer to the Gitaly Cluster setup guide.
Problem to solve with connection pooling
Under high load, Praefect tries to open too many connections to the PostgreSQL database. Searching the Praefect logs for pq: returns the following errors, sorted by count:
399 accessor call: get synced: get shard for "storage-1": error looking up primary: pq: remaining connection slots are reserved for non-replication superuser connections
188 accessor call: get synced: get shard for "storage-1": error looking up primary: pq: sorry, too many clients already :: proc.c:347
68 query: pq: remaining connection slots are reserved for non-replication superuser connections
13 query: pq: sorry, too many clients already :: proc.c:347
13 pq: remaining connection slots are reserved for non-replication superuser connections
7 pq: sorry, too many clients already :: proc.c:347
2 rpc error: code = Unknown desc = accessor call: get synced: get shard for "storage-1": error looking up primary: pq: remaining connection slots are reserved for non-replication superuser connections
2 error looking up primary: pq: remaining connection slots are reserved for non-replication superuser connections
1 rpc error: code = Unknown desc = accessor call: get synced: get shard for "storage-1": error looking up primary: pq: sorry, too many clients already :: proc.c:347
1 error retrieving quorum count: pq: remaining connection slots are reserved for non-replication superuser connections
1 error looking up primary: pq: sorry, too many clients already :: proc.c:347
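A tally like the one above can be reproduced with a short script. The following is a minimal sketch, not the command that was actually used. It assumes the Praefect log contains one JSON object per line with an "error" field, as in the full event in this description. The function name tally_pq_errors and the log path in the usage note are hypothetical.

```python
import json
from collections import Counter


def tally_pq_errors(log_lines):
    """Count distinct error messages that mention "pq:" in JSON log lines.

    Returns a list of (message, count) pairs, most frequent first.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        error = event.get("error", "")
        # Assumption: the PostgreSQL driver error text is in the "error" field.
        if isinstance(error, str) and "pq:" in error:
            counts[error] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        '{"error": "pq: sorry, too many clients already :: proc.c:347"}',
        '{"error": "query: pq: sorry, too many clients already :: proc.c:347"}',
        '{"error": "pq: sorry, too many clients already :: proc.c:347"}',
        '{"msg": "finished unary call with code OK"}',
    ]
    for message, count in tally_pq_errors(sample):
        print(count, message)
```

To run it against a real log, pass an open file instead of the sample list, for example tally_pq_errors(open("praefect.log")), where the path is a placeholder for wherever the Praefect log is stored.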
The full event:
{
"correlation_id": "Gp6jR8qct75",
"error": "accessor call: get synced: get shard for \"storage-1\": error looking up primary: pq: sorry, too many clients already :: proc.c:347",
"grpc.code": "Unknown",
"grpc.meta.auth_version": "v2",
"grpc.meta.client_name": "gitlab-web",
"grpc.meta.deadline_type": "regular",
"grpc.method": "TreeEntry",
"grpc.request.deadline": "2020-05-28T01:59:56Z",
"grpc.request.fullMethod": "/gitaly.CommitService/TreeEntry",
"grpc.service": "gitaly.CommitService",
"grpc.start_time": "2020-05-28T01:59:26Z",
"grpc.time_ms": 171.986,
"level": "error",
"msg": "finished streaming call with code Unknown",
"peer.address": "10.150.0.49:35294",
"pid": 31817,
"span.kind": "server",
"system": "grpc",
"time": "2020-05-28T01:59:26.876Z"
}
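To see which RPCs and clients are affected, events like the one above can be reduced to a few fields. This is a hedged sketch only: the marker strings are copied from the errors listed in this description, the field names are taken from the event above, and the helper names is_connection_exhaustion and summarize are hypothetical.

```python
import json

# Substrings copied from the PostgreSQL errors quoted in this issue.
EXHAUSTION_MARKERS = (
    "sorry, too many clients already",
    "remaining connection slots are reserved",
)


def is_connection_exhaustion(event):
    """Return True if the event's error text indicates PostgreSQL ran out of connections."""
    error = event.get("error", "")
    return isinstance(error, str) and any(m in error for m in EXHAUSTION_MARKERS)


def summarize(event):
    """Keep only the fields needed to group failures by RPC method and client."""
    return {
        "method": event.get("grpc.request.fullMethod"),
        "client": event.get("grpc.meta.client_name"),
        "exhausted": is_connection_exhaustion(event),
    }


if __name__ == "__main__":
    raw = json.dumps({
        "error": "error looking up primary: pq: sorry, too many clients already :: proc.c:347",
        "grpc.meta.client_name": "gitlab-web",
        "grpc.request.fullMethod": "/gitaly.CommitService/TreeEntry",
    })
    print(summarize(json.loads(raw)))
```

Grouping the summaries by method and client shows whether the failures are concentrated in a few RPCs or spread across all traffic.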
Edited by Pavlo Strokov