Merge Requests API causes Postgres connection exhaustion at high throughput
Summary
Performance testing against our 50k environment recently found an issue with Postgres connections.
After upgrading to the Nightly Omnibus package 12.6.0-pre db9a0421981,
one of the API performance tests, List project merge requests, started failing on some of its connections:
* Environment: 50k
* Version: 12.7.0-pre `f347c4bd9a4`
* Option: 60s_1000rps
* Date: 2020-01-20
* Run Time: 43m 12.23s (Start: 04:28:43 UTC, End: 05:11:55 UTC)
NAME | RPS | RPS RESULT | RESPONSE P95 | REQUEST RESULTS | RESULT
---------------------------------------------------------|--------|----------------------|--------------|-----------------|-------
api_v4_projects_merge_requests | 1000/s | 904.67/s (>640.00/s) | 1130.69ms | 87.44% (>95%) | Failed
The error thrown was from PgBouncer about connection slots:
ERROR S: login failed: FATAL: remaining connection slots are reserved for non-replication superuser connections
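For context on this error, PostgreSQL holds back `superuser_reserved_connections` slots out of `max_connections`, and it rejects non-superuser logins with this message once only the reserved slots remain. The following is a minimal sketch of that arithmetic. The helper names are hypothetical, and the values are PostgreSQL's stock defaults (100 and 3), used for illustration only. They are not the 50k environment's actual settings, which are not stated in this issue.

```ruby
# Sketch of PostgreSQL's connection slot budget (helper names are hypothetical).

# Slots that non-superuser roles can occupy.
def regular_slots(max_connections:, superuser_reserved_connections:)
  max_connections - superuser_reserved_connections
end

# True when a new non-superuser login would be rejected with
# "remaining connection slots are reserved ...".
def rejects_regular_login?(connections_in_use:, max_connections:, superuser_reserved_connections:)
  connections_in_use >= regular_slots(
    max_connections: max_connections,
    superuser_reserved_connections: superuser_reserved_connections
  )
end

# PostgreSQL defaults, assumed for illustration; not taken from the 50k environment.
defaults = { max_connections: 100, superuser_reserved_connections: 3 }

puts regular_slots(**defaults)                                   # 97
puts rejects_regular_login?(connections_in_use: 96, **defaults)  # false
puts rejects_regular_login?(connections_in_use: 97, **defaults)  # true
```

Under these assumed values, the 98th concurrent connection from a regular role is the first to be refused.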
We investigated whether this was an environment issue or a bug. It was suggested that a new feature flag, `async_merge_request_check_mergeability`, might solve the issue. Enabling it improved the situation a little, but the test still ultimately failed (see logs below).
Steps to reproduce
This can only be reproduced by running our performance tests against the 50k environment. Contact me, @niskhakova or @tpazitny to facilitate.
What is the current bug behavior?
Postgres connection slots are exhausted, causing new connections to be rejected.
What is the expected correct behavior?
Postgres connection slots aren't exhausted.
Relevant logs and/or screenshots
--------------------------------------------------------------------------------
GitLab: 12.9.0-pre (d85a6b95e3c) EE
GitLab Shell: 11.0.0
PostgreSQL: 10.12
--------------------------------------------------------------------------------
Loading production environment (Rails 6.0.2)
irb(main):001:0> Feature.enabled?(:async_merge_request_check_mergeability)
=> true
[...]
* Environment: 50k
* Environment Version: 12.9.0-pre `d85a6b95e3c`
* Option: 60s_1000rps
* Date: 2020-02-24
* Run Time: 55m 4.97s (Start: 14:29:33 UTC, End: 15:24:38 UTC)
* GPT Version: v1.2
█ Overall Results Score: 97.12%
NAME | RPS | RPS RESULT | TTFB AVG | TTFB P90 | REQ STATUS | RESULT
---------------------------------------------------------|--------|----------------------|-----------|----------------------|----------------|---------
api_v4_projects_merge_requests | 1000/s | 914.07/s (>640.00/s) | 977.67ms | 2141.16ms (<2000ms) | 93.42% (>95%) | FAILED¹²
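To make the result row easier to read, here is a minimal sketch of how the three thresholds printed in parentheses combine into a pass or fail. It is not the performance tool's actual implementation. The constant, method and key names are hypothetical, and the threshold values are copied from the table above.

```ruby
# Thresholds as printed in the results table; all names here are illustrative.
THRESHOLDS = {
  rps_min: 640.0,           # RPS RESULT must exceed this
  ttfb_p90_max_ms: 2000.0,  # TTFB P90 must stay below this
  success_rate_min: 95.0    # REQ STATUS (percent) must exceed this
}.freeze

# Returns the list of checks that failed; an empty list means the test passed.
def failed_checks(rps:, ttfb_p90_ms:, success_rate:, thresholds: THRESHOLDS)
  failures = []
  failures << :rps unless rps > thresholds[:rps_min]
  failures << :ttfb_p90 unless ttfb_p90_ms < thresholds[:ttfb_p90_max_ms]
  failures << :success_rate unless success_rate > thresholds[:success_rate_min]
  failures
end

# Values from the api_v4_projects_merge_requests row of the 2020-02-24 run.
p failed_checks(rps: 914.07, ttfb_p90_ms: 2141.16, success_rate: 93.42)
# => [:ttfb_p90, :success_rate]
```

Read this way, the run met its throughput threshold but missed both the TTFB P90 and request success thresholds.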
Postgres and PgBouncer metrics:
Full Metrics Snapshot of test - https://snapshot.raintank.io/dashboard/snapshot/LMc6k81iHIZDSeVgv7QqRpoWKOuAitRj