Validate the behaviour of custom `LoadBalancing` when using Puma
We had an outage (unrelated to Puma) that nevertheless surfaced some Sentry errors: gitlab-com/gl-infra/production#1327 (comment 240852524).
could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use
We do not know the exact cause of these errors.
However, I noticed that we have a significant amount of custom `LoadBalancing` code for the DB in GitLab EE that might simply not be prepared to run on Puma. The only multi-threaded server that we have run up to now is Sidekiq, but Sidekiq seems to be exempt from `LoadBalancing`.
This is potentially quite a big problem that needs to be resolved before we can move on with Puma.
The related code is in: https://gitlab.com/gitlab-org/gitlab/blob/master/ee%2Flib%2Fgitlab%2Fdatabase%2Fload_balancing.rb
What are we looking for?
- How are the read-only connections handled?
- Are the read-only connections returned to the pool properly?
- How are the read-write connections handled?
- Are the read-write connections returned to the pool properly?
- Can an already-open connection used by thread A be used by thread B?
- Does it also affect Puma T1 or Puma T>1?
- Why didn't we see this before?
- Do we know how many database connections are open from a single host to the read-only replicas and to the read-write database?
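
To make the "returned to the pool properly" questions concrete, here is a minimal sketch of how a leaked checkout could produce the kind of timeout quoted above. `TinyPool` and `ConnectionTimeoutError` are hypothetical names made up for illustration; this is not the ActiveRecord pool and not the GitLab EE load balancing code, only a toy model of a fixed-size pool shared by several threads.

```ruby
# Hypothetical toy pool, for illustration only (not ActiveRecord, not GitLab code).
class TinyPool
  class ConnectionTimeoutError < StandardError; end

  def initialize(size:, checkout_timeout:)
    @checkout_timeout = checkout_timeout
    @available = Array.new(size) { |i| "conn-#{i}" } # stand-ins for real connections
    @mutex = Mutex.new
    @returned = ConditionVariable.new
  end

  # Take a connection, waiting up to checkout_timeout seconds for one to be returned.
  def checkout
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @checkout_timeout
    @mutex.synchronize do
      while @available.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        if remaining <= 0
          raise ConnectionTimeoutError,
                "could not obtain a connection from the pool within #{@checkout_timeout} seconds"
        end
        @returned.wait(@mutex, remaining)
      end
      @available.pop
    end
  end

  # Give a connection back and wake up one waiting thread.
  def checkin(conn)
    @mutex.synchronize do
      @available.push(conn)
      @returned.signal
    end
  end

  # Safe pattern: the connection is always returned, even if the block raises.
  def with_connection
    conn = checkout
    yield conn
  ensure
    checkin(conn) if conn
  end
end

pool = TinyPool.new(size: 2, checkout_timeout: 0.2)

# Well-behaved threads: four threads share two connections because each one checks in.
ok = 4.times.map { Thread.new { pool.with_connection { |_conn| sleep 0.01; :ok } } }.map(&:value)

# Leaky threads: each checks out a connection and finishes without checking it in.
2.times.map { Thread.new { pool.checkout } }.each(&:join)

# The pool is now exhausted, so the next checkout times out.
leak_error = begin
  pool.checkout
  nil
rescue TinyPool::ConnectionTimeoutError => e
  e.message
end
```

If the custom code ever takes a connection on one thread and does not hand it back when the request finishes, a multi-threaded server would be expected to hit this failure mode sooner than a single-threaded one, which is one possible reason it was not seen before. Whether that is what actually happens here is exactly what needs to be validated.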