Set up redundant pgbouncer instances

Currently, GitLab.com's pgbouncer instance is a single point of failure (SPOF).

This problem is compounded by the fact that pgbouncer does not export metrics such as the number of available connections (see gitlab-org/omnibus-gitlab#2455 and gitlab-org/omnibus-gitlab#2815 (closed)).

Many of the pgbouncer outages we see occur during a deploy. If we deployed new GitLab versions with a rolling deploy that halts on any error, these problems could be avoided.
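
To make the "rolling deploy, halting on any errors" idea concrete, here is a minimal sketch in Python. It is not how GitLab.com deploys today; the names `rolling_deploy`, `DeployHalted`, `deploy` and `is_healthy` are hypothetical and stand in for whatever tooling and health checks would actually be used.

```python
from typing import Callable, Iterable, List


class DeployHalted(Exception):
    """Raised when a node fails, so the remaining nodes are left untouched."""


def rolling_deploy(
    nodes: Iterable[str],
    deploy: Callable[[str], None],      # hypothetical: upgrades a single node
    is_healthy: Callable[[str], bool],  # hypothetical: post-deploy health check
) -> List[str]:
    """Deploy to one node at a time and stop at the first failure."""
    done: List[str] = []
    for node in nodes:
        try:
            deploy(node)
        except Exception as exc:
            # Halt immediately: nodes later in the list keep the old version.
            raise DeployHalted(f"deploy failed on {node}; completed: {done}") from exc
        if not is_healthy(node):
            raise DeployHalted(f"health check failed on {node}; completed: {done}")
        done.append(node)
    return done
```

Because the loop stops at the first failing node, a bad release would affect at most one node at a time, assuming the health check actually exercises the database path through pgbouncer.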

Considering how critical pgbouncer is to the availability of any GitLab instance, this should be prioritised.

Related GitLab.com outage issues:

  • https://gitlab.com/gitlab-com/infrastructure/issues/4056
  • https://gitlab.com/gitlab-com/infrastructure/issues/3876
  • https://gitlab.com/gitlab-com/infrastructure/issues/2266
  • https://gitlab.com/gitlab-com/infrastructure/issues/1602
Edited Apr 19, 2018 by Andrew Newdigate