Build an implementation plan for Container Registry active database load balancing
Context
This is part of &8591. Before implementing active database load balancing, we must devise a plan and validate its feasibility. That's the purpose of this issue.
Support for the registry database on self-managed installs is currently in Beta (&5521). Ideally, the solution would fit both self-managed and GitLab.com use cases. However, we have no customer requests, nor any usage or performance data, to decide whether load balancing would be beneficial at a smaller scale. Therefore, for the first version, we want to leverage the GitLab.com architecture (namely Consul hostname resolution and PgBouncer load balancing) to keep the implementation on the application side more straightforward.
Background
On GitLab.com, the registry database is hosted on a dedicated PostgreSQL cluster consisting of a primary server and multiple replica servers:
```mermaid
sequenceDiagram
    participant registry
    participant pgbouncer_gcp_lb
    participant pgbouncer
    participant consul
    participant patroni_registry_master
    participant patroni_registry_replica
    registry->>pgbouncer_gcp_lb: Postgres request
    pgbouncer_gcp_lb->>pgbouncer: Forward Postgres request
    pgbouncer->>consul: Query for master.patroni-registry.service.consul
    consul->>pgbouncer: Patroni master address
    pgbouncer->>patroni_registry_master: Postgres request
    patroni_registry_master->>patroni_registry_replica: Replication
```
Problem
- The registry only supports connecting to a single PostgreSQL host, so we only use the primary server;
- On GitLab.com, we're currently averaging 4k queries per second (source);
- On GitLab.com, ~95% of registry API requests are reads (source). If the primary database server fails, we are currently unable to serve most of the registry traffic, even though the read-only replicas could keep serving those reads.
Therefore, we should strive to achieve active load balancing, both for availability (serving reads during a primary outage) and for performance (spreading read load across the replicas).