
Build an implementation plan for Container Registry active database load balancing

Context

This is part of &8591 (closed). Before implementing active database load balancing, we must devise a plan and validate its feasibility. That's the purpose of this issue.

Support for the registry database on self-managed installs is currently in Beta (&5521). Ideally, the solution would fit both self-managed and GitLab.com use cases, but we have no customer requests and no usage or performance data to tell whether load balancing would be beneficial at a smaller scale. Therefore, for the first version, we want to leverage the GitLab.com architecture (namely Consul hostname resolution and PgBouncer load balancing), which allows a more straightforward implementation on the application side.

Background

On GitLab.com, the registry database is hosted on a dedicated PostgreSQL cluster consisting of a primary server and multiple replica servers:

```mermaid
sequenceDiagram
  participant registry
  participant pgbouncer_gcp_lb
  participant pgbouncer
  participant consul
  participant patroni_registry_master
  participant patroni_registry_replica
  registry->>pgbouncer_gcp_lb: Postgres request
  pgbouncer_gcp_lb->>pgbouncer: Forward Postgres request
  pgbouncer->>consul: Query for master.patroni-registry.service.consul
  consul->>pgbouncer: Patroni master address
  pgbouncer->>patroni_registry_master: Postgres request
  patroni_registry_master->>patroni_registry_replica: Replication
```

Problem

  • The registry only supports connecting to a single PostgreSQL host, so we currently only use the primary server;
  • On GitLab.com, we're currently averaging 4k queries per second (source);
  • On GitLab.com, ~95% of registry API requests are currently reads (source). If the primary database server fails, we're currently unable to serve most of the registry traffic, even though the read-only replicas could keep serving those reads.

Therefore, we should strive to achieve active load balancing for availability and performance reasons.

Implementation Plan (WIP)

Edited by João Pereira