Add configuration to use service discovery to identify and connect to the primary database server for schema migrations
Context
This has come up during gitlab-com/gl-infra/production#8212 (comment 1244386737) and is related to #889 (closed).
Problem
We just discover that we need to bypass PgBouncer and connect directly to the PostgreSQL primary server when applying post-deployment migrations. This is a must-have for post-deployment migrations containing statements that take more than 15s (current statement timeout on the server side) to execute in production AND a nice-to-have for the remaining.
Why? If we connect through PgBouncer we cannot guarantee that a single connection will be used to execute all DDL/DML statements in a given migration (PgBouncer is likely to cycle them). If we can't guarantee that, we cannot use e.g. session settings, such as reducing the statement timeout for long-running migrations.
On clustered database deployments, such as on GitLab.com, the primary server will change over time (due to maintenance or failover). So ideally, the registry should be able to use service discovery to determine the current primary server address and use that when applying schema migrations.
Solution
It seems like we need new configuration settings under the database
section to be able to configure service discovery. This would be similar to how it's done for Rails: https://docs.gitlab.com/ee/administration/postgresql/database_load_balancing.html#service-discovery. We'd then configure a DNS record name and server against which the registry could look up the current primary server address. This address would then be used to manage schema migrations (regular or post-deployment), instead of database.host
and database.port
(which points to PgBouncer).