Explore adding Database Load Balancing to the Reference Architectures
It recently surfaced that, alongside the current mechanism for Omnibus Postgres failover, there's an additional one in GitLab as well.
This is an older mechanism that looks to have been added as the first iteration of handling failover with GitLab. It has since been superseded by the PgBouncer + Consul approach we recommend today and use in the Reference Architectures. As a result, the docs page linked above is outdated as well as being annexed, which has led to it becoming a sort of hidden feature.
However, after further detailed discussions, we now know that this mechanism is still in use today in a different form on .com for its other key benefit: distributed reads for the database against secondaries, which the current mechanism we're using in the Reference Architectures doesn't allow for. @stanhu has a detailed write-up here of how this is done on .com, but in a nutshell the approach is a combination of both mechanisms with PgBouncer used throughout.
On reflection, a similar approach to what's being done on .com has potentially significant performance benefits for the Reference Architectures. While it's more complicated, the ability to do distributed reads may notably reduce the load on the Postgres primary, potentially leading to lower spec requirements throughout. As such, we want to explore this as a priority. The approach will be as follows:
- Continue with the current setup for the Postgres primary. As @stanhu details, a separate PgBouncer cluster that follows the primary exclusively is key for HA and failover; colocating will cause several issues.
- Enable a second cluster of PgBouncers, colocated on each Postgres node, to be used only for distributed reads on secondaries. We've actually been considering for some time now enabling PgBouncer on non-HA environments as well for its pooling benefits, so this is timely.
- Enable GitLab Rails to poll Consul for the secondaries to use for distributed reads via the `db_load_balancing` setting. In this setup it will be polling for the secondary database addresses, but since the port will be set to the PgBouncer port it will connect to the colocated PgBouncer instead.
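
To make the intended traffic split concrete, here is a minimal Python sketch of the routing idea in the three points above. It is illustrative only and is not how GitLab Rails implements load balancing: the `ReadWriteRouter` class, the `refresh` and `endpoint_for` names, the example addresses, and the port value `6432` are all assumptions made for this example. It also ignores replication lag, health checks, and failover.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Iterator, List, Tuple

# Assumed PgBouncer listen port for this sketch; adjust to the real setup.
PGBOUNCER_PORT = 6432

Endpoint = Tuple[str, int]


@dataclass
class ReadWriteRouter:
    """Hypothetical router: writes go to the primary's PgBouncer cluster,
    reads rotate over the PgBouncers colocated with each secondary."""

    primary: Endpoint        # separate PgBouncer cluster that follows the primary
    secondaries: List[str]   # secondary Postgres addresses, e.g. found via Consul
    _reads: Iterator[Endpoint] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self.refresh(self.secondaries)

    def refresh(self, addresses: List[str]) -> None:
        """Replace the secondary list, e.g. after a new discovery poll."""
        self.secondaries = list(addresses)
        # The discovered address is the database host, but pairing it with the
        # PgBouncer port means connections land on the colocated PgBouncer.
        self._reads = cycle([(addr, PGBOUNCER_PORT) for addr in self.secondaries])

    def endpoint_for(self, is_write: bool) -> Endpoint:
        """Pick an endpoint: primary for writes, round-robin secondary for reads."""
        if is_write or not self.secondaries:
            # Fall back to the primary when no secondaries are known.
            return self.primary
        return next(self._reads)
```

With two secondaries, successive reads alternate between them while writes always use the primary endpoint; if discovery returns no secondaries, reads fall back to the primary.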