WIP: Geo support for advanced PostgreSQL configurations
# Geo support for advanced PostgreSQL configurations

## Introduction

Geo currently supports only [a single PostgreSQL database node](https://docs.gitlab.com/ee/administration/geo/replication/database.html) when using Omnibus. This is acceptable for simple, single-node installations but is a limitation when Disaster Recovery is a consideration. For example, all [current reference architectures](https://docs.gitlab.com/ee/administration/high_availability/#reference-architecture-examples), even the smallest 10,000-user configuration, use three PostgreSQL nodes.

For Disaster Recovery, a boring solution would be to have a largely identical secondary cluster that mirrors the primary; however, this may be challenging at the moment. [Geo also does not support repmgrd](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/3361), which is currently shipped with Omnibus. GitLab.com does not use `repmgrd` and uses `Patroni` instead, which Omnibus does not support yet either. [Omnibus support for Patroni](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/3752) may be added in the first quarter of 2020.

To improve Geo's Disaster Recovery capabilities, we should start supporting and/or documenting Geo configurations that use more complex PostgreSQL setups.

In a more general sense, this may open up a number of questions about how we (as in GitLab) want to support complex infrastructure deployments. At the moment it appears that we are going to encourage a few reference architectures, which ideally should also allow for a simple DR setup.

## Problems to solve

### What does Geo support *mean*?

When we say "Geo supports more complex PostgreSQL configurations", two things should be considered:

1. We have successfully deployed a Geo configuration using a specific technology. Support here would mainly mean "can be configured to use and is recommended".
1. Geo needs to implement specific features in order to support a more complex PostgreSQL configuration.

I think this is an item for discussion because we are currently tightly coupled to PostgreSQL streaming replication. Is this something we want to decouple? What if we wanted to use a different replication mechanism, such as logical replication?

## Support for multiple PostgreSQL databases on the secondary node

There is some customer demand for [supporting multiple databases in a single secondary](https://gitlab.com/gitlab-org/gitlab/issues/5494). We should investigate whether this is desirable, what the exact use case is, and why we would want to support it.

## Support for cascading replication

There is [some discussion on supporting cascading replication](https://gitlab.com/gitlab-org/gitlab/issues/9723). This is useful when you want to chain secondaries, that is, replicate a secondary from another secondary rather than directly from the primary. This can be quite useful in DR scenarios and may also help "buffer" expensive Geo queries in an HA configuration. IMHO this mainly boils down to configuration.

## Support for HA PostgreSQL via Patroni

GitLab.com uses Patroni for PostgreSQL high availability clusters, in conjunction with Consul for service discovery. Omnibus currently only includes `repmgr` but [may support Patroni](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/3752) in `12.7`.

Given that we are trying to enable Geo on GitLab.com, we may need to configure Geo in such a way that it can work with a Patroni cluster. Ultimately, I assume this is more about configuring Patroni and Consul, but it would be great if we could prove that a Geo installation can also [support a Patroni standby cluster](https://patroni.readthedocs.io/en/latest/replica_bootstrap.html#standby-cluster).

## Support for symmetrical reference architecture DR

We currently have [three reference architectures](https://docs.gitlab.com/ee/administration/high_availability/#reference-architecture-examples) for large (10,000, 25,000, and 50,000 user) GitLab deployments.
Customers considering these deployments are often also interested in Disaster Recovery, and including Geo as an option would allow us to tell a simple story about which DR capabilities are included. This means we should test and configure Geo using these examples and evaluate what the deployment should look like.

## Support Geo on PostgreSQL 11 and 12

We are going to move the minimum required PostgreSQL version to 11 with GitLab `13.0`, and we need to make sure we support that migration for our customers, because the last migration was not without issues.

## Intended users

* Systems administrators maintaining GitLab installations

## Further details

TBD

## Proposal

* We have several potential problems to solve here. The first step is to validate some of these problems and then decide what to do first.
* We also need to determine which problems we really need to solve in Geo.

## What does success look like, and how can we measure that?

* Geo documentation available to support DR in different reference architectures
* Geo documentation/features available to support Patroni standby clusters (or another HA solution)
* Geo migration from PostgreSQL 9/10 to 11/12 tested and implemented before `13.0`

## What is the type of buyer?

* Premium
* Ultimate

## Links / references