WIP: Geo support for advanced PostgreSQL configurations
# Geo support for advanced PostgreSQL configurations
## Introduction
Geo currently supports only [a single PostgreSQL database node](https://docs.gitlab.com/ee/administration/geo/replication/database.html)
when using Omnibus. This is acceptable in simple, single-node installations but
is a limitation when Disaster Recovery is a consideration. For example, all
[current reference architectures](https://docs.gitlab.com/ee/administration/high_availability/#reference-architecture-examples),
even the smallest 10,000 user configuration, use three PostgreSQL nodes. For
Disaster Recovery, a boring solution would be to have a largely identical secondary cluster that mirrors
the primary; however, this may be challenging at the moment. [Geo also does not
support repmgrd](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/3361), which
is currently shipped with Omnibus. GitLab.com does not use `repmgrd` and uses
Patroni instead, which is not yet supported in Omnibus. [Omnibus
support for Patroni](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/3752) may
be added in the first quarter of 2020. To improve Geo's Disaster Recovery,
we should start supporting and/or documenting Geo configurations that utilise
more complex PostgreSQL configurations.
In a more general sense, this may open up a number of questions surrounding how
we (as in GitLab) want to support complex infrastructure deployments. At this
moment it appears that we are going to encourage a few reference architectures,
which ideally should also allow for a simple DR setup.
## Problems to solve
### What does Geo support *mean*?
When we say "Geo supports more complex PostgreSQL configurations", two things
should be considered:
1. We have successfully deployed a Geo configuration using a specific
technology. Support here would mainly mean
"can be configured to use and is recommended".
1. Geo needs to implement specific features in order to support a more complex
PostgreSQL configuration. I think this is an item for discussion because
we are currently tightly coupled to PostgreSQL streaming replication. Is this
something we want to decouple? What if we wanted to use a different replication
mechanism, such as logical replication?
## Support for multiple PostgreSQL databases on the secondary node
There is some customer demand for [supporting multiple databases in a single secondary](https://gitlab.com/gitlab-org/gitlab/issues/5494).
We should investigate whether this is desirable, what the exact use case is, and why
we would want to support it.
## Support for cascading replication
There is [some discussion on supporting cascading replication](https://gitlab.com/gitlab-org/gitlab/issues/9723). This
lets you chain secondaries, so that one secondary replicates from
another secondary instead of from the primary. It can be quite useful in DR scenarios and may also help
"buffer" expensive Geo queries in an HA configuration. IMHO this mainly boils
down to configuration.
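
To make the idea of chained secondaries concrete, here is a minimal sketch that models a replication topology as a mapping from each node to its upstream and walks the chain back to the primary. The node names and the `replication_chain` helper are hypothetical and exist only for illustration. They are not part of Geo or Omnibus.

```python
from typing import Dict, List, Optional


def replication_chain(upstreams: Dict[str, Optional[str]], node: str) -> List[str]:
    """Return the path from `node` up to the primary it ultimately replicates from.

    `upstreams` maps each node name to the node it streams from; the primary
    maps to None. Raises ValueError for unknown nodes or replication cycles.
    """
    chain = [node]
    seen = {node}
    current = node
    while True:
        if current not in upstreams:
            raise ValueError(f"unknown node: {current}")
        parent = upstreams[current]
        if parent is None:
            # Reached the primary: the chain is complete.
            return chain
        if parent in seen:
            raise ValueError(f"replication cycle detected at {parent}")
        chain.append(parent)
        seen.add(parent)
        current = parent


# Hypothetical topology; the names are illustrative only.
topology = {
    "primary": None,
    "secondary-eu": "primary",
    "secondary-dr": "secondary-eu",  # cascades from another secondary
}

if __name__ == "__main__":
    print(" -> ".join(replication_chain(topology, "secondary-dr")))
```

A check like this could help validate a proposed topology before any configuration is written, for example to reject cycles or to flag chains that are deeper than we are willing to support.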
## Support for HA PostgreSQL via Patroni
GitLab.com uses Patroni for PostgreSQL high availability clusters in conjunction
with Consul for service discovery. Omnibus currently includes only `repmgr`
but [may support Patroni](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/3752) in `12.7`.
Given that we are trying to enable Geo on GitLab.com, we may need to configure Geo in
such a way that it can work with a Patroni cluster. Ultimately, I assume this is
more about configuring Patroni and Consul, but it would be great if we could prove
that a Geo installation can also [support a Patroni standby cluster](https://patroni.readthedocs.io/en/latest/replica_bootstrap.html#standby-cluster).
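
As a rough illustration of what "configuring Patroni" could involve on the secondary side, the sketch below builds the `standby_cluster` section that the linked Patroni documentation describes under `bootstrap.dcs`. The `standby_cluster_config` helper, the host name, and the slot name are assumptions made for this example. How Omnibus would actually expose these settings is still an open question.

```python
import json
from typing import Any, Dict, Optional


def standby_cluster_config(
    host: str,
    port: int = 5432,
    primary_slot_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a Patroni bootstrap fragment for a standby cluster.

    `host` and `port` point at the remote primary that the standby leader
    replicates from. `primary_slot_name` is optional and names the
    replication slot to use on that remote primary.
    """
    standby: Dict[str, Any] = {"host": host, "port": port}
    if primary_slot_name is not None:
        standby["primary_slot_name"] = primary_slot_name
    return {"bootstrap": {"dcs": {"standby_cluster": standby}}}


if __name__ == "__main__":
    # Hypothetical values for a Geo secondary site.
    config = standby_cluster_config(
        host="geo-primary.example.com",
        primary_slot_name="geo_secondary",
    )
    # Printed as JSON to avoid a third-party YAML dependency in this sketch.
    print(json.dumps(config, indent=2))
```

The output only covers the standby-specific keys. A real Patroni configuration also needs the usual cluster, Consul, and PostgreSQL authentication settings, which are out of scope here.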
## Support for symmetrical reference architecture DR
We currently have [three reference architectures](https://docs.gitlab.com/ee/administration/high_availability/#reference-architecture-examples) for large (10,000, 25,000, 50,000 users) GitLab deployments. Customers
considering these deployments are often also interested in Disaster Recovery, and
including Geo as an option would allow us to tell a simple story about which DR capabilities
are included. This means we should test and configure Geo using these examples
and evaluate what the deployment should look like.
## Support Geo on PostgreSQL 11 and 12
We are going to move the minimum required PostgreSQL version to 11 with GitLab
`13.0`, and we need to make sure we support that migration for our customers,
because the last migration was not without issues.
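
One small piece of that migration is checking whether a given node already meets the new minimum. The sketch below assumes the check is based on PostgreSQL's integer `server_version_num` setting (for example `110005` for 11.5 and `90612` for 9.6.12). The `meets_minimum` helper is hypothetical and does not describe how GitLab actually performs this check.

```python
def meets_minimum(server_version_num: int, minimum_major: int = 11) -> bool:
    """Return True if `server_version_num` is at least `minimum_major`.

    Valid for minimum_major >= 10, where PostgreSQL encodes the version as
    major * 10000 + minor. Older 9.x releases use smaller numbers such as
    90612, so they correctly compare as below any 10+ minimum.
    """
    if minimum_major < 10:
        raise ValueError("this sketch only supports minimum_major >= 10")
    return server_version_num >= minimum_major * 10000


if __name__ == "__main__":
    # Sample values as reported by `SHOW server_version_num;`
    for version_num in (90612, 100010, 110005, 120001):
        print(version_num, meets_minimum(version_num))
```

Running a check like this against both the primary and every secondary before an upgrade would help catch sites that are still on 9.6 or 10.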
## Intended users
* Systems administrators maintaining GitLab installations
## Further details
TBD
## Proposal
* We have several potential problems to solve here. The first step is to
validate some of these problems and then decide what to do first.
* We also need to see which problems we really need to solve in Geo.
## What does success look like, and how can we measure that?
* Geo documentation available to support DR in different reference architectures
* Geo documentation/features available to support Patroni standby clusters (or another HA solution)
* Geo migration from 9/10 to 11/12 tested and implemented before `13.0`
## What is the type of buyer?
* Premium
* Ultimate
## Links / references
epic