Make Geo HA docs the "main" reference architecture
Assertion
Relatively few Geo customers use a single machine per region, and those who do are not the ones who need much support.
Problem
But our biggest and most important configuration doc, Geo database replication, assumes the minimal case of a single machine in each region.
| Primary | Secondary |
|---|---|
| 1 monolithic node | 1 monolithic node |
- It's hard to extrapolate these instructions to other use cases.
- Because we engineers always work on the minimal case, instructions that are irrelevant to DB configuration have leaked into the Geo database replication docs.
- Our Geo High Availability doc (arguably the most important one to customers) references Geo database replication in a way that will never be adequate, since HA segregates configuration and the other doc does not.
Proposal
Make the Geo setup docs build on the reference architecture at https://docs.gitlab.com/ee/administration/high_availability/README.html#reference-architecture. The primary and secondary clusters will each run:
- 3 PostgreSQL - 4 CPU, 16GiB memory per node
- 1 PgBouncer - 2 CPU, 4GiB memory
- 2 Redis - 2 CPU, 8GiB memory per node
- 3 Consul/Sentinel - 2 CPU, 2GiB memory per node
- 4 Sidekiq - 4 CPU, 16GiB memory per node
- 5 GitLab application nodes - 16 CPU, 64GiB memory per node
- 1 Gitaly - 16 CPU, 64GiB memory
- 1 Monitoring node - 2 CPU, 8GiB memory, 100GiB local storage
And relegate the 2-machine case to a lesser doc that references the HA docs as needed. (Edit: A quick start guide for setting up a 2-machine environment has been added.) The HA docs, by nature, should make it clear that you can combine or segregate roles as you prefer.
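
To give a sense of the footprint the proposed node list implies, here is a minimal Python sketch that tallies nodes, CPUs and memory for one cluster. The `Role` class, the `CLUSTER` list and the `totals` helper are hypothetical names invented for this illustration; the counts and sizes are copied from the list above, and local storage for the monitoring node is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """One role in a cluster (hypothetical helper type for this sketch)."""
    name: str
    nodes: int
    cpu: int          # CPUs per node
    memory_gib: int   # GiB of memory per node


# Per-cluster roles, copied from the proposal's node list.
CLUSTER = [
    Role("PostgreSQL", 3, 4, 16),
    Role("PgBouncer", 1, 2, 4),
    Role("Redis", 2, 2, 8),
    Role("Consul/Sentinel", 3, 2, 2),
    Role("Sidekiq", 4, 4, 16),
    Role("GitLab application", 5, 16, 64),
    Role("Gitaly", 1, 16, 64),
    Role("Monitoring", 1, 2, 8),
]


def totals(roles):
    """Sum node count, CPUs and memory across all roles."""
    return {
        "nodes": sum(r.nodes for r in roles),
        "cpu": sum(r.nodes * r.cpu for r in roles),
        "memory_gib": sum(r.nodes * r.memory_gib for r in roles),
    }


if __name__ == "__main__":
    per_cluster = totals(CLUSTER)
    # Primary and secondary each run the same set, so double for a Geo pair.
    geo_pair = {key: 2 * value for key, value in per_cluster.items()}
    print("per cluster:", per_cluster)
    print("primary + secondary:", geo_pair)
```

By this tally each cluster comes to 20 nodes, 138 CPUs and 530 GiB of memory, which is a long way from the 1 monolithic node per region that the Geo database replication doc assumes.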