Update multi-node (example HA) Geo docs
Problem to solve
Based on this Slack conversation in the geo channel, a few things on these pages could be clarified, enhanced, or updated:
- https://docs.gitlab.com/14.10/ee/administration/geo/disaster_recovery/
- https://docs.gitlab.com/14.10/ee/administration/geo/disaster_recovery/planned_failover.html
- https://docs.gitlab.com/14.10/ee/administration/geo/disaster_recovery/bring_primary_back.html
The following points are important:
- What is the gitlab-cluster.json file, and what is its impact if it is removed, not created, etc.?
  - Currently this file is only referenced in the troubleshooting section (AFAIK): https://docs.gitlab.com/ee/administration/geo/replication/troubleshooting.html#recovering-from-a-partial-failover
- What happens to the file if the promoted secondary is relegated back to primary?
- Reasoning for promotion order of component nodes
- This would make it easier to work with architectures that do not strictly fit into this category: https://docs.gitlab.com/14.10/ee/administration/geo/disaster_recovery/#promoting-a-secondary-site-with-multiple-nodes-running-gitlab-145-and-later
Apart from that, I believe that there is room to refactor our docs to make things easier to understand. Most recent ZD ticket: https://gitlab.zendesk.com/agent/tickets/289905
Further details
Proposal
Who can address the issue
Other links/references
Edited by Alvin Gounder