Fix zero-downtime upgrade process/instructions for multi-node Geo deployments
We have observed downtime while following the zero-downtime upgrade instructions for multi-node Geo deployments. This issue covers identifying and fixing the blockers to zero-downtime upgrades, and updating the zero-downtime upgrade instructions accordingly.
- Identify the step(s) at which downtime occurs during an upgrade. This might involve HAProxy dashboards, real-time server logs, and/or other means of getting live feedback. End-to-end tests are not well suited to this because their built-in waits delay the feedback.
- Fix any blockers to zero-downtime upgrades.
- Test the revised zero-downtime upgrade process on the current and previous versions of GitLab (the versions that have version-specific instructions on docs.gitlab.com).
- Revise the zero-downtime instructions for the current GitLab version, and update the instructions for the previous GitLab versions tested in the previous step: either correct them, or remove them if a zero-downtime upgrade is not possible for those versions.
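
For the first task, one possible way to get live feedback alongside the HAProxy dashboards and server logs is a small poller that hits the load balancer at a fixed interval and records when requests start and stop failing. The sketch below is only an illustration, not an agreed tool. The function names (`probe`, `monitor`, `downtime_windows`), the interval, and the timeout are hypothetical, and the URL in the usage comment is a placeholder to replace with whatever endpoint the deployment exposes through the load balancer.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, request succeeded?)


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 2xx within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def monitor(url: str, duration: float, interval: float = 1.0,
            check: Callable[[str], bool] = probe) -> List[Sample]:
    """Poll the URL every `interval` seconds for `duration` seconds."""
    samples: List[Sample] = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), check(url)))
        time.sleep(interval)
    return samples


def downtime_windows(samples: Iterable[Sample]) -> List[Tuple[float, float]]:
    """Collapse runs of failed samples into (first_failure, recovery) windows.

    A window still open at the end of the samples is closed at the
    timestamp of the last sample.
    """
    windows: List[Tuple[float, float]] = []
    start = None
    last_ts = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # outage begins
        elif ok and start is not None:
            windows.append((start, ts))   # outage ends at first success
            start = None
        last_ts = ts
    if start is not None:
        windows.append((start, last_ts))
    return windows


# Example usage (placeholder URL, assumed to be reachable through the
# load balancer while the upgrade runs):
#   samples = monitor("https://gitlab.example.com/-/readiness", duration=1800)
#   for begin, end in downtime_windows(samples):
#       print(time.ctime(begin), "->", time.ctime(end))
```

Matching the timestamps of the reported windows against the time each upgrade step was run should narrow down which step causes the downtime. Polling once per second only bounds each window to about a second, so a shorter interval may be needed for brief interruptions.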