Test no-downtime HA upgrades (including Geo)
We claim in https://docs.gitlab.com/ee/update/#upgrading-without-downtime that GitLab can be upgraded with no downtime, but we don't have a rigorous test to ensure this works. For example, we've seen a high rate of errors when:
- Rails is upgraded
- Ruby is upgraded (e.g. https://gitlab.com/gitlab-org/gitlab-ee/issues/12458)
- Database migrations are applied but not all of the old nodes have been upgraded (https://gitlab.com/gitlab-org/gitlab-ce/issues/64145#note_188616038)
- Cache poisoning occurs (https://gitlab.com/gitlab-org/gitlab-ce/issues/63510)
As a first step, we may want to:
- Deploy a 2-node GitLab cluster with both nodes running the same version
- On one node, upgrade GitLab and run pre-deploy migrations
- Run all smoke tests and look for 500 errors (these smoke tests should include Geo smoke tests)
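
As a rough illustration of the "look for 500 errors" step, here is a minimal Python sketch that requests a few paths on every node of the mixed-version cluster and counts 5xx responses. It is not the existing smoke test suite. The node URLs and the paths probed are assumptions chosen for illustration, and `probe`, `summarize`, and `check_nodes` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from collections import Counter


def probe(url, timeout=10.0):
    """Return the HTTP status code for a GET on url, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is what we want.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def summarize(statuses):
    """Count server errors (5xx) and unreachable probes in a list of codes."""
    counts = Counter(statuses)
    server_errors = sum(n for code, n in counts.items() if 500 <= code <= 599)
    unreachable = counts.get(0, 0)
    total = len(statuses)
    error_rate = (server_errors + unreachable) / total if total else 0.0
    return {
        "total": total,
        "server_errors": server_errors,
        "unreachable": unreachable,
        "error_rate": error_rate,
    }


def check_nodes(node_urls, paths, fetch=probe):
    """Probe every path on every node and summarize the results.

    fetch is injectable so the logic can be exercised without a live cluster.
    """
    statuses = [
        fetch(base.rstrip("/") + path) for base in node_urls for path in paths
    ]
    return summarize(statuses)


# Example call (hypothetical hosts and paths, not run here):
# check_nodes(
#     ["https://node-old.example.com", "https://node-new.example.com"],
#     ["/users/sign_in", "/explore"],
# )
```

Running this in a loop while one node is upgraded would give an error rate over time. A real test would still need the full smoke suite, including the Geo tests, since a handful of GET requests will not exercise migrations or caching paths.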