Test no-downtime HA upgrades (including Geo)

We claim in https://docs.gitlab.com/ee/update/#upgrading-without-downtime that GitLab can be upgraded with no downtime, but we don't have a rigorous test to ensure this works. For example, we've seen a high rate of errors when:

  • Rails is upgraded
  • Ruby is upgraded (e.g. https://gitlab.com/gitlab-org/gitlab-ee/issues/12458)
  • Database migrations are applied but not all of the old nodes have been upgraded (https://gitlab.com/gitlab-org/gitlab-ce/issues/64145#note_188616038)
  • Cache poisoning occurs (https://gitlab.com/gitlab-org/gitlab-ce/issues/63510)

As a first step, we may want to:

  1. Deploy a 2-node GitLab cluster with both nodes running the same version.
  2. On one node, upgrade GitLab and run pre-deploy migrations.
  3. Run all smoke tests and look for 500 errors (these smoke tests should include Geo smoke tests).
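
To make step 3 concrete, here is a rough sketch of how the "look for 500 errors" check might work once one node has been upgraded. It is not the existing smoke test suite. The node URLs and the list of paths are placeholders chosen for illustration, and `find_server_errors` and `http_status` are hypothetical helper names, not part of any GitLab tooling.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to `url`.

    Connection-level failures (urllib.error.URLError) are not caught here,
    so an unreachable node aborts the run instead of being hidden.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses raise HTTPError; the code is what we want.
        return err.code


def find_server_errors(
    base_urls: Iterable[str],
    paths: Iterable[str],
    fetch: Callable[[str], int] = http_status,
) -> List[Tuple[str, int]]:
    """Request every path on every node and collect the 5xx responses."""
    paths = list(paths)
    failures = []
    for base in base_urls:
        for path in paths:
            url = base.rstrip("/") + path
            status = fetch(url)
            if status >= 500:
                failures.append((url, status))
    return failures


if __name__ == "__main__":
    # Hypothetical hostnames for the upgraded node and the old node.
    nodes = ["http://gitlab-node-1.example.com", "http://gitlab-node-2.example.com"]
    # Example paths only; a real run would cover the full smoke suite, Geo included.
    smoke_paths = ["/-/health", "/users/sign_in"]
    for url, status in find_server_errors(nodes, smoke_paths):
        print(f"{status} {url}")
```

Passing `fetch` as a parameter keeps the check testable without a live cluster. Running the same check against both the upgraded node and the old node is what would surface errors caused by mixed versions.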
Edited Jul 06, 2020 by 🤖 GitLab Bot 🤖