Upgrade 11.11.4 to 12.0.2 caused noticeable downtime (despite zero downtime procedure being used)
As we have several developers depending on GitLab, downtime on that installation is bad for us.
Luckily, the documentation has a section on Zero downtime updates. Until now, following that procedure has not caused anyone in the company to complain to me (although I have been able to trigger errors that were likely related to the ongoing upgrade). Today it was different.
As there have been reports of upgrade problems with version 12, I had been holding back on it until today. In line with best practices, I started out by upgrading our EE installation from 11.11.3 to 11.11.4 (to be at the latest point release in the 11 series); that went smoothly.
Then I started upgrading to 12.0.2. During step 2 of the procedure, while apt was saying "Unpacking gitlab-ee (12.0.2-ee.0) over (11.11.4-ee.0) ...", GitLab gave out 500 errors for so long that people started asking me if anything was wrong with GitLab. I.e. GitLab was down! (/etc/gitlab/skip-auto-reconfigure, mentioned in step 1, has existed since late August, so I didn't explicitly do that touch.)
Please either modify the documentation so it doesn't say zero downtime, or make sure that it actually is zero. Some users, like us, will notice even very little downtime and tolerate no more, so when you say zero we expect zero. I could easily have done this upgrade outside normal office hours (and that will probably be my plan for future upgrades), so it wouldn't have affected anyone, but without knowing about this risk I couldn't account for it.
Looking at /var/log/gitlab/nginx/gitlab_access.log, I see 419 requests that received a 500 response, all between 27/Jun/2019:08:27:16 +0000 and 27/Jun/2019:08:30:52 +0000.
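For anyone who wants to check their own access log the same way, here is a minimal Python sketch of that count. It is not the command I actually ran. It assumes each log line has the usual nginx shape of a bracketed timestamp followed by the quoted request and a three-digit status code; the function name summarize_status and the regular expression are my own illustration, so adjust them if your log format differs.

```python
import re
from datetime import datetime

# Assumed line shape (not verified against every GitLab version):
#   ... [27/Jun/2019:08:27:16 +0000] "GET /path HTTP/1.1" 500 ...
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "[^"]*" (?P<status>\d{3}) ')


def summarize_status(lines, status="500"):
    """Return (count, first_timestamp, last_timestamp) for matching lines.

    `lines` is any iterable of log lines, e.g. an open file handle.
    Lines that do not match the assumed format are silently skipped.
    """
    stamps = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("status") == status:
            # nginx's $time_local format: day/month-name/year:H:M:S offset
            stamps.append(
                datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            )
    if not stamps:
        return 0, None, None
    return len(stamps), min(stamps), max(stamps)
```

Passing an open handle on /var/log/gitlab/nginx/gitlab_access.log to summarize_status gives the number of 500 responses together with the first and last time one was logged, which is enough to estimate the length of the outage window.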