Geo: Some questions on the order of running commands in zero-downtime upgrades
@fzimmer The plot, it finally thickens...
In https://gitlab.com/gitlab-org/gitlab-ee/issues/12625 we didn't fully understand why we needed to run gitlab-rake db:migrate on the primary. But after reading the multi-node / HA deployment instructions again I understand why:
Because the user was asked to add gitlab_rails['auto_migrate'] = false to /etc/gitlab/gitlab.rb.
But why is this needed in the first place?
To answer that, let's look at what happens in HA setups.
In multi-node / HA deployment
You pick one deploy node. This node is used to decide when migrations run.
On that deploy node, you have to create /etc/gitlab/skip-auto-reconfigure to make sure the migrations are not run during apt upgrade gitlab (they would otherwise run indirectly through gitlab-ctl reconfigure).
On the other nodes, you need to set gitlab_rails['auto_migrate'] = false in /etc/gitlab/gitlab.rb. From the migrations' point of view this has the same effect: migrations do not run automatically on apt upgrade gitlab.
Next you run apt upgrade gitlab and SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo gitlab-ctl reconfigure on the deploy node. This triggers only the pre-deployment migrations.
Then upgrade all the other nodes with apt upgrade gitlab. These nodes run gitlab-ctl reconfigure (not sure why the command needs to be triggered manually), but do not perform any migrations.
The last step is doing post-deployment migrations on the deploy node.
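To make the ordering above easier to follow, here is a minimal dry-run sketch of the three phases. It only prints the commands by default. The function names and the `run` helper are made up for illustration, the package name `gitlab` is copied from the text (the real package is typically `gitlab-ee` or `gitlab-ce`), and `gitlab-rake db:migrate` for the final step is my assumption about how the post-deployment migrations are run.

```shell
#!/bin/sh
# Dry-run sketch of the multi-node / HA upgrade order described above.
# By default every command is only printed; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, run it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Phase 1, deploy node: upgrade, then run only pre-deployment migrations.
deploy_node_pre() {
  run sudo touch /etc/gitlab/skip-auto-reconfigure
  run sudo apt upgrade gitlab
  run env SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo gitlab-ctl reconfigure
}

# Phase 2, every other node: assumes gitlab_rails['auto_migrate'] = false
# is already set in /etc/gitlab/gitlab.rb, so no migrations run here.
other_node() {
  run sudo apt upgrade gitlab
  run sudo gitlab-ctl reconfigure
}

# Phase 3, deploy node again: run the remaining (post-deployment) migrations.
# Assumption: a plain db:migrate without the SKIP_ variable does this.
deploy_node_post() {
  run sudo gitlab-rake db:migrate
}

case "${1:-plan}" in
  deploy-pre)  deploy_node_pre ;;
  other)       other_node ;;
  deploy-post) deploy_node_post ;;
  # Default: print the whole plan in order, never executing anything.
  *) DRY_RUN=1; deploy_node_pre; other_node; deploy_node_post ;;
esac
```

Run without arguments it prints the full plan. On a real node you would pick one phase, for example `DRY_RUN=0 sh upgrade.sh deploy-pre` on the deploy node (the script name is arbitrary).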
In Geo (single-machine) deployment
One can think of Geo as a sort of HA setup. So you need a deploy node: the Geo Primary node.
I understand you do not want the migrations to run automatically on apt upgrade gitlab, but IMHO only one of the following two is needed:
- Create /etc/gitlab/skip-auto-reconfigure
- Set gitlab_rails['auto_migrate'] = false in /etc/gitlab/gitlab.rb
Since the primary node is considered the equivalent of a deploy node, I'd say only the former.
On the secondary nodes, you could think they need to add gitlab_rails['auto_migrate'] = false in /etc/gitlab/gitlab.rb, but no, Omnibus already knows it's a Geo secondary, so that setting is implied.
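To check which of the two mechanisms is actually in place on a given node, something like the following could be used. This is a rough sketch, not an official tool: the function name `migration_guards` and the overridable `SKIP_FILE` / `GITLAB_RB` variables are my own, and it only looks for an uncommented `auto_migrate = false` line, so it will not detect the implied setting on a Geo secondary.

```shell
#!/bin/sh
# Report which mechanism (if any) keeps migrations from running
# automatically during a package upgrade on this node.
# Paths are the ones named in the text; both can be overridden.
SKIP_FILE="${SKIP_FILE:-/etc/gitlab/skip-auto-reconfigure}"
GITLAB_RB="${GITLAB_RB:-/etc/gitlab/gitlab.rb}"

migration_guards() {
  guards=""
  # Mechanism 1: the marker file that makes the upgrade skip reconfigure.
  if [ -e "$SKIP_FILE" ]; then
    guards="skip-auto-reconfigure"
  fi
  # Mechanism 2: an uncommented auto_migrate = false line in gitlab.rb.
  if [ -r "$GITLAB_RB" ] && grep -Eq \
    "^[[:space:]]*gitlab_rails\['auto_migrate'\][[:space:]]*=[[:space:]]*false" \
    "$GITLAB_RB"; then
    guards="${guards:+$guards }auto_migrate=false"
  fi
  # Print the mechanisms found, or "none" if neither is configured.
  echo "${guards:-none}"
}

migration_guards
```

If both show up on the same node, that is the redundancy questioned above.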
Questions
Why does /etc/gitlab/skip-auto-reconfigure exist anyway?
On the deploy node we could also:
- Set gitlab_rails['auto_migrate'] = false in /etc/gitlab/gitlab.rb
- Run apt upgrade gitlab
- Run SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate (instead of SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo gitlab-ctl reconfigure)
That seems to be the same number of steps. The difference might be related to when gitlab-ctl reconfigure stops the unicorn and sidekiq processes, but I'm not sure.
If /etc/gitlab/skip-auto-reconfigure is the way to go for HA, Geo should do the same.
Why hot reload unicorn and sidekiq after post-deployment migrations and not before?
In my understanding, pre-deployment migrations can be done while the old code is still running. And post-deployment migrations are done while the new code is running.
For example, look at the documentation on dropping columns. In step one we add ignore_column for the column to make sure the code no longer looks at it, and at the same time a post-deployment migration is added to actually DROP the column. In my understanding that means the code with the ignore_column statement needs to be running while the post-deployment migration drops the column. If sudo gitlab-ctl hup unicorn is only run after the post-deployment migrations, that is not the case.
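The order I would expect, given that reasoning, is sketched below as a dry-run script that only prints commands. This reflects my understanding, not the documented procedure. The `run` helper and function name are illustrative, and the `hup sidekiq` line is an assumption modeled on the `hup unicorn` command.

```shell
#!/bin/sh
# Dry-run sketch of the ordering argued for above.
# Commands are printed by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, run it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_in_expected_order() {
  # 1. Pre-deployment migrations: safe while the old code is still running.
  run env SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo gitlab-ctl reconfigure
  # 2. Hot reload so the new code (with ignore_column) is what is running.
  run sudo gitlab-ctl hup unicorn
  run sudo gitlab-ctl hup sidekiq   # assumption: same form as for unicorn
  # 3. Post-deployment migrations (for example the DROP), run only now
  #    that no running process still expects the old column.
  run sudo gitlab-rake db:migrate
}

upgrade_in_expected_order
```

The current instructions effectively swap steps 2 and 3, which is what the question above is about.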
Proposed actions
- Remove gitlab_rails['auto_migrate'] = false from the Geo instructions
- Only ask the user to run SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo gitlab-ctl reconfigure on the primary; gitlab-rake db:migrate is not needed
