Geo: Some questions on the order of running commands in zero-downtime upgrades
@fzimmer The plot, it finally thickens...
In https://gitlab.com/gitlab-org/gitlab-ee/issues/12625 we didn't fully understand why we needed to run gitlab-rake db:migrate on the primary. But after reading the multi-node / HA deployment instructions again I understand why: because the user was asked to add gitlab_rails['auto_migrate'] = false to /etc/gitlab/gitlab.rb.
But I do not understand why this is needed in the first place.
To figure that out, let's look at what happens in HA setups.
In multi-node / HA deployment
You pick one deploy node. This node is used to decide when migrations run.
On that deploy node, you have to create /etc/gitlab/skip-auto-reconfigure to make sure the migrations are not run (indirectly through gitlab-ctl reconfigure) during apt upgrade gitlab.
On the other nodes, you need to set gitlab_rails['auto_migrate'] = false in /etc/gitlab/gitlab.rb. From a migrations point of view this has the same effect: migrations do not run automatically on apt upgrade gitlab.
Next you run apt upgrade gitlab and SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo gitlab-ctl reconfigure on the deploy node. This triggers only the pre-deployment migrations.
Then upgrade all the other nodes with apt upgrade gitlab. These nodes run gitlab-ctl reconfigure (not sure why the command needs to be triggered manually), but do not perform any migrations.
The last step is doing post-deployment migrations on the deploy node.
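The HA steps above can be sketched as a dry-run script. This is only my reading of the instructions, not the official procedure: the function names and the run helper are made up for illustration, using touch to create the skip file and prefixing apt with sudo are assumptions, and gitlab-rake db:migrate for the last step is an assumption too, since the steps above do not name the command. The env prefix is just the VAR=value form from above written so it can be passed through the helper.

```shell
#!/bin/sh
# Dry-run sketch of the HA upgrade order described above.
# DRY_RUN defaults to 1, so commands are printed and not executed.
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Deploy node, part 1: upgrade the package, run pre-deployment migrations only.
deploy_node_pre() {
  # Keeps the package upgrade from running gitlab-ctl reconfigure (assumed: touch).
  run sudo touch /etc/gitlab/skip-auto-reconfigure
  run sudo apt upgrade gitlab
  run env SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo gitlab-ctl reconfigure
}

# Every other node: gitlab_rails['auto_migrate'] = false is assumed to be set
# in /etc/gitlab/gitlab.rb already, so reconfigure performs no migrations.
other_node_upgrade() {
  run sudo apt upgrade gitlab
  run sudo gitlab-ctl reconfigure
}

# Deploy node, part 2: post-deployment migrations (command is an assumption).
deploy_node_post() {
  run sudo gitlab-rake db:migrate
}

# Print the whole plan in order.
deploy_node_pre
other_node_upgrade
deploy_node_post
```

Running it as-is only prints the plan. Setting DRY_RUN=0 would execute the commands, which I would not do without checking each one against the upgrade documentation first.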
In Geo (single-machine) deployment
One can think of Geo as a sort of HA setup. So you need a deploy node: the Geo Primary node.
I understand you do not want the migrations to run automatically on apt upgrade gitlab, but IMHO only one of the two is needed:
- Create /etc/gitlab/skip-auto-reconfigure
- Set gitlab_rails['auto_migrate'] = false in /etc/gitlab/gitlab.rb
Since the primary node is considered the equivalent of a deploy node, I'd say only the former.
On the secondary nodes, you might think they need gitlab_rails['auto_migrate'] = false in /etc/gitlab/gitlab.rb, but no: Omnibus already knows it's a Geo secondary, so that setting is implied.
Questions
Why does /etc/gitlab/skip-auto-reconfigure exist anyway?
On the deploy node we could also:
- Set gitlab_rails['auto_migrate'] = false in /etc/gitlab/gitlab.rb
- Run apt upgrade gitlab
- Run SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate (instead of SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo gitlab-ctl reconfigure)
Seems an equal number of steps? It might be related to when gitlab-ctl reconfigure stops the unicorn and sidekiq processes, but I'm not sure.
If /etc/gitlab/skip-auto-reconfigure is the way to go for HA, Geo should do the same.
Why hot reload unicorn and sidekiq after post-deployment migrations and not before?
In my understanding, pre-deployment migrations can be done while the old code is still running, and post-deployment migrations are done while the new code is running.
For example, look at the documentation on dropping columns. In step one we add ignore_column to make sure the code will not look at the column, and at the same time a post-deployment migration is added to actually DROP the column. But in my understanding that means the code with the ignore_column statement needs to be running while the post-deployment migration is dropping the column. And by doing sudo gitlab-ctl hup unicorn after running the post-deployment migrations, that is not the case.
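To make the question concrete, here is a dry-run sketch of the order I would expect, with the hot reload placed before the post-deployment migrations. It is not what the current instructions say: the run helper and the function name are made up for illustration, and using gitlab-rake db:migrate for the post-deployment step is an assumption. Sidekiq would be reloaded at the same point as unicorn. I left it out because its exact command is not given above.

```shell
#!/bin/sh
# Dry-run sketch: commands are printed, and executed only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Order argued for in this question (an assumption, not the documented order).
proposed_order() {
  # 1. Pre-deployment migrations, while the old code is still running.
  run env SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo gitlab-ctl reconfigure
  # 2. Hot reload, so the new code (with ignore_column) is what is running.
  run sudo gitlab-ctl hup unicorn
  # 3. Post-deployment migrations (e.g. the DROP), after the new code is live.
  run sudo gitlab-rake db:migrate
}

proposed_order
```

With the reload in step 2, the column is only dropped once the code that ignores it is live. With the documented order, steps 2 and 3 are swapped.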
Proposed actions
- Remove gitlab_rails['auto_migrate'] = false from the Geo instructions
- Only ask the user to run SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo gitlab-ctl reconfigure on the primary; gitlab-rake db:migrate is not needed