Geo: Clarify instructions for promoting a Geo secondary using Helm Charts
Problem to solve
A customer tried following https://docs.gitlab.com/14.5/ee/administration/geo/disaster_recovery/index.html#promoting-a-secondary-geo-cluster-in-gitlab-cloud-native-helm-charts to promote a secondary using our Helm charts and ran into some issues:
The gitlab-ctl command was not available on any of the pods, so I couldn't promote Sidekiq or Gitaly. Also, I didn't find a Rails or task-runner deployment in our setup, so I couldn't promote that either. After all that, I went ahead and promoted the secondary to primary. While it worked, it still behaved as if it were the secondary, redirecting me to the login endpoint /users/auth/geo/sign_in
From conversation with @dbalexandre and @cat in Slack:
Catalin Irimie 1 hour ago "I didn't find a rails nor task-runner deployment in our setup so couldn't promote that as well" P.S.: I think we renamed the task-runner pod to toolbox to make it less confusing about what it's used for, which may be why (and I think we may need to update some parts of the docs; I didn't realize the rename had happened already).
Douglas Alexandre 1 hour ago Good catch!
Nick Nguyen 1 hour ago Ah, I see we specify running the single command outside of the cluster. But we also say to run it on each Sidekiq, PG, and Gitaly node, so perhaps that's part of the confusion.
Nick Nguyen 1 hour ago He does say at the end of that message that he was eventually able to promote the site, but it still seems to think it's a secondary. Any thoughts on why that might be happening?
Catalin Irimie 1 hour ago Because he didn't find the task-runner (now called toolbox) pod, he didn't run the gitlab-rake geo:set_secondary_as_primary command, so I believe the DB record still thinks it's a secondary and tries to authenticate with the primary.
Nick Nguyen 1 hour ago Ahh, thanks! I’ll ask what they’re using for PG
Douglas Alexandre 1 hour ago He doesn't need to run gitlab-rake geo:set_secondary_as_primary manually; the promote command already sets the secondary as a primary.
Nick Nguyen 1 hour ago @catalin Do you know what version we renamed task-runner to toolbox?
Douglas Alexandre 1 hour ago If he wasn't able to run sudo gitlab-ctl geo promote on a Rails pod in his secondary site because he didn't find the task-runner (now called toolbox) pod, then the secondary site is not fully promoted.
Nick Nguyen 1 hour ago @douglas I see we include the task-runner in the section for GitLab 14.4 and earlier, but I'm not sure we mention it alongside the single command.
Douglas Alexandre 44 minutes ago Good find! We need to split the section to separate the PG node instructions from the Sidekiq and Gitaly pod instructions.
Douglas Alexandre 44 minutes ago kubectl --namespace gitlab exec -ti gitlab-geo-task-runner-XXX -- gitlab-rake geo promote
Nick Nguyen 41 minutes ago Cool, I’ll pass that along. He says they are using RDS for PG
Nick Nguyen 40 minutes ago @douglas is it gitlab-geo-task-runner-XXX or gitlab-geo-toolbox-XXX?
Douglas Alexandre 40 minutes ago @catalin Could you confirm please?
Catalin Irimie 37 minutes ago looking through the changes, not sure when the rename happened yet
Catalin Irimie 36 minutes ago looks like exactly 14.5
Catalin Irimie 35 minutes ago so yeah, gitlab-geo-toolbox-XYZ
Douglas Alexandre 31 minutes ago kubectl --namespace gitlab exec -ti gitlab-geo-toolbox-XXX -- gitlab-rake geo promote
Further details
Proposal
- In the GitLab 14.5 and later section, under "Step 2. Promote all secondary sites external to the cluster", differentiate between the commands run on Omnibus nodes and the commands run in Helm chart pods.
- Update task-runner to toolbox in the GitLab 14.5+ instructions.