Create new gitaly storage shard node `file-48-stor-gprd` to replace `file-46-stor-gprd` in the configured rotation for storing new projects
C3
Production Change - Criticality 3
Change Objective | Increase capacity for new project repository storage |
---|---|
Change Type | Add additional infrastructure instances |
Services Impacted | Gitaly |
Change Team Members | @nnelson |
Change Severity | ~C3 |
Buddy check or tested in staging | Username of a colleague who will review the change or evidence the change was tested on staging environment |
Schedule of the change | 2020-03-06 17:00 UTC |
Duration of the change | 1 hour |
Detailed steps for the change. Each step must include: | See below: Summary |
Production change requires commented manager approval: | /cc @dawsmith |
Meta
- Replace all occurrences of "XX" with the new gitaly shard node number.
- Replace all occurrences of "YY" with the old gitaly shard node number that the new one will replace.
- Set the title of this production change issue to: Create new gitaly storage shard node file-48-stor-gprd to replace file-46-stor-gprd in the configured rotation for storing new projects
- Add labels:
  /label Infrastructure C3 change requires production access ServiceGitaly ServiceGCP
Summary
To support: Create new gitaly storage shard node to replace nfs-file46
- Detailed steps for the change
  - Build the new VM instance
  - Ensure the creation of the storage directory
  - Tell the GitLab application about the new node
  - Roll out the new configurations
  - Test the new node
  - Enable the new node in GitLab
  - Disable the old node in GitLab
Detailed steps for the change
The following are the detailed steps for the change.
Note: These steps do not apply to Praefect systems.
Build the new VM instance
- pre-conditions for execution of the step
  - Create a new MR which increments the "multizone-stor" variable setting by 1 around line 472 of the file environments/gprd/variables.tf.
    - Link: Increment multione-store by 1 to 28
  - Have the MR reviewed by a colleague.
- execution commands for the step
  - Optionally, check quotas before applying the terraform changes. You can check with:
    gcloud --project='gitlab-production' compute regions describe us-east1 --format=json | jq -c '.quotas[] | select(.limit > 0) | select(.usage / .limit > 0.5) | { metric, limit, usage }'
  - Merge the MR.
  - Click the apply-to-prod pipeline stage play button.
- post-execution validation for the step
  - Examine the gprd apply pipeline stage output and confirm the absence of errors.
- rollback of the step
  - Revert the MR.
Ensure the creation of the storage directory
Once the gitaly node is created, it will take a few minutes for chef to run on the system, so it may not be immediately available.
- pre-conditions for execution of the step
  - Make sure chef-client runs without any errors.
    export node='file-48-stor-gprd.c.gitlab-production.internal'
    bundle exec knife ssh "fqdn:$node" "sudo grep 'Chef Client finished' /var/log/syslog | tail -n 1"
- execution commands for the step
  - If chef does not converge after 5 minutes or so, then invoke it manually. If chef refuses to run, then something is wrong, and this procedure should be rolled back.
    bundle exec knife ssh "fqdn:$node" "sudo chef-client"
  - Confirm that the storage directory /var/opt/gitlab/git-data/repositories exists on the file system of the new node.
    bundle exec knife ssh "fqdn:$node" "sudo df -hT /var/opt/gitlab/git-data/repositories && sudo ls -la /var/opt/gitlab/git-data/ && sudo ls -la /var/opt/gitlab/git-data/repositories | head"
- post-execution validation for the step
  - Confirm that the gitaly service is running.
    bundle exec knife ssh "fqdn:$node" "sudo gitlab-ctl status gitaly"
  - Confirm that there are no relevant errors in the logs.
    bundle exec knife ssh "fqdn:$node" "sudo grep -i 'error' /var/log/gitlab/gitaly/current | tail"
- rollback of the step
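
The two log checks in this step (a "Chef Client finished" line in syslog, and no error lines in the gitaly log) can be turned into pass/fail helpers that read log text on stdin. This is a minimal sketch, not part of the original procedure; the function names `last_chef_finish` and `count_error_lines` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads syslog text on stdin, prints the most recent
# "Chef Client finished" line, and fails if there is no such line.
last_chef_finish() {
  local line
  line="$(grep 'Chef Client finished' | tail -n 1)"
  [ -n "$line" ] && printf '%s\n' "$line"
}

# Hypothetical helper: reads log text on stdin and prints how many lines
# mention "error" (case-insensitive). Prints 0 when there are none.
count_error_lines() {
  grep -ci 'error' || true
}
```

For example, the output of the knife commands above could be piped into these helpers from the workstation, so that a missing chef run shows up as a non-zero exit status instead of empty output.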
Tell the GitLab application about the new node
Configure the GitLab application to include the new node. Note: Do not enable it yet.
- pre-conditions for execution of the step
  - Create a new MR in the chef-repo project with the following changes:
    - Update the default_attributes.omnibus-gitlab.gitlab_rb.git_data_dirs list entry around line 387 of the file roles/gprd-base-stor-gitaly.json with the new Gitaly storage node entry, similar to:
      "nfs-file48": { "path": "/var/opt/gitlab/git-data-file48", "gitaly_address": "tcp://file-48-stor-gprd.c.gitlab-production.internal:9999" },
    - Update the default_attributes.omnibus-gitlab.gitaly.storage map entry around line 280 of the file roles/gprd-base.json, adding an entry similar to:
      { "name": "nfs-file48", "path": "/var/opt/gitlab/git-data/repositories" },
- execution commands for the step
  - Merge the MR.
  - Click the apply-to-prod pipeline stage play button.
- post-execution validation for the step
  - Examine the pipeline stage output to verify that there were no errors.
  - Optionally have chef check for the change:
    $ bundle exec knife role show gprd-base-stor-gitaly | grep -A1 'nfs-file48'
    name: nfs-file48
    path: /var/opt/gitlab/git-data/repositories
- rollback of the step
  - Revert the MR.
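
Both role entries differ from shard to shard only in the node number, so they can be generated from it instead of being edited by hand. This is a minimal sketch, assuming the naming pattern of the nfs-file48 examples above holds for other shard numbers; the function names are hypothetical and not part of chef-repo.

```shell
#!/usr/bin/env bash
# Sketch only: derives the two role entries from a shard number, following the
# naming pattern of the nfs-file48 examples in this issue.

# Entry for the git_data_dirs list (storage name, path, and gitaly address).
git_data_dirs_entry() {
  local n="$1"
  printf '"nfs-file%s": { "path": "/var/opt/gitlab/git-data-file%s", "gitaly_address": "tcp://file-%s-stor-gprd.c.gitlab-production.internal:9999" },\n' "$n" "$n" "$n"
}

# Entry for the gitaly.storage map (storage name and on-disk repository path).
gitaly_storage_entry() {
  local n="$1"
  printf '{ "name": "nfs-file%s", "path": "/var/opt/gitlab/git-data/repositories" },\n' "$n"
}

# Print both entries for shard 48, ready to paste into the role files.
git_data_dirs_entry 48
gitaly_storage_entry 48
```

Generating the entries keeps the storage name, path suffix, and hostname in sync, which is where copy-and-paste edits of these files tend to go wrong.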
Roll out the new configurations
Get these configurations to the relevant systems in the fleet.
- execution commands for the step
  - Check the status of chef on production systems that had their roles edited in the chef MR:
    bundle exec knife ssh -C 5 "roles:gprd-base-stor-gitaly OR roles:gprd-base-fe OR roles:gprd-base-be" "sudo systemctl is-active chef-client.service"
  - Stop chef-client on those nodes:
    bundle exec knife ssh -C 5 "roles:gprd-base-stor-gitaly OR roles:gprd-base-fe OR roles:gprd-base-be" "sudo systemctl stop chef-client.service"
  - Merge the MR in the chef-repo that you prepared in the step "Tell the GitLab application about the new node".
  - Do a dry run on one old gitaly machine, one new gitaly machine, and one web machine, and confirm the changes are as desired, for example:
    bundle exec knife ssh 'fqdn:file-01-stor-gprd.c.gitlab-production.internal' 'sudo chef-client --why-run'
    bundle exec knife ssh 'fqdn:file-48-stor-gprd.c.gitlab-production.internal' 'sudo chef-client --why-run'
    bundle exec knife ssh 'fqdn:web-cny-01-sv-gprd.c.gitlab-production.internal' 'sudo chef-client --why-run'
  - Force chef-client to run on gitaly nodes first (if you run chef on web/api nodes at this point, they will try to connect to gitaly nodes before those nodes are ready):
    bundle exec knife ssh -C 2 "roles:gprd-base-stor-gitaly" "sudo chef-client"
- post-execution validation for the step
  - Check the gitaly logs for the absence of any relevant errors in order to confirm the gitaly service is operating normally.
    export node='file-48-stor-gprd.c.gitlab-production.internal'
    bundle exec knife ssh "fqdn:$node" "sudo grep -i 'error' /var/log/gitlab/gitaly/current | tail | jq"
  - Run chef-client on remaining machines, gradually:
    bundle exec knife ssh -C 2 "roles:gprd-base-stor-gitaly OR roles:gprd-base-fe OR roles:gprd-base-be" "sudo chef-client"
  - Confirm that the chef-client service is enabled and active:
    bundle exec knife ssh -C 5 "roles:gprd-base-stor-gitaly OR roles:gprd-base-fe OR roles:gprd-base-be" "sudo systemctl is-active chef-client.service"
- rollback of the step
  - Revert the MR from Step: Tell the GitLab application about the new node
  - Re-run the steps from Step: Roll out the new configurations
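
The ordering in this step matters: chef-client is stopped everywhere, gitaly nodes converge first, and only then do the remaining nodes run. The sketch below encodes that order with a dry-run mode that only prints the knife commands. It is an illustration under stated assumptions, not the procedure itself: the `run_knife` and `rollout` names and the `DRY_RUN` variable are hypothetical, and the MR merge, the `--why-run` checks, and the log validation are left as manual steps.

```shell
#!/usr/bin/env bash
# Sketch of the rollout order. DRY_RUN=1 (the default here) only prints each
# command; set DRY_RUN=0 to execute. The wrapper names are hypothetical.
DRY_RUN="${DRY_RUN:-1}"
ALL_ROLES='roles:gprd-base-stor-gitaly OR roles:gprd-base-fe OR roles:gprd-base-be'
GITALY_ROLE='roles:gprd-base-stor-gitaly'

run_knife() {
  # $1 = concurrency, $2 = node query, $3 = remote command
  if [ "$DRY_RUN" = "1" ]; then
    printf 'bundle exec knife ssh -C %s "%s" "%s"\n' "$1" "$2" "$3"
  else
    bundle exec knife ssh -C "$1" "$2" "$3"
  fi
}

rollout() {
  # Stop the chef-client service so no node converges on its own schedule.
  run_knife 5 "$ALL_ROLES" 'sudo systemctl stop chef-client.service' || return 1
  # Gitaly nodes converge first so web/api nodes do not try to reach
  # storage shards that are not ready yet.
  run_knife 2 "$GITALY_ROLE" 'sudo chef-client' || return 1
  # Then the remaining machines, two at a time.
  run_knife 2 "$ALL_ROLES" 'sudo chef-client' || return 1
  # Finally confirm the chef-client service state on all nodes.
  run_knife 5 "$ALL_ROLES" 'sudo systemctl is-active chef-client.service'
}

rollout
```

Reviewing the dry-run output before setting `DRY_RUN=0` gives a colleague a chance to check the node queries and concurrency values.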
Test the new node
Confirm that the new storage node is operational.
- execution commands for the step
  - Create a new project, but do not push any data to it.
  - Copy its Project ID.
  - Use the API to move it to the new storage server before pushing any data to it.
  - Now that the project is moved, push some data to it and ensure that everything works. Namely, be sure that the web interface updates with the data you have pushed.
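
The issue does not say which API call performs the move. One possibility, stated here as an assumption, is the project edit endpoint (`PUT /projects/:id`) with the admin-only `repository_storage` attribute. The sketch below only prints the curl command so it can be reviewed first; the `move_project_cmd` name, the `GITLAB_TOKEN` variable, and the project ID 12345 are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: prints (rather than runs) a curl call that asks GitLab to move a
# project to another storage shard. Assumes an admin token in GITLAB_TOKEN and
# that the project edit endpoint accepts repository_storage for admins.
move_project_cmd() {
  local project_id="$1" storage="$2"
  printf 'curl --request PUT --header "PRIVATE-TOKEN: $GITLAB_TOKEN" --data "repository_storage=%s" "https://gitlab.com/api/v4/projects/%s"\n' "$storage" "$project_id"
}

# 12345 is a placeholder for the Project ID copied in the previous step.
move_project_cmd 12345 nfs-file48
```

After running the printed command, the project's storage can be checked with an admin GET on the same endpoint before any data is pushed.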
Enable the new node in GitLab
Enabling new nodes in the GitLab admin console requires using an admin account to change where new projects are stored. In Admin Area > Settings > Repository > Repository storage > Expand, you will see a list of storage nodes. The ones that are checked are the ones that will receive new projects. For more information, see the GitLab docs.
- execution commands for the step
  - Open a private browser window or tab and navigate to: https://gitlab.com/admin/application_settings/repository
  - Click the Expand button next to Repository storage.
  - Click on the checkbox next to nfs-file48 and verify that it is now checked ✅.
Disable the old node in GitLab
Remove the old node from the pool of nodes that are configured to store new projects.
- execution commands for the step
  - Click on the checkbox next to nfs-file46 and verify that it is now unchecked.