Create new gitaly storage shard node `file-51-stor-gprd` to replace `file-41-stor-gprd` in the configured rotation for storing new projects
Production Change - Criticality 3

Change Objective | Increase capacity for new project repository storage |
---|---|
Change Type | Add additional infrastructure instances |
Services Impacted | Gitaly |
Change Team Members | Username of the engineers involved in the change |
Change Severity | ~C3 |
Buddy check or tested in staging | @cindy |
Schedule of the change | 2020-05-06 17:00 UTC |
Duration of the change | 1 hour |
Detailed steps for the change. Each step must include: | See below: Summary |
Production change requires commented manager approval: | /cc @albertoramos |
Meta
- Replace all occurrences of "`XX`" with the new gitaly shard node number.
- Replace all occurrences of "`YY`" with the old gitaly shard node number that the new one will replace.
- Set the title of this production change issue to: Create new gitaly storage shard node `file-51-stor-gprd` to replace `file-41-stor-gprd` in the configured rotation for storing new projects
- Add labels by adding a comment with the following command: `/label ~Infrastructure ~C3 ~change ~"requires production access" ~"Service::Gitaly"`
- Replace the first line of the Summary section below as directed.
- Acquire commented manager approval.
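The two shard numbers substituted above determine every host name and storage name used in the rest of this issue. The following is a minimal sketch of that mapping, assuming the naming pattern visible in this issue (`file-NN-stor-gprd` and `nfs-fileNN`). The function names `shard_fqdn` and `shard_storage_name` are hypothetical helpers, not part of any existing tooling.

```shell
# Hypothetical helpers: derive the names used in this issue from a shard number.

# Print the internal FQDN of gitaly shard node NN.
shard_fqdn() {
  printf 'file-%s-stor-gprd.c.gitlab-production.internal\n' "$1"
}

# Print the storage name the GitLab application uses for shard NN.
shard_storage_name() {
  printf 'nfs-file%s\n' "$1"
}
```

For example, `export node="$(shard_fqdn 51)"` would set the `node` variable that the `knife` commands in the later steps expect.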
Summary
- Detailed steps for the change
  - Build the new VM instance
  - Ensure the creation of the storage directory
  - Tell the GitLab application about the new node
  - Roll out the new configurations
  - Test the new node
  - Enable the new node in GitLab
  - Disable the old node in GitLab
Detailed steps for the change
The following are the detailed steps for the change.
Note: These steps do not apply to Praefect systems.
Build the new VM instance
- pre-conditions for execution of the step
  - Create a new MR.
    - The commit should increment the `"multizone-stor"` variable setting by 1 around line 472 of the file `environments/gprd/variables.tf`.
    - Here is an example title and description to use for this MR.
  - Using the new value of the `multizone-stor` field, change the MR title to: Increment multi-zone storage nodes by 1 to [the new total]
  - Link: [Add a link to the MR here]
  - Have the MR reviewed by a colleague.
- execution commands for the step
  - Optionally, check quotas before applying the terraform changes. You can check with:

        gcloud --project='gitlab-production' compute regions describe us-east1 --format=json | jq -c '.quotas[] | select(.limit > 0) | select(.usage / .limit > 0.5) | { metric, limit, usage }'

  - Merge the MR.
  - Notify the Engineer On-call about the planned change.
  - Click the `apply-to-prod` pipeline stage `play` button.
- post-execution validation for the step
  - Examine the `gprd apply` pipeline stage output and confirm the absence of relevant errors.
- rollback of the step
  - Revert the MR.
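The optional quota check in this step prints only the quotas that have a non-zero limit and are more than half used. As a rough illustration of that selection rule, here is a sketch that applies the same condition to a single usage/limit pair. The function name `quota_over_half` is hypothetical, and the helper is not a replacement for the `gcloud` and `jq` pipeline.

```shell
# Hypothetical helper mirroring the condition in the jq filter:
#   select(.limit > 0) | select(.usage / .limit > 0.5)
# Returns 0 (true) when usage is strictly above half of a non-zero limit.
quota_over_half() {
  local usage="$1" limit="$2"
  # awk handles fractional values; the limit is tested first so that a
  # zero limit never reaches the division.
  awk -v u="$usage" -v l="$limit" 'BEGIN { exit !(l > 0 && u / l > 0.5) }'
}
```

A quota at exactly 50% usage is not reported, because the comparison is strict.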
Ensure the creation of the storage directory
Once the gitaly node is created, it will take a few minutes for chef to run on the system, so it may not be immediately available.
- pre-conditions for execution of the step
  - Make sure `chef-client` runs without any errors.

        export node='file-51-stor-gprd.c.gitlab-production.internal'
        bundle exec knife ssh "fqdn:$node" "sudo grep 'Chef Client finished' /var/log/syslog | tail -n 1"

- execution commands for the step
  - If chef does not converge after 5 minutes or so, then invoke it manually. If chef refuses to run, then something is wrong, and this procedure should be rolled back.

        bundle exec knife ssh "fqdn:$node" "sudo chef-client"

  - Confirm that the storage directory `/var/opt/gitlab/git-data/repositories` exists on the file system of the new node.

        bundle exec knife ssh "fqdn:$node" "sudo df -hT /var/opt/gitlab/git-data/repositories && sudo ls -la /var/opt/gitlab/git-data/ && sudo ls -la /var/opt/gitlab/git-data/repositories | head"

- post-execution validation for the step
  - Confirm that the gitaly service is running.

        bundle exec knife ssh "fqdn:$node" "sudo gitlab-ctl status gitaly"

  - Confirm that there are no relevant errors in the logs.

        bundle exec knife ssh "fqdn:$node" "sudo grep -i 'error' /var/log/gitlab/gitaly/current | tail"

- rollback of the step
  - No rollback procedure for this step is necessary.
  - This step only confirms and verifies steps taken so far.
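The "wait about 5 minutes for chef, then run it manually" guidance in this step amounts to polling a check with a bounded number of attempts. A minimal sketch of such a loop follows. The name `wait_for` is hypothetical, and the check passed to it is whatever command you trust to report convergence.

```shell
# Hypothetical helper: run a check command until it succeeds or attempts run out.
# Usage: wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1   # gave up: the check never succeeded
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

Something like `wait_for 10 30 chef_finished || bundle exec knife ssh "fqdn:$node" "sudo chef-client"` would poll for roughly five minutes before falling back to a manual run. Here `chef_finished` stands for a hypothetical wrapper around the `grep 'Chef Client finished'` command above. Whether `knife ssh` passes the remote exit status back to the caller should be verified before relying on it.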
Tell the GitLab application about the new node
Configure the GitLab application to include the new node. Note: The GitLab application will consider the new node to be disabled by default.
- pre-conditions for execution of the step
  - Create a new MR in the `chef-repo` project.
    - Here is an example title and description to use for this MR.
    - The commit should consist of the following changes:
      - Update the `default_attributes.omnibus-gitlab.gitlab_rb.git_data_dirs` map entry around line 387 of the file `roles/gprd-base-stor-gitaly.json`, adding an entry similar to:

            { "name": "nfs-file51", "path": "/var/opt/gitlab/git-data/repositories" },

      - Update the `default_attributes.omnibus-gitlab.gitaly.storage` list entry around line 280 of the file `roles/gprd-base.json` with a new Gitaly storage node entry similar to:

            "nfs-file51": { "path": "/var/opt/gitlab/git-data-file51", "gitaly_address": "tcp://file-51-stor-gprd.c.gitlab-production.internal:9999" },

  - Have the MR reviewed by a colleague.
- execution commands for the step
  - Merge the MR.
  - Notify the Engineer On-call about the planned change.
  - Click the `Apply_to_prod` pipeline stage `play` button.
  - Examine the pipeline stage output to verify that there were no errors.
- post-execution validation for the step
  - Force `chef-client` to run on the relevant nodes:

        bundle exec knife ssh -C 3 "roles:gprd-base-stor-gitaly" "sudo chef-client"

  - Optionally, in another shell session, also force `chef-client` to run on the front-end and back-end nodes. Otherwise, just wait for the nodes to converge naturally.

        bundle exec knife ssh -C 3 "roles:gprd-base-fe OR roles:gprd-base-be" "sudo chef-client"

  - Optionally have chef check for the change:

        $ bundle exec knife role show gprd-base-stor-gitaly | grep -A1 'nfs-file51'
        name: nfs-file51
        path: /var/opt/gitlab/git-data/repositories

- rollback of the step
  - Revert the MR.
  - Click the `Apply_to_prod` pipeline stage `play` button.
  - Re-run the commands in the post-execution validation for the step.
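Both role entries in this step follow directly from the shard number, so they can be generated rather than typed by hand. The sketch below prints the two entries in the same shape as the examples given in the pre-conditions. The function names are hypothetical, and the paths, port, and trailing commas are copied from those examples rather than from the role files themselves.

```shell
# Hypothetical helpers: print the two role entries for shard NN.
# Paths, port, and naming are copied from the examples in this step.

# Entry for roles/gprd-base-stor-gitaly.json
stor_gitaly_entry() {
  printf '{ "name": "nfs-file%s", "path": "/var/opt/gitlab/git-data/repositories" },\n' "$1"
}

# Entry for roles/gprd-base.json
base_entry() {
  printf '"nfs-file%s": { "path": "/var/opt/gitlab/git-data-file%s", "gitaly_address": "tcp://file-%s-stor-gprd.c.gitlab-production.internal:9999" },\n' "$1" "$1" "$1"
}
```

Running `stor_gitaly_entry 51` and `base_entry 51` reproduces the two example entries shown above. The output is meant to be pasted into the MR and reviewed, not applied automatically.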
Roll out the new configurations
Get these configurations to the front-end and back-end systems in the fleet.
- pre-conditions for execution of the step
  - Check the gitaly logs for the absence of any relevant errors, to confirm that the gitaly service is operating normally.

        export node='file-51-stor-gprd.c.gitlab-production.internal'
        bundle exec knife ssh "fqdn:$node" "sudo grep -i 'error' /var/log/gitlab/gitaly/current | tail | jq"

- execution commands for the step
  - In a new shell session, run `chef-client` on front-end systems, gradually:

        bundle exec knife ssh -C 3 "roles:gprd-base-fe" "sudo chef-client"

  - In a new shell session, run `chef-client` on back-end systems, gradually:

        bundle exec knife ssh -C 3 "roles:gprd-base-be" "sudo chef-client"

- post-execution validation for the step
  - Optionally, in yet another separate shell session, run these commands to list every system that still lacks the expected configuration entry:

        export command="sudo grep --files-without-match 'nfs-file51' /etc/gitlab/gitlab.rb"
        bundle exec knife ssh -C 3 "roles:gprd-base-fe" "$command"
        bundle exec knife ssh -C 3 "roles:gprd-base-be" "$command"

- rollback of the step
  - Execute the rollback commands for Step: Tell the GitLab application about the new node
  - Re-run the steps from Step: Roll out the new configurations
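The validation command in this step relies on `grep --files-without-match`, which prints the name of each file that does not contain the pattern. A host that has picked up the new configuration therefore prints nothing. The sketch below shows the same behaviour locally with the short form `-L`. The function name `files_missing_entry` is hypothetical.

```shell
# Hypothetical helper: list the files that do NOT contain the storage entry.
# Uses grep -L, the short form of --files-without-match.
files_missing_entry() {
  local entry="$1"
  shift
  # -F treats the entry as a fixed string. The exit status is discarded
  # because only the printed file names matter here.
  grep -L -F -- "$entry" "$@" || true
}
```

An empty result from every host means the roll-out is complete. Any file name that is printed identifies a system that still needs a `chef-client` run.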
Test the new node
Confirm that the new storage node is operational.
- execution commands for the step
  - Create a new project, but do not push any data to it.
  - Copy its Project ID.
  - Use the API to move it to the new storage server before pushing any data to it.
  - Now that the project is moved, push some data to it and ensure that everything works. In particular, confirm that the web interface updates with the data you have pushed.
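This step does not say which API call performs the move. One possibility, stated here as an assumption, is the Projects API "edit project" endpoint with its administrator-only `repository_storage` attribute. The sketch below only prints a candidate `curl` command so that it can be reviewed before anyone runs it. The function name `move_project_cmd` and the environment variable `GITLAB_ADMIN_TOKEN` are hypothetical.

```shell
# Hypothetical helper: print (do not run) a curl command that asks the
# Projects API to change a project's repository storage.
# Assumptions: the GitLab.com v4 API base URL, an admin token held in the
# environment variable GITLAB_ADMIN_TOKEN, and the admin-only
# `repository_storage` attribute of the "edit project" endpoint.
move_project_cmd() {
  local project_id="$1" storage="$2"
  # The token variable is printed literally so no secret ends up in the output.
  printf 'curl --request PUT --header "PRIVATE-TOKEN: $GITLAB_ADMIN_TOKEN" --data "repository_storage=%s" "https://gitlab.com/api/v4/projects/%s"\n' "$storage" "$project_id"
}
```

For example, `move_project_cmd 12345 nfs-file51` prints the command for a project whose ID is 12345 (a made-up ID). Check the current API documentation before use, since the supported way to move repositories may differ between GitLab versions.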
Enable the new node in GitLab
Enabling new nodes in the GitLab admin console requires using an admin account to change where new projects are stored. In `Admin Area` > `Settings` > `Repository` > `Repository storage` > `Expand`, you will see a list of storage nodes. The ones that are checked are the ones that will receive new projects. For more information, see the GitLab docs.
- execution commands for the step
  - Open a private browser window or tab and navigate to: https://gitlab.com/admin/application_settings/repository
  - Click the `Expand` button next to `Repository storage`.
  - Click on the checkbox next to `nfs-file51` and verify that it is now checked ✅.
  - Don't forget to click the `Save changes` button.
- post-execution validation for the step
  - Take a count of how many projects are being created on the new shard:

        export node='file-51-stor-gprd.c.gitlab-production.internal'
        bundle exec knife ssh "fqdn:$node" "sudo find /var/opt/gitlab/git-data/repositories/@hashed -mindepth 2 -maxdepth 3 -name '*.git' | wc -l"

  - Observe that this number goes up over time.
Disable the old node in GitLab
Remove the old node from the pool of nodes that are configured to store new projects.
- execution commands for the step
  - Click on the checkbox next to `nfs-file41` and verify that it is now un-checked.
  - Don't forget to click the `Save changes` button.
- post-execution validation for the step
  - Take a count of how many projects are being created on the old shard:

        export node='file-41-stor-gprd.c.gitlab-production.internal'
        bundle exec knife ssh "fqdn:$node" "sudo find /var/opt/gitlab/git-data/repositories/@hashed -mindepth 2 -maxdepth 3 -name '*.git' | wc -l"

  - Observe that this number never goes up over time. (It either goes down or does not change.)
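Both the enable and disable validations come down to comparing two repository counts taken some time apart: the new shard's count should rise, and the old shard's count should not. A minimal sketch of that comparison follows. The function name `count_trend` is hypothetical, and the counts are assumed to be plain integers such as the output of the `find ... | wc -l` commands above.

```shell
# Hypothetical helper: compare two repository counts taken some time apart.
# Prints "up", "down", or "unchanged".
count_trend() {
  local before="$1" after="$2"
  if [ "$after" -gt "$before" ]; then
    echo up
  elif [ "$after" -lt "$before" ]; then
    echo down
  else
    echo unchanged
  fi
}
```

For the new shard the expected result is `up`. For the old shard any result other than `up` is acceptable.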