Create new gitaly storage shard nodes `file-69-stor-gprd` and `file-70-stor-gprd`
Production Change - Criticality 3 ~C3
| Change Objective | Increase capacity for new project repository storage |
|---|---|
| Change Type | Add additional infrastructure instances |
| Services Impacted | ~"Service::Gitaly" |
| Change Team Members | @craig |
| Change Severity | ~C3 |
| Buddy check or tested in staging | @nnelson @devin |
| Schedule of the change | 2022-03-24 2100UTC/1400PDT |
| Duration of the change | 2 hours |
| Detailed steps for the change. Each step must include: | See below: Summary |
| Production change requires commented manager approval: | N/A (C3 change) |
Meta
- Replace all occurrences of " XX" with the new gitaly shard node number (69).
- Set the title of this production change issue to: Create new gitaly storage shard nodes `file-69-stor-gprd` and `file-70-stor-gprd`
- Add labels by adding a comment with the following command: `/label ~Infrastructure ~C3 ~change ~"requires production access" ~"Service::Gitaly"`
- Replace the first line of the Summary section below as directed.
- Acquire commented manager approval.
Summary
Originating incident: 2022-03-18: Number of Gitaly shards (for new re... (#6649 - closed)
- Detailed steps for the change
  - Build the new VM instance
  - Ensure the creation of the storage directory
  - Tell the GitLab application about the new nodes
  - Roll out the new configurations
  - Test the new nodes
  - Enable the new nodes in GitLab
Detailed steps for the change
The following are the detailed steps for the change.
Note: These steps do not apply to Praefect systems.
Build the new VM instance
- pre-conditions for execution of the step
  - Create a new MR.
    - The commit should increment the `"multizone-stor"` variable setting by 2 around line `472` of the file `environments/gprd/variables.tf`.
    - Here is an example title and description to use for this MR.
  - Using the new value of the `multizone-stor` field, change the MR title to: Increment multi-zone storage nodes by 2 to [the new total]
  - Link: https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/3594
  - Have the MR reviewed by a colleague.
- execution commands for the step
  - Optionally, check quotas before applying the Terraform changes.

        % gcloud --project='gitlab-production' compute regions describe us-east1 --format=json | jq -c '.quotas[] | select(.limit > 0) | select(.usage / .limit > 0.5) | { metric, limit, usage }'
        {"metric":"CPUS","limit":15000,"usage":9034}
        {"metric":"DISKS_TOTAL_GB","limit":350000,"usage":253190}
        {"metric":"STATIC_ADDRESSES","limit":400,"usage":282}
        {"metric":"SSD_TOTAL_GB","limit":2500000,"usage":1922534}
        {"metric":"INTERNAL_ADDRESSES","limit":250,"usage":186}

  - Merge the MR.
  - Notify the Engineer On-call about the planned change.
  - Click the `apply-to-prod` pipeline stage `play` button.
- post-execution validation for the step
  - Examine the `gprd apply` pipeline stage output and confirm the absence of relevant errors.
- rollback of the step
  - Revert the MR.
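The optional quota check in this step prints usage and limit, but not how much room is left. The following is a minimal sketch, not part of the original procedure, that turns those JSON lines into remaining headroom per metric. The function name `quota_headroom` is hypothetical, and the sketch assumes `jq` is installed, as the commands in this issue already do.

```shell
# Hypothetical helper: reads the compact JSON lines printed by the quota
# check on stdin and prints "<metric> <limit - usage>" for each line.
quota_headroom() {
  jq -r '"\(.metric) \(.limit - .usage)"'
}

# Example using two of the sample lines from the quota check output.
printf '%s\n' \
  '{"metric":"CPUS","limit":15000,"usage":9034}' \
  '{"metric":"INTERNAL_ADDRESSES","limit":250,"usage":186}' \
  | quota_headroom
```

In practice the `gcloud ... | jq -c ...` command could be piped straight into the helper. Whether the remaining headroom is enough for two more storage nodes depends on the instance and disk sizes defined in the Terraform configuration, which this sketch does not inspect.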
Ensure the creation of the storage directory
Once the gitaly nodes are created, it will take a few minutes for chef to run on the system, so they may not be immediately available.
- pre-conditions for execution of the step
  - Temporarily override the node attribute `omnibus-gitlab.package.enable` on each of the new nodes so that the `gitlab-ee` package can be installed by Chef.

        # Set `omnibus-gitlab.package.enable` to be `true` for the new nodes
        for i in 69 70; do
          bundle exec knife node edit file-${i}-stor-gprd.c.gitlab-production.internal
        done

  - Make sure `chef-client` runs without any errors.

        # file-69
        export node='file-69-stor-gprd.c.gitlab-production.internal'
        bundle exec knife ssh "fqdn:$node" "sudo grep 'Chef Client finished' /var/log/syslog | tail -n 1"
        # file-70
        export node='file-70-stor-gprd.c.gitlab-production.internal'
        bundle exec knife ssh "fqdn:$node" "sudo grep 'Chef Client finished' /var/log/syslog | tail -n 1"
- execution commands for the step
  - If Chef does not converge after 5 minutes or so, invoke it manually. If Chef refuses to run, something is wrong and this procedure should be rolled back. Run this and the following `knife ssh` commands once per new node, with `node` exported as in the pre-conditions.

        bundle exec knife ssh "fqdn:$node" "sudo chef-client"

  - Confirm that the storage directory `/var/opt/gitlab/git-data/repositories` exists on the file system of the new node.

        bundle exec knife ssh "fqdn:$node" "sudo df -hT /var/opt/gitlab/git-data/repositories && sudo ls -la /var/opt/gitlab/git-data/ && sudo ls -la /var/opt/gitlab/git-data/repositories | head"

  - Remove the node attribute overrides once the `gitlab-ee` package has been installed.

        # Remove `omnibus-gitlab.package.enable` override on the new nodes
        for i in 69 70; do
          bundle exec knife node edit file-${i}-stor-gprd.c.gitlab-production.internal
        done
- post-execution validation for the step
  - Confirm that the gitaly service is running.

        bundle exec knife ssh "fqdn:$node" "sudo gitlab-ctl status gitaly"

  - Confirm that there are no relevant errors in the logs.

        bundle exec knife ssh "fqdn:$node" "sudo grep -i 'error' /var/log/gitlab/gitaly/current | tail"

- rollback of the step
  - No rollback procedure for this step is necessary.
  - This step only confirms and verifies steps taken so far.
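This step asks the operator to wait roughly 5 minutes for Chef to converge before invoking it manually. As an illustration only, the wait could be expressed as a small polling helper with a deadline. The names `wait_until` and `chef_finished` are hypothetical and not part of the original procedure. The usage shown in the comments assumes that `knife ssh` prints nothing when the remote `grep` finds no match.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or the deadline passes.
# Returns 0 on success and 1 on timeout.
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  deadline=$(( $(date +%s) + timeout_s ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval_s"
  done
  return 0
}

# Hypothetical usage (assumption: empty output means Chef has not finished):
#   chef_finished() {
#     [ -n "$(bundle exec knife ssh "fqdn:$1" \
#       "sudo grep 'Chef Client finished' /var/log/syslog | tail -n 1")" ]
#   }
#   wait_until 300 30 chef_finished "$node" \
#     || bundle exec knife ssh "fqdn:$node" "sudo chef-client"
```

A fixed deadline keeps the wait bounded, so the manual `chef-client` run from the execution commands is the fallback rather than something the operator has to remember to time.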
Configure the GitLab application so that it is aware of the new nodes
Configure the GitLab application to include the new nodes. Note: The GitLab application will consider the new nodes to be disabled by default.
- pre-conditions for execution of the step
  - Create a new MR in the `chef-repo` project: https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/1573
    - The commit should consist of the following change:
    - Update the `override_attributes.omnibus-gitlab.gitaly.storage` list entry of the file `roles/gprd-base-stor-gitaly-common.json` with the new Gitaly storage node entries, similar to:

          "nfs-file69": {
            "path": "/var/opt/gitlab/git-data-file69",
            "gitaly_address": "tcp://file-69-stor-gprd.c.gitlab-production.internal:9999"
          },
          "nfs-file70": {
            "path": "/var/opt/gitlab/git-data-file70",
            "gitaly_address": "tcp://file-70-stor-gprd.c.gitlab-production.internal:9999"
          },

  - Have the MR reviewed by a colleague.
- execution commands for the step
  - Merge the MR.
  - Notify the Engineer On-call about the planned change.
  - Check the `Apply_to_prod` pipeline on ops.gitlab.net to see if the change successfully applied.
  - Examine the pipeline stage output to verify that there were no errors.
- post-execution validation for the step
  - Force `chef-client` to run on the relevant nodes:

        bundle exec knife ssh -C 3 "roles:gprd-base-stor-gitaly-common" "sudo chef-client"

  - Optionally, in another shell session, also force `chef-client` to run on the relevant nodes. Or else just wait for the nodes to converge naturally.

        bundle exec knife ssh -C 3 "roles:gprd-base-fe OR roles:gprd-base-be" "sudo chef-client"

  - Optionally have chef check for the change:

        $ bundle exec knife role show gprd-base-stor-gitaly-common | egrep -A1 'nfs-file(69|70)'
          name: nfs-file69
          path: /var/opt/gitlab/git-data/repositories
        ---
          name: nfs-file70
          path: /var/opt/gitlab/git-data/repositories
- rollback of the step
  - Revert the MR.
  - Check the `Apply_to_prod` pipeline on ops.gitlab.net to see if the change successfully applied.
  - Re-run the commands in the post-execution validation for the step.
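The optional `knife role show` check in this step relies on reading the output by eye. A minimal sketch of automating that check follows. It assumes the output contains `name: nfs-fileNN` lines in the shape shown in this step, and the helper name `missing_storages` is hypothetical.

```shell
# missing_storages NAME [NAME...]
# Reads text on stdin and prints every NAME that has no matching
# "name: NAME" line. Prints nothing when all names are present.
missing_storages() {
  input=$(cat)
  for name in "$@"; do
    printf '%s\n' "$input" \
      | grep -Eq "name:[[:space:]]*${name}[[:space:]]*\$" \
      || printf '%s\n' "$name"
  done
}

# Hypothetical usage against the role from this step:
#   bundle exec knife role show gprd-base-stor-gitaly-common \
#     | missing_storages nfs-file69 nfs-file70
```

Empty output would suggest both storages are present in the role. Any printed name would indicate that the MR has not been applied to the Chef server yet.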
Add the new Gitaly nodes to all our Kubernetes container configuration
- pre-conditions for execution of the step
  - Create [a new MR in the `gl-infra/k8s-workloads/gitlab-com` project](https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com).
  - In the MR, update the file `releases/gitlab/values/${environment}.yaml.gotmpl` and add the new nodes to the `global.gitaly.external` yaml list. Typically the data looks like:

        - hostname: gitaly-01-sv-pre.c.gitlab-pre.internal
          name: default
          port: "9999"
          tlsEnabled: false

  - Have the MR reviewed by a colleague in Delivery.
- execution commands for the step
  - Merge the MR.
  - Examine the pipeline stage output to verify that there were no errors.
- rollback of the step
  - Completing the execution tasks for this step will suffice as a roll-back.
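The sample entry in this step comes from the `pre` environment. For illustration, the entries for the two new nodes could be generated as below. This sketch assumes that each entry's `name` matches the storage name used in the chef-repo change (`nfs-file69`, `nfs-file70`), and that port `9999` without TLS applies, as the `tcp://...:9999` addresses there suggest. The function name `gitaly_external_entry` is hypothetical, and the resulting YAML should be checked against the existing entries in the values file before use.

```shell
# Hypothetical helper: prints one global.gitaly.external list entry
# for the given Gitaly shard number.
gitaly_external_entry() {
  n=$1
  cat <<EOF
- hostname: file-${n}-stor-gprd.c.gitlab-production.internal
  name: nfs-file${n}
  port: "9999"
  tlsEnabled: false
EOF
}

# Print the entries for both new nodes.
for n in 69 70; do
  gitaly_external_entry "$n"
done
```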
Test the new nodes
Confirm that the new storage nodes are operational.
- pre-conditions for execution of the step
  - Export your `gitlab.com` user auth token as an environment variable in your shell session.

        export GITLAB_COM_API_PRIVATE_TOKEN='CHANGEME'

  - Also export your `gitlab.com` admin user auth token as an environment variable in your shell session.

        export GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN='CHANGEME'
file-69-stor-gprd
- execution commands for the step
  - Set up the environment:

        export project_name='nfs-file69-test'
        export destination_storage_name='nfs-file69'

  - Create a new project:

        rm -f "/tmp/project-${project_name}.json"
        curl --silent --show-error --request POST "https://gitlab.com/api/v4/projects?name=${project_name}&default_branch=main" --header "Private-Token: ${GITLAB_COM_API_PRIVATE_TOKEN}" > "/tmp/project-${project_name}.json"
        export project_id=$(cat "/tmp/project-${project_name}.json" | jq -r '.id')
        export ssh_url_to_repo=$(cat "/tmp/project-${project_name}.json" | jq -r '.ssh_url_to_repo')

  - Clone the project.

        git clone "${ssh_url_to_repo}" "/tmp/${project_name}"

  - Add, commit, and push a `README` file to the project repository.

        echo "# ${project_name}" > "/tmp/${project_name}/README.md"
        pushd "/tmp/${project_name}" && git add "/tmp/${project_name}/README.md" && git commit -am "Add README" && git push origin main && popd

  - Use the API to move it to a new storage server:

        export move_id=$(curl --silent --show-error --request POST "https://gitlab.com/api/v4/projects/${project_id}/repository_storage_moves" --data "{\"destination_storage_name\": \"${destination_storage_name}\"}" --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}" --header 'Content-Type: application/json' | jq -r '.id')

  - Optionally poll the API to monitor the state of the move:

        curl --silent --show-error "https://gitlab.com/api/v4/projects/${project_id}/repository_storage_moves/${move_id}" --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}" | jq -r '.state'

  - Optionally confirm the new location:

        curl --silent --show-error "https://gitlab.com/api/v4/projects/${project_id}" --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}" | jq -r '.repository_storage'

  - Once the project has finished being moved to the new shard, proceed to add, commit, and push an update to the README:

        echo -e "\n\ntest" >> "/tmp/${project_name}/README.md"
        pushd "/tmp/${project_name}" && git add "/tmp/${project_name}/README.md" && git commit -am "Update README to test ${destination_storage_name}" && git push origin main && popd

  - Verify that the changes were persisted as expected:

        rm -rf "/tmp/${project_name}"
        git clone "${ssh_url_to_repo}" "/tmp/${project_name}"
        grep 'test' "/tmp/${project_name}/README.md"

  - Once all tests have been completed, delete the test project:

        curl --silent --show-error --request DELETE "https://gitlab.com/api/v4/projects/${project_id}" --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}"
file-70-stor-gprd
- execution commands for the step
  - Set up the environment:

        export project_name='nfs-file70-test'
        export destination_storage_name='nfs-file70'

  - Create a new project:

        rm -f "/tmp/project-${project_name}.json"
        curl --silent --show-error --request POST "https://gitlab.com/api/v4/projects?name=${project_name}&default_branch=main" --header "Private-Token: ${GITLAB_COM_API_PRIVATE_TOKEN}" > "/tmp/project-${project_name}.json"
        export project_id=$(cat "/tmp/project-${project_name}.json" | jq -r '.id')
        export ssh_url_to_repo=$(cat "/tmp/project-${project_name}.json" | jq -r '.ssh_url_to_repo')

  - Clone the project.

        git clone "${ssh_url_to_repo}" "/tmp/${project_name}"

  - Add, commit, and push a `README` file to the project repository.

        echo "# ${project_name}" > "/tmp/${project_name}/README.md"
        pushd "/tmp/${project_name}" && git add "/tmp/${project_name}/README.md" && git commit -am "Add README" && git push origin main && popd

  - Use the API to move it to a new storage server:

        export move_id=$(curl --silent --show-error --request POST "https://gitlab.com/api/v4/projects/${project_id}/repository_storage_moves" --data "{\"destination_storage_name\": \"${destination_storage_name}\"}" --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}" --header 'Content-Type: application/json' | jq -r '.id')

  - Optionally poll the API to monitor the state of the move:

        curl --silent --show-error "https://gitlab.com/api/v4/projects/${project_id}/repository_storage_moves/${move_id}" --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}" | jq -r '.state'

  - Optionally confirm the new location:

        curl --silent --show-error "https://gitlab.com/api/v4/projects/${project_id}" --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}" | jq -r '.repository_storage'

  - Once the project has finished being moved to the new shard, proceed to add, commit, and push an update to the README:

        echo -e "\n\ntest" >> "/tmp/${project_name}/README.md"
        pushd "/tmp/${project_name}" && git add "/tmp/${project_name}/README.md" && git commit -am "Update README to test ${destination_storage_name}" && git push origin main && popd

  - Verify that the changes were persisted as expected:

        rm -rf "/tmp/${project_name}"
        git clone "${ssh_url_to_repo}" "/tmp/${project_name}"
        grep 'test' "/tmp/${project_name}/README.md"

  - Once all tests have been completed, delete the test project:

        curl --silent --show-error --request DELETE "https://gitlab.com/api/v4/projects/${project_id}" --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}"
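The test procedure says to continue only once the project has finished moving, and offers a one-shot state query for polling. A hedged sketch of wrapping that query in a loop follows. The helper names `wait_for_move` and `get_move_state` are hypothetical. The sketch treats `finished` as success and `failed` or `cleanup failed` as failure, which is my understanding of the states reported by the repository storage moves API. Any other state is treated as still in progress.

```shell
# wait_for_move INTERVAL_SECONDS MAX_ATTEMPTS COMMAND [ARGS...]
# COMMAND must print the current move state on stdout.
# Returns 0 when finished, 1 when the move failed, 2 when attempts run out.
wait_for_move() {
  interval_s=$1
  max_attempts=$2
  shift 2
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    state=$("$@")
    echo "state: ${state}"
    case "$state" in
      finished) return 0 ;;
      failed|'cleanup failed') return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval_s"
  done
  return 2
}

# Wraps the state query from the step above; relies on project_id, move_id
# and GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN being exported as in that step.
get_move_state() {
  curl --silent --show-error \
    "https://gitlab.com/api/v4/projects/${project_id}/repository_storage_moves/${move_id}" \
    --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}" | jq -r '.state'
}

# Hypothetical usage: poll every 10 seconds, up to 60 times.
#   wait_for_move 10 60 get_move_state
```

Stopping on the failure states avoids pushing the README update to a project whose move never completed.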
Enable the new nodes in GitLab
Enabling new nodes in the GitLab admin console requires using an admin account to change where new projects are stored. In Admin Area > Settings > Repository > Repository storage > Expand, you will see a list of storage nodes. The ones that are checked are the ones that will receive new projects. For more information, see the GitLab docs.
- execution commands for the step
  - Open a private browser window or tab and navigate to: https://gitlab.com/admin/application_settings/repository
  - Click the `Expand` button next to `Repository storage`.
  - Click play on the Production `gitaly-shard-weights-assigner` job to assign a weight.
  - Don't forget to click the `Save changes` button.
- post-execution validation for the step
  - Take a count of how many projects are being created on the new shards:

        export node='file-69-stor-gprd.c.gitlab-production.internal'
        bundle exec knife ssh "fqdn:$node" "sudo find /var/opt/gitlab/git-data/repositories/@hashed -mindepth 2 -maxdepth 3 -name '*.git' | wc -l"
        export node='file-70-stor-gprd.c.gitlab-production.internal'
        bundle exec knife ssh "fqdn:$node" "sudo find /var/opt/gitlab/git-data/repositories/@hashed -mindepth 2 -maxdepth 3 -name '*.git' | wc -l"

  - Observe that this number goes up over time.
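The validation above depends on comparing counts taken at different times. As a rough sketch, two samples can be taken a fixed interval apart and the difference printed. The helper names `count_repos` and `growth` are hypothetical. `count_repos` runs the same `find` expression against a local directory, so it illustrates the counting logic rather than the remote invocation.

```shell
# count_repos BASE_DIR
# Counts *.git directories under BASE_DIR/@hashed using the same depth
# limits as the validation command.
count_repos() {
  find "$1/@hashed" -mindepth 2 -maxdepth 3 -name '*.git' | wc -l | tr -d ' '
}

# growth INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND twice, INTERVAL_SECONDS apart, and prints
# "<first> <second> <difference>". COMMAND must print a bare integer.
growth() {
  interval_s=$1
  shift
  first=$("$@")
  sleep "$interval_s"
  second=$("$@")
  echo "$first $second $((second - first))"
}

# Hypothetical usage on a Gitaly node, sampling 10 minutes apart:
#   growth 600 count_repos /var/opt/gitlab/git-data/repositories
```

When sampling through `knife ssh` instead, the output would first need to be reduced to the bare number, because `knife ssh` prefixes each output line with the host name.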