Provision 2 gitaly shards to keep up with growth
C3
Production Change - Criticality 3
Change Objective | Increase capacity for new project repository storage |
---|---|
Change Type | Add additional infrastructure instances |
Services Impacted | Gitaly |
Change Team Members | @rehab |
Change Severity | ~C3 |
Buddy check or tested in staging | TBD |
Schedule of the change | TBD |
Duration of the change | 2 hours |
Detailed steps for the change. Each step must include: | See below: Summary |
Meta
- Replace all occurrences of "XX" with the new gitaly shard node number.
- Replace all occurrences of "YY" with an existing gitaly shard node number.
- Set the title of this production change issue to: Create new gitaly storage shard node file-XX-stor-gprd for storing new projects
- Add labels by adding a comment with the following command: /label ~Infrastructure ~C3 ~change ~"requires production access" ~"Service::Gitaly"
- Replace the first line of the Summary section below as directed.
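The placeholder substitution in the list above can also be scripted. This is a minimal sketch, assuming the template text is piped in on standard input; the helper name `fill_placeholders` and the node numbers 99 and 42 are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the XX / YY placeholders in text read from stdin.
# $1 = new gitaly shard node number, $2 = existing gitaly shard node number.
# Assumes both arguments are plain numbers (no "/" or "&" characters).
fill_placeholders() {
  local new_node="$1"
  local old_node="$2"
  sed -e "s/XX/${new_node}/g" -e "s/YY/${old_node}/g"
}

# Example with made-up node numbers.
echo 'file-XX-stor-gprd replaces file-YY-stor-gprd' | fill_placeholders 99 42
```

Because the substitution is a blanket text replacement, review the result before saving it: any unrelated "XX" or "YY" in the text would be rewritten as well.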
Summary
- Detailed steps for the change
  - Build the new VM instance
  - Ensure the creation of the storage directory
  - Tell the GitLab application about the new node
  - Roll out the new configurations
  - Test the new node
  - Enable the new node in GitLab
  - Disable the old node in GitLab
Detailed steps for the change
The following are the detailed steps for the change.
Note: These steps do not apply to Praefect systems.
Build the new VM instance
- pre-conditions for execution of the step
  - Create a new MR.
    - The commit should increment the "multizone-stor" variable setting by 1, around line 472 of the file environments/gprd/variables.tf
    - Here is an example title and description to use for this MR.
  - Using the new value of the multizone-stor field, change the MR title to: Increment multi-zone storage nodes by 1 to [the new total]
  - Link: [Add a link to the MR here]
  - Have the MR reviewed by a colleague.
- execution commands for the step
  - Optionally, check quotas before applying the terraform changes. You can check with:
    gcloud --project='gitlab-production' compute regions describe us-east1 --format=json | jq -c '.quotas[] | select(.limit > 0) | select(.usage / .limit > 0.5) | { metric, limit, usage }'
  - Merge the MR.
  - Notify the Engineer On-call about the planned change.
  - Click the apply-to-prod pipeline stage play button.
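The jq filter in the optional quota check keeps only quotas that have a non-zero limit and whose usage is above half of that limit. The same arithmetic is shown below as a minimal sketch that reads whitespace-separated `metric limit usage` lines; the helper name `over_half_quota` and the sample figures are hypothetical and are not real quota values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "metric limit usage" lines from stdin and print
# every metric whose usage is more than 50% of a non-zero limit.
# The "$2 > 0" test runs first, so a zero limit never causes a division.
over_half_quota() {
  awk '$2 > 0 && $3 / $2 > 0.5 { printf "%s %d%%\n", $1, 100 * $3 / $2 }'
}

# Made-up sample data: only the first line is above the 50% threshold.
printf '%s\n' 'CPUS 100 75' 'SSD_TOTAL_GB 10 5' 'ROUTES 0 0' | over_half_quota
```

A quota sitting at exactly 50% is not reported, matching the strict `> 0.5` comparison in the jq filter.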
- post-execution validation for the step
  - Examine the gprd apply pipeline stage output and confirm the absence of relevant errors.
- rollback of the step
  - Revert the MR.
Ensure the creation of the storage directory
Once the gitaly node is created, it will take a few minutes for chef to run on the system, so it may not be immediately available.
- pre-conditions for execution of the step
  - Make sure chef-client runs without any errors.
    export node='file-XX-stor-gprd.c.gitlab-production.internal'
    bundle exec knife ssh "fqdn:$node" "sudo grep 'Chef Client finished' /var/log/syslog | tail -n 1"
- execution commands for the step
  - If chef does not converge after about 5 minutes, invoke it manually. If chef refuses to run, something is wrong and this procedure should be rolled back.
    bundle exec knife ssh "fqdn:$node" "sudo chef-client"
  - Confirm that the storage directory /var/opt/gitlab/git-data/repositories exists on the file system of the new node.
    bundle exec knife ssh "fqdn:$node" "sudo df -hT /var/opt/gitlab/git-data/repositories && sudo ls -la /var/opt/gitlab/git-data/ && sudo ls -la /var/opt/gitlab/git-data/repositories | head"
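The convergence check and the manual fallback described above can be combined into a small retry loop. This is a minimal sketch, not part of the procedure itself: the helper names `wait_until` and `chef_finished` are hypothetical, the 10 x 30 second schedule is an assumed approximation of "about 5 minutes", and `$node` is assumed to be exported as in the pre-conditions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a check until it succeeds or attempts run out.
# $1 = maximum attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the check command to run.
wait_until() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical check built from the pre-condition command: succeeds once the
# output fetched from the node contains a "Chef Client finished" line.
chef_finished() {
  bundle exec knife ssh "fqdn:${node}" \
    "sudo grep 'Chef Client finished' /var/log/syslog | tail -n 1" |
    grep -q 'Chef Client finished'
}

# Usage sketch (10 attempts, 30 seconds apart), falling back to a manual run:
#   wait_until 10 30 chef_finished ||
#     bundle exec knife ssh "fqdn:${node}" "sudo chef-client"
```

The check inspects the command output locally rather than relying on the exit status of `knife ssh`, so it does not depend on how the remote status is propagated.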
- post-execution validation for the step
  - Confirm that the gitaly service is running.
    bundle exec knife ssh "fqdn:$node" "sudo gitlab-ctl status gitaly"
  - Confirm that there are no relevant errors in the logs.
    bundle exec knife ssh "fqdn:$node" "sudo grep -i 'error' /var/log/gitlab/gitaly/current | tail"
- rollback of the step
  - No rollback procedure for this step is necessary.
  - This step only confirms and verifies steps taken so far.
Configure the GitLab application so that it is aware of the new node
Configure the GitLab application to include the new node. Note: The GitLab application will consider the new node to be disabled by default.
- pre-conditions for execution of the step
  - Create a new MR in the chef-repo project.
    - Here is an example title and description to use for this MR.
    - The commit should consist of the following changes:
      - Update the default_attributes.omnibus-gitlab.gitlab_rb.git_data_dirs map entry of the file roles/gprd-base-stor-gitaly-common.json, adding an entry similar to:
        { "name": "nfs-fileXX", "path": "/var/opt/gitlab/git-data/repositories" },
      - Update the default_attributes.omnibus-gitlab.gitlab_rb.git_data_dirs map entry of the file roles/gprd-base.json, adding an entry similar to:
        "nfs-fileXX": { "path": "/var/opt/gitlab/git-data-fileXX", "gitaly_address": "tcp://file-XX-stor-gprd.c.gitlab-production.internal:9999" },
  - Link: [Add a link to the MR here]
  - Have the MR reviewed by a colleague.
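The two role entries described above differ only in the shard number, so they can be generated rather than typed by hand. This is a minimal sketch: the helper name `render_role_entries` and the node number 99 are hypothetical, and the output still has to be placed into the two role files manually.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two git_data_dirs entries for a shard number.
# $1 = new gitaly shard node number (the "XX" placeholder).
render_role_entries() {
  local n="$1"
  # Line 1: entry for roles/gprd-base-stor-gitaly-common.json
  printf '{ "name": "nfs-file%s", "path": "/var/opt/gitlab/git-data/repositories" },\n' "$n"
  # Line 2: entry for roles/gprd-base.json
  printf '"nfs-file%s": { "path": "/var/opt/gitlab/git-data-file%s", "gitaly_address": "tcp://file-%s-stor-gprd.c.gitlab-production.internal:9999" },\n' "$n" "$n" "$n"
}

# Example with a made-up node number.
render_role_entries 99
```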
- execution commands for the step
  - Merge the MR.
  - Notify the Engineer On-call about the planned change.
  - Check the Apply_to_prod ops.gitlab.net pipeline to see if the change was applied successfully.
  - Examine the pipeline stage output to verify that there were no errors.
- post-execution validation for the step
  - Force chef-client to run on the relevant nodes:
    bundle exec knife ssh -C 3 "roles:gprd-base-stor-gitaly-common" "sudo chef-client"
  - Optionally, in another shell session, also force chef-client to run on the relevant nodes, or just wait for the nodes to converge naturally.
    bundle exec knife ssh -C 3 "roles:gprd-base-fe OR roles:gprd-base-be" "sudo chef-client"
  - Optionally have chef check for the change:
    $ bundle exec knife role show gprd-base-stor-gitaly-common | grep -A1 'nfs-fileXX'
    name: nfs-fileXX
    path: /var/opt/gitlab/git-data/repositories
- rollback of the step
  - Revert the MR.
  - Check the Apply_to_prod ops.gitlab.net pipeline to see if the change was applied successfully.
  - Re-run the commands in the post-execution validation for the step.
Add the new Gitaly node to all our Kubernetes container configuration
- pre-conditions for execution of the step
  - Create a new MR in the gl-infra/k8s-workloads/gitlab-com project.
  - In the MR, update the file releases/gitlab/values/${environment}.yaml.gotmpl and add the new node to the global.gitaly.external YAML list. Typically the data looks like:
    - hostname: gitaly-01-sv-pre.c.gitlab-pre.internal
      name: default
      port: "9999"
      tlsEnabled: false
  - Have the MR reviewed by a colleague in Delivery
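A list entry for the new node can be generated in the same shape as the example data above. This is only a sketch under stated assumptions: the hostname pattern is taken from the gitaly_address used earlier in this issue, the `name` is assumed to match the storage name `nfs-fileXX`, and the `port` and `tlsEnabled` values are copied from the example entry. Check all four against the existing entries in the values file before using the output. The helper name `render_gitaly_external` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a global.gitaly.external list entry.
# $1 = new gitaly shard node number (the "XX" placeholder).
# Assumption: hostname and name follow the patterns used elsewhere in this issue.
render_gitaly_external() {
  local n="$1"
  printf '%s\n' "- hostname: file-${n}-stor-gprd.c.gitlab-production.internal"
  printf '%s\n' "  name: nfs-file${n}"
  printf '%s\n' '  port: "9999"'
  printf '%s\n' '  tlsEnabled: false'
}

# Example with a made-up node number.
render_gitaly_external 99
```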
- execution commands for the step
  - Merge the MR.
  - Examine the pipeline stage output to verify that there were no errors.
- rollback of the step
  - Completing the execution tasks for this step will suffice as a rollback.
Test the new node
Confirm that the new storage node is operational.
- pre-conditions for execution of the step
  - Export your gitlab.com user auth token as an environment variable in your shell session.
    export GITLAB_COM_API_PRIVATE_TOKEN='CHANGEME'
  - Also export your gitlab.com admin user auth token as an environment variable in your shell session.
    export GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN='CHANGEME'
- execution commands for the step
  - Create a new project:
    export project_name='nfs-fileXX-test'
    rm -f "/tmp/project-${project_name}.json"
    curl --silent --show-error --request POST "https://gitlab.com/api/v4/projects?name=${project_name}&default_branch=main" --header "Private-Token: ${GITLAB_COM_API_PRIVATE_TOKEN}" > "/tmp/project-${project_name}.json"
    export project_id=$(cat "/tmp/project-${project_name}.json" | jq -r '.id')
    export ssh_url_to_repo=$(cat "/tmp/project-${project_name}.json" | jq -r '.ssh_url_to_repo')
  - Clone the project.
    git clone "${ssh_url_to_repo}" "/tmp/${project_name}"
  - Add, commit, and push a README file to the project repository.
    echo "# ${project_name}" > "/tmp/${project_name}/README.md"
    pushd "/tmp/${project_name}" && git add "/tmp/${project_name}/README.md" && git commit -am "Add README" && git push origin main && popd
  - Use the API to move it to the new storage server:
    export destination_storage_name='nfs-fileXX'
    export move_id=$(curl --silent --show-error --request POST "https://gitlab.com/api/v4/projects/${project_id}/repository_storage_moves" --data "{\"destination_storage_name\": \"${destination_storage_name}\"}" --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}" --header 'Content-Type: application/json' | jq -r '.id')
  - Optionally poll the API to monitor the state of the move:
    curl --silent --show-error "https://gitlab.com/api/v4/projects/${project_id}/repository_storage_moves/${move_id}" --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}" | jq -r '.state'
  - Optionally confirm the new location:
    curl --silent --show-error "https://gitlab.com/api/v4/projects/${project_id}" --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}" | jq -r '.repository_storage'
  - Once the project has finished being moved to the new shard, add, commit, and push an update to the README:
    echo -e "\n\ntest" >> "/tmp/${project_name}/README.md"
    pushd "/tmp/${project_name}" && git add "/tmp/${project_name}/README.md" && git commit -am "Update README to test nfs-fileXX" && git push origin main && popd
  - Verify that the changes were persisted as expected. Match the whole line, because the README heading also contains the word "test":
    rm -rf "/tmp/${project_name}"
    git clone "${ssh_url_to_repo}" "/tmp/${project_name}"
    grep -x 'test' "/tmp/${project_name}/README.md"
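The optional polling step above can be automated so that the README update only happens after the move has completed. This is a minimal sketch with hypothetical helper names (`wait_for_move`, `move_state`). It assumes that the move reports `finished` on success and `failed` on failure, and it treats every other state as still in progress; the 60 x 5 second schedule is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a state-printing command until a terminal state.
# $1 = maximum polls, $2 = seconds between polls,
# remaining arguments = command that prints the current state on stdout.
# Returns 0 on "finished", 1 on "failed", 2 if the poll budget runs out.
wait_for_move() {
  local max_polls="$1"
  local delay="$2"
  shift 2
  local state=""
  local i=1
  while [ "$i" -le "$max_polls" ]; do
    state="$("$@")"
    case "$state" in
      finished) echo "finished"; return 0 ;;
      failed) echo "failed"; return 1 ;;
    esac
    # Sleep only if another poll will follow.
    if [ "$i" -lt "$max_polls" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "timeout (last state: ${state})"
  return 2
}

# Hypothetical wrapper around the polling request used in this step.
# Assumes project_id, move_id and the admin token are exported as above.
move_state() {
  curl --silent --show-error \
    "https://gitlab.com/api/v4/projects/${project_id}/repository_storage_moves/${move_id}" \
    --header "Private-Token: ${GITLAB_GPRD_ADMIN_API_PRIVATE_TOKEN}" | jq -r '.state'
}

# Usage sketch: wait_for_move 60 5 move_state
```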
Enable the new node in GitLab
Enabling new nodes in the GitLab admin console requires using an admin account to change where new projects are stored. In Admin Area > Settings > Repository > Repository storage > Expand, you will see a list of storage nodes. The ones that are checked are the ones that will receive new projects. For more information, see the GitLab docs.
- execution commands for the step
  - Open a private browser window or tab and navigate to: https://gitlab.com/admin/application_settings/repository
  - Click the Expand button next to Repository storage.
  - Click play on the Production gitaly-shard-weights-assigner job to assign a weight.
  - Don't forget to click the Save changes button.
- post-execution validation for the step
  - Take a count of how many projects are being created on the new shard:
    export node='file-XX-stor-gprd.c.gitlab-production.internal'
    bundle exec knife ssh "fqdn:$node" "sudo find /var/opt/gitlab/git-data/repositories/@hashed -mindepth 2 -maxdepth 3 -name '*.git' | wc -l"
  - Observe that this number goes up over time.
- post-execution validation for the step
  - Take a count of how many projects are being created on the old shard:
    export node='file-YY-stor-gprd.c.gitlab-production.internal'
    bundle exec knife ssh "fqdn:$node" "sudo find /var/opt/gitlab/git-data/repositories/@hashed -mindepth 2 -maxdepth 3 -name '*.git' | wc -l"
  - Observe that this number never goes up over time. (Either goes down or does not change.)
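Both validations above compare two counts taken some time apart: the new shard should trend up, the old shard should not. This is a minimal sketch with hypothetical helper names (`count_trend`, `count_repos`). It assumes that `knife ssh` prints the host name before the count, so the last field of the output is taken as the number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the change between two repository counts.
# $1 = earlier count, $2 = later count. Prints "up", "down" or "unchanged".
count_trend() {
  local before="$1"
  local after="$2"
  if [ "$after" -gt "$before" ]; then
    echo "up"
  elif [ "$after" -lt "$before" ]; then
    echo "down"
  else
    echo "unchanged"
  fi
}

# Hypothetical helper: count repositories on one shard.
# $1 = fully qualified node name. Assumes the output is "<host> <count>".
count_repos() {
  bundle exec knife ssh "fqdn:$1" \
    "sudo find /var/opt/gitlab/git-data/repositories/@hashed -mindepth 2 -maxdepth 3 -name '*.git' | wc -l" |
    awk '{ print $NF }'
}

# Usage sketch:
#   before="$(count_repos "$node")"; sleep 600; after="$(count_repos "$node")"
#   count_trend "$before" "$after"   # expect "up" on the new shard
```

On the old shard, either "down" or "unchanged" indicates that it is no longer receiving new projects.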