
Create new gitaly storage shard node `file-51-stor-gprd` to replace `file-41-stor-gprd` in the configured rotation for storing new projects

Production Change - Criticality 3 (C3)

Change Objective: Increase capacity for new project repository storage
Change Type: Add additional infrastructure instances
Services Impacted: Gitaly
Change Team Members: Username of the engineers involved in the change
Change Severity: ~C3
Buddy check or tested in staging: @cindy
Schedule of the change: 2020-05-06 17:00 UTC
Duration of the change: 1 hour
Detailed steps for the change (each step must include pre-conditions, execution commands, post-execution validation, and rollback): See below, starting at Summary
Production change requires commented manager approval: /cc @albertoramos

Meta

  • Replace all occurrences of "XX" with the new gitaly shard node number.
  • Replace all occurrences of "YY" with the old gitaly shard node number that the new one will replace.
  • Set the title of this production change issue to: Create new gitaly storage shard node file-51-stor-gprd to replace file-41-stor-gprd in the configured rotation for storing new projects
  • Add labels by adding a comment with the following command: /label ~Infrastructure ~C3 ~change ~"requires production access" ~"Service::Gitaly"
  • Replace the first line of the Summary section below as directed.
  • Acquire commented manager approval.

Summary

Create new gitaly storage shard node to replace nfs-file41: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/9824

Detailed steps for the change

The following are the detailed steps for the change.

Note: These steps do not apply to Praefect systems.

Build the new VM instance

  • pre-conditions for execution of the step
  • execution commands for the step
    • Optionally, check quotas before applying the terraform changes. You can check with:
    gcloud --project='gitlab-production' compute regions describe us-east1 --format=json | jq -c '.quotas[] | select(.limit > 0) | select(.usage / .limit > 0.5) | { metric, limit, usage }'
    • Merge the MR.
    • Notify the Engineer On-call about the planned change.
    • Click the apply-to-prod pipeline stage play button.
  • post-execution validation for the step
    • Examine the gprd apply pipeline stage output and confirm the absence of relevant errors.
  • rollback of the step
    • Revert the MR.

Ensure the creation of the storage directory

Once the gitaly node is created, it will take a few minutes for chef to run on the system, so it may not be immediately available.

  • pre-conditions for execution of the step
    • Make sure chef-client runs without any errors.
    export node='file-51-stor-gprd.c.gitlab-production.internal'
    bundle exec knife ssh "fqdn:$node" "sudo grep 'Chef Client finished' /var/log/syslog | tail -n 1"
  • execution commands for the step
    • If chef has not converged after about 5 minutes, invoke it manually. If chef refuses to run, something is wrong and this procedure should be rolled back.
    bundle exec knife ssh "fqdn:$node" "sudo chef-client"
    • Confirm storage directory /var/opt/gitlab/git-data/repositories exists on the file system of the new node.
    bundle exec knife ssh "fqdn:$node" "sudo df -hT /var/opt/gitlab/git-data/repositories && sudo ls -la /var/opt/gitlab/git-data/ && sudo ls -la /var/opt/gitlab/git-data/repositories | head"
  • post-execution validation for the step
    • Confirm that the gitaly service is running
    bundle exec knife ssh "fqdn:$node" "sudo gitlab-ctl status gitaly"
    • Confirm that there are no relevant errors in the logs.
    bundle exec knife ssh "fqdn:$node" "sudo grep -i 'error' /var/log/gitlab/gitaly/current | tail"
  • rollback of the step
    • No rollback procedure for this step is necessary.
    • This step only confirms and verifies steps taken so far.
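
The "wait about 5 minutes, then run chef manually" logic above can be sketched as a small polling helper. This is only an illustration: `wait_until` and `chef_finished` are hypothetical helper names, not part of knife or chef, and the 300-second timeout and 30-second interval are assumed values.

```shell
# Sketch: poll a check command until it succeeds or a deadline passes.
# wait_until and chef_finished are hypothetical helpers for this runbook.

# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 once the timeout has elapsed.
wait_until() {
  local timeout_s="$1"
  local interval_s="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout_s ))
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval_s"
  done
}

# Succeeds only if syslog on $node records a completed chef-client run.
# Relies on the exported $node variable used elsewhere in this procedure.
chef_finished() {
  bundle exec knife ssh "fqdn:$node" \
    "sudo grep 'Chef Client finished' /var/log/syslog | tail -n 1" \
    | grep -q 'Chef Client finished'
}

# Example (not executed when this file is sourced):
#   export node='file-51-stor-gprd.c.gitlab-production.internal'
#   wait_until 300 30 chef_finished \
#     || bundle exec knife ssh "fqdn:$node" "sudo chef-client"
```

If the manual chef-client run also fails, the procedure should be rolled back as described in the step above.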

Tell the GitLab application about the new node

Configure the GitLab application to include the new node. Note: The GitLab application will consider the new node to be disabled by default.

  • pre-conditions for execution of the step
    • Prepare an MR that adds configuration entries for the new node, such as the following fragments:
                {
                  "name": "nfs-file51",
                  "path": "/var/opt/gitlab/git-data/repositories"
                },
              "nfs-file51": {
                "path": "/var/opt/gitlab/git-data-file51",
                "gitaly_address": "tcp://file-51-stor-gprd.c.gitlab-production.internal:9999"
              },
    • Have the MR reviewed by a colleague.
  • execution commands for the step
    • Merge the MR.
    • Notify the Engineer On-call about the planned change.
    • Click the Apply_to_prod pipeline stage play button.
    • Examine the pipeline stage output to verify that there were no errors.
  • post-execution validation for the step
    • Force chef-client to run on the relevant nodes:
    bundle exec knife ssh -C 3 "roles:gprd-base-stor-gitaly" "sudo chef-client"
    • Optionally, in another shell session, also force chef-client to run on the front-end and back-end nodes, or simply wait for them to converge on their own.
    bundle exec knife ssh -C 3 "roles:gprd-base-fe OR roles:gprd-base-be" "sudo chef-client"
    • Optionally, confirm that the chef role now includes the change:
    $ bundle exec knife role show gprd-base-stor-gitaly | grep -A1 'nfs-file51'
            name: nfs-file51
            path: /var/opt/gitlab/git-data/repositories
  • rollback of the step
    • Revert the MR.
    • Click the Apply_to_prod pipeline stage play button.
    • Re-run the commands in the post-execution validation for the step.

Roll out the new configurations

Get these configurations to the front-end and back-end systems in the fleet.

  • pre-conditions for execution of the step
    • Check the gitaly logs for the absence of any relevant errors in order to confirm the gitaly service is operating normally.
    export node='file-51-stor-gprd.c.gitlab-production.internal'
    bundle exec knife ssh "fqdn:$node" "sudo grep -i 'error' /var/log/gitlab/gitaly/current | tail | jq"
  • execution commands for the step
    • In a new shell session, run chef-client on front-end systems, gradually:
    bundle exec knife ssh -C 3 "roles:gprd-base-fe" "sudo chef-client"
    • In a new shell session, run chef-client on back-end systems, gradually:
    bundle exec knife ssh -C 3 "roles:gprd-base-be" "sudo chef-client"
  • post-execution validation for the step
    • Optionally, in yet another shell session, run these knife commands to list every system that still lacks the expected configuration entry:
    export command="sudo grep --files-without-match 'nfs-file51' /etc/gitlab/gitlab.rb"
    bundle exec knife ssh -C 3 "roles:gprd-base-fe" "$command"
    bundle exec knife ssh -C 3 "roles:gprd-base-be" "$command"
  • rollback of the step
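
The validation output above can be condensed into a list of hosts that still lack the entry. The following is a minimal sketch, assuming that knife ssh prefixes each output line with the reporting host name and that `grep --files-without-match` prints the file path only on hosts where the entry is missing. `pending_hosts` is a hypothetical helper name.

```shell
# Sketch: reduce knife ssh output to the unique hosts still missing the entry.
# Assumed input format, one line per reporting host:
#   <hostname> /etc/gitlab/gitlab.rb
# pending_hosts is a hypothetical helper, not part of knife.
pending_hosts() {
  awk '$NF == "/etc/gitlab/gitlab.rb" { print $1 }' | sort -u
}

# Example (not executed when this file is sourced):
#   export command="sudo grep --files-without-match 'nfs-file51' /etc/gitlab/gitlab.rb"
#   bundle exec knife ssh -C 3 "roles:gprd-base-fe" "$command" | pending_hosts
```

An empty result suggests that every host in the role has picked up the new configuration.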

Test the new node

Confirm that the new storage node is operational.

  • execution commands for the step
    • Create a new project, but do not push any data to it.
    • Copy its Project ID.
    • Use the API to move it to the new storage server (nfs-file51) before pushing any data to it.
    • Now that the project is moved, push some data to it and ensure that everything works. In particular, confirm that the web interface updates with the data you have pushed.
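
The source does not state which API call was used for the move. One possibility, offered as an assumption rather than the confirmed method, is the Projects API edit endpoint (`PUT /projects/:id`) with the admin-only `repository_storage` attribute. In the sketch below, `move_project_storage`, the `DRY_RUN` switch, and the `GITLAB_ADMIN_TOKEN` environment variable are hypothetical names introduced for illustration.

```shell
# Sketch: move a project to a named storage shard through the GitLab API.
# Assumption: PUT /api/v4/projects/:id accepts repository_storage for admins.
# move_project_storage, DRY_RUN and GITLAB_ADMIN_TOKEN are hypothetical names.
move_project_storage() {
  local project_id="$1"
  local storage="$2"
  # Reject anything that is not a plain numeric project ID.
  case "$project_id" in
    ''|*[!0-9]*)
      echo "project ID must be numeric" >&2
      return 2
      ;;
  esac
  local url="https://gitlab.com/api/v4/projects/${project_id}"
  # With DRY_RUN set, only print what would be requested.
  if [ -n "${DRY_RUN:-}" ]; then
    echo "PUT ${url} repository_storage=${storage}"
    return 0
  fi
  curl --fail --silent --show-error --request PUT \
    --header "PRIVATE-TOKEN: ${GITLAB_ADMIN_TOKEN:?set an admin token}" \
    --data "repository_storage=${storage}" \
    "$url"
}

# Example (not executed when this file is sourced):
#   DRY_RUN=1 move_project_storage 12345 nfs-file51
```

Running with `DRY_RUN=1` first makes it possible to check the project ID and shard name before sending the real request.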

Enable the new node in GitLab

Enabling new nodes in the GitLab admin console requires using an admin account to change where new projects are stored. In Admin Area > Settings > Repository > Repository storage > Expand, you will see a list of storage nodes. The ones that are checked are the ones that will receive new projects. For more information, see the GitLab documentation.

  • execution commands for the step
    • Open a private browser window or tab and navigate to: https://gitlab.com/admin/application_settings/repository
    • Click the Expand button next to Repository storage.
    • Click on the checkbox next to nfs-file51 and verify that it is now checked.
    • Don't forget to click the Save changes button.
  • post-execution validation for the step
    • Take a count of how many projects are being created on the new shard:
    export node='file-51-stor-gprd.c.gitlab-production.internal'
    bundle exec knife ssh "fqdn:$node" "sudo find /var/opt/gitlab/git-data/repositories/@hashed -mindepth 2 -maxdepth 3 -name '*.git' | wc -l"
    • Observe that this number goes up over time.
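
Observing the count over time can be reduced to comparing two samples. This is a minimal sketch: `count_repos` and `trend` are hypothetical helper names, the 10-minute gap between samples is an assumed value, and the parsing assumes that knife ssh prints the host name followed by the count on one line.

```shell
# Sketch: sample the repository count twice and classify the change.
# count_repos and trend are hypothetical helpers for this runbook.

# Prints the number of *.git directories on the given node.
# The pattern is quoted so the remote shell passes it to find unexpanded.
count_repos() {
  bundle exec knife ssh "fqdn:$1" \
    "sudo find /var/opt/gitlab/git-data/repositories/@hashed -mindepth 2 -maxdepth 3 -name '*.git' | wc -l" \
    | awk '{ print $NF }'
}

# Usage: trend <first_count> <second_count>
# Prints increasing, decreasing, or unchanged.
trend() {
  if [ "$2" -gt "$1" ]; then
    echo increasing
  elif [ "$2" -lt "$1" ]; then
    echo decreasing
  else
    echo unchanged
  fi
}

# Example (not executed when this file is sourced):
#   first="$(count_repos "$node")"; sleep 600; second="$(count_repos "$node")"
#   trend "$first" "$second"
```

For a newly enabled shard the expected result is `increasing`. A result of `unchanged` may only mean that no project was created between the two samples.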

Disable the old node in GitLab

Remove the old node from the pool of nodes that are configured to store new projects.

  • execution commands for the step
    • On the same Repository storage settings page, click on the checkbox next to nfs-file41 and verify that it is now unchecked.
    • Don't forget to click the Save changes button.
  • post-execution validation for the step
    • Take a count of how many projects are being created on the old shard:
    export node='file-41-stor-gprd.c.gitlab-production.internal'
    bundle exec knife ssh "fqdn:$node" "sudo find /var/opt/gitlab/git-data/repositories/@hashed -mindepth 2 -maxdepth 3 -name '*.git' | wc -l"
    • Observe that this number does not go up over time. It should either stay the same or go down.
Edited by Nels Nelson