Migrate large projects off file-25-stor-gprd to file-01-stor-gprd
What?
Migrate the large projects currently on file-25-stor-gprd.c.gitlab-production.internal
(https://dashboards.gitlab.net/d/W_Pbu9Smk/storage-stats?orgId=1&refresh=30m&fullscreen&panelId=135) to file-01-stor-gprd.c.gitlab-production.internal
(https://dashboards.gitlab.net/d/W_Pbu9Smk/storage-stats?orgId=1&refresh=30m&fullscreen&panelId=158)
Why?
Because git data file server disk usage is high: 80.13% full.
Since we have four other file storage nodes with more than 50% capacity available, a rebalance seems preferable to creating an additional node.
Procedure
- Production changes require manager approval in a comment.
/cc @dawsmith
Setup
- Install the rebalance script on a common system (such as the production console system host).
curl --location --silent --remote-name https://gitlab.com/gitlab-com/runbooks/raw/master/scripts/storage_rebalance.rb
- Export a personal access token in an environment variable on the console system host.
export GITLAB_ADMIN_API_PRIVATE_TOKEN=CHANGEME
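A guard along the following lines could catch a missing or placeholder token before any script is run. This is only a sketch: the function name `require_admin_token` is hypothetical and is not part of the runbooks scripts; the only detail taken from the step above is the variable name and its `CHANGEME` placeholder.

```shell
# Hypothetical helper (not part of storage_rebalance.rb):
# fail early if the token variable is unset, empty, or still the placeholder.
require_admin_token() {
    if [ -z "${GITLAB_ADMIN_API_PRIVATE_TOKEN:-}" ]; then
        echo "GITLAB_ADMIN_API_PRIVATE_TOKEN is not set" >&2
        return 1
    fi
    if [ "$GITLAB_ADMIN_API_PRIVATE_TOKEN" = "CHANGEME" ]; then
        echo "GITLAB_ADMIN_API_PRIVATE_TOKEN is still the CHANGEME placeholder" >&2
        return 1
    fi
    return 0
}
```

For example, `require_admin_token || exit 1` could be placed ahead of the rebalance command in a wrapper script.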
Dry-run
- Execute a dry-run of the rebalance script to move files from nfs-file25 to nfs-file01. Do not give an amount of disk space to migrate (--move-amount N), so that only a single project will be migrated for the time being.
ruby storage_rebalance.rb --verbose nfs-file25 nfs-file01 --wait=10800 --dry-run=yes
- Verify that the dry-run execution behaved as expected (no errors).
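If the dry-run output is captured to a file (for example by piping it through tee, as the later steps do), the "no errors" check could be partly automated. The sketch below is an assumption-heavy illustration: the helper name `log_is_clean` is hypothetical, and the pattern it searches for (`error`, `exception`, `failed`, case-insensitive) is a guess at what a failure would look like, not the documented output format of storage_rebalance.rb. It does not replace reading the log.

```shell
# Hypothetical helper: returns 0 only if the log exists and contains none of
# the assumed failure keywords. The keyword list is an assumption, not the
# script's documented output format.
log_is_clean() {
    # Refuse to report "clean" for a log that cannot be read.
    [ -r "$1" ] || return 2
    if grep -E -i -q 'error|exception|failed' "$1"; then
        return 1
    fi
    return 0
}
```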
Don't jump in the deep end
- Inform the engineer on call.
- Execute the rebalance script to move files from nfs-file25 to nfs-file01. Do not give an amount of disk space to migrate (--move-amount N), so that only a single project will be migrated for the time being. Specify that the rebalance script will wait much longer than 10 seconds, to ensure that the script continues to monitor the progress of the project migration until it has completed.
ruby storage_rebalance.rb --verbose nfs-file25 nfs-file01 --wait=10800 --dry-run=no | tee nfs-file25.migration.$(date +%Y-%m-%d_%H%M).log
- Verify that the execution behaved as expected (no errors).
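One thing to keep in mind when verifying: because the command is piped into tee, the exit status the shell reports is that of tee, not of the Ruby script. A wrapper like the following sketch preserves the script's own exit status while still writing the log. The name `run_logged` is hypothetical, and whether storage_rebalance.rb exits non-zero on failure is an assumption that should be checked against the script itself.

```shell
# Hypothetical wrapper: run a command, tee its output to a log file,
# and return the command's own exit status instead of tee's.
# Usage: run_logged LOG_FILE COMMAND [ARGS...]
run_logged() {
    log_file=$1
    shift
    status_file=$(mktemp) || return 1
    # The group records the command's exit status before tee can mask it.
    { "$@" 2>&1; echo "$?" > "$status_file"; } | tee "$log_file"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}
```

With this in place, the step above could be run as `run_logged nfs-file25.migration.$(date +%Y-%m-%d_%H%M).log ruby storage_rebalance.rb --verbose nfs-file25 nfs-file01 --wait=10800 --dry-run=no`, followed by a check of `$?`.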
Wade in
- Execute the rebalance script to move files from nfs-file25 to nfs-file01. Specify an amount of disk space in gigabytes (less than 16000) to migrate.
ruby storage_rebalance.rb --verbose nfs-file25 nfs-file01 --wait=10800 --dry-run=no --move-amount=600 | tee nfs-file25.migration.$(date +%Y-%m-%d_%H%M).log
- Verify that the execution behaved as expected (no errors).
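The limit on the move amount (a number of gigabytes below 16000) can be checked before the script is started. The following is a minimal sketch; `valid_move_amount` is a hypothetical helper, and the requirement that the value be a positive whole number is an assumption added here, since the step above only states the upper bound.

```shell
# Hypothetical helper: accept only whole numbers of gigabytes that are
# greater than 0 (assumption) and less than 16000 (from the step above).
valid_move_amount() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -gt 0 ] && [ "$1" -lt 16000 ]
}
```

For example, `valid_move_amount 600 && ruby storage_rebalance.rb ... --move-amount=600` would only start the migration when the amount is within range.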
Clean up
- Install the cleanup script on a common system (such as the production console system host).
ssh console-01-sv-gprd.c.gitlab-production.internal
cd /tmp
curl --location --silent --remote-name https://gitlab.com/gitlab-com/runbooks/raw/master/scripts/storage_cleanup.rb
sudo mv /tmp/storage_cleanup.rb /var/opt/gitlab/scripts/
sudo chown -R git:git /var/opt/gitlab/scripts/storage_cleanup.rb
sudo chmod +x /var/opt/gitlab/scripts/storage_cleanup.rb
/var/opt/gitlab/scripts/storage_cleanup.rb --help
- Perform a dry-run clean-up of the moved projects remaining on the source storage node host system.
/var/opt/gitlab/scripts/storage_cleanup.rb --dry-run=yes --verbose --node=file-XX-stor-gprd.c.gitlab-production.internal
Ensure that there were no errors from the execution of this script.
- Take a snapshot of the disk for nfs-file25.
Navigate to https://console.cloud.google.com/compute/disksDetail/zones/us-east1-c/disks/file-XX-stor-gprd-data?project=gitlab-production in a browser. Click the "Create Snapshot" button, and then click the "Create" button. Alternatively, run the following commands in a shell session on your local workstation:
gcloud auth login
gcloud config set project gitlab-production
gcloud config set compute/region us-east1
gcloud config set compute/zone us-east1-c
gcloud compute disks list | grep file-XX-stor-gprd-data
gcloud compute disks snapshot file-XX-stor-gprd-data
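Without an explicit name, the snapshot gets an auto-generated one, which can make it harder to find later. A sketch of naming it after the disk and the current UTC time follows. The helper names `snapshot_name` and `snapshot_disk` and the naming scheme are hypothetical; `--snapshot-names` and `--zone` are existing flags of `gcloud compute disks snapshot`, and the zone is the one configured in the commands above.

```shell
# Hypothetical helper: build a snapshot name from the disk name plus a UTC
# timestamp, e.g. <disk>-YYYYMMDD-HHMM (lowercase letters, digits, hyphens).
snapshot_name() {
    printf '%s-%s\n' "$1" "$(date -u +%Y%m%d-%H%M)"
}

# Hypothetical wrapper: snapshot the given disk under the generated name.
# Defining this function does not run gcloud; it runs only when called.
snapshot_disk() {
    gcloud compute disks snapshot "$1" \
        --snapshot-names="$(snapshot_name "$1")" \
        --zone=us-east1-c
}
```

Calling `snapshot_disk file-XX-stor-gprd-data`, with XX replaced by the actual node number, would then take the snapshot.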
- Finally, clean up the moved projects remaining on the source storage node host system.
/var/opt/gitlab/scripts/storage_cleanup.rb --dry-run=no --verbose --node=file-XX-stor-gprd.c.gitlab-production.internal
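The procedure refers to the same node in two forms: a storage name such as nfs-file25 for the rebalance script, and a host name such as file-25-stor-gprd.c.gitlab-production.internal for the cleanup script's --node option. To avoid typing the wrong number into one of them, both could be derived from a single node number, as in the sketch below. The helper names are hypothetical, and the two-digit zero padding is inferred from nfs-file01 and file-01-stor-gprd in this issue rather than from any documented naming rule.

```shell
# Hypothetical helper: validate a node number and strip leading zeros so
# that values such as "08" are not misread as octal by printf.
normalize_node_number() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    n=$(printf '%s' "$1" | sed 's/^0*//')
    printf '%s\n' "${n:-0}"
}

# Storage name as passed to storage_rebalance.rb, e.g. nfs-file25.
storage_shard_name() {
    n=$(normalize_node_number "$1") || return 1
    printf 'nfs-file%02d\n' "$n"
}

# Host name as passed to storage_cleanup.rb --node.
storage_node_fqdn() {
    n=$(normalize_node_number "$1") || return 1
    printf 'file-%02d-stor-gprd.c.gitlab-production.internal\n' "$n"
}
```

For example, `--node="$(storage_node_fqdn 25)"` could stand in for the file-XX placeholder in the cleanup commands.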