
Tag projects to be migrated to HDDs based on Gitaly activity

Production Change

Change Summary

In https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11811, @andrewn developed a way to determine which projects have been accessed in the last 6 months by feeding the Gitaly logs into BigQuery. Now that we have that data, we need to tag every project to be migrated (that is, the inactive ones) with a custom attribute so that we can later feed those projects into our storage migration script.

Change Details

  1. Services Impacted - Service::Patroni
  2. Change Technician - @alejandro
  3. Change Criticality - C2
  4. Change Type - change::scheduled
  5. Change Reviewer - @abrandl
  6. Due Date - 2021-01-27 20:00 UTC
  7. Time tracking - 1h
  8. Downtime Component - 0

Detailed steps for the change

Pre-Change Steps - steps to be completed before execution of the change

Estimated Time to Complete (mins) - 20

  • Before running any commands in the tasks of this procedure, specify the environment that you will be targeting by invoking the following command for production:
export GITLAB_ENVIRONMENT='gprd'
  • Get the CSV with the inactive project IDs from the bucket:
gsutil cp gs://gitlab-gprd-tmp/inactive_repos.csv.gz inactive_repos.csv.gz
  • Determine which Patroni node is the leader and export its FQDN as a variable in a shell session on your workstation:
export node=$(knife ssh "fqdn:patroni-*-db-${GITLAB_ENVIRONMENT}*" 'test -e /usr/bin/jq && sudo /usr/local/bin/gitlab-patronictl list --format json 2>/dev/null | jq -r ".[] | select(.Member==\"$(hostname --fqdn)\").Role"' | grep 'Leader' | cut -d' ' -f1)
echo "${node}"
  • Upload the CSV to the Patroni leader:
scp inactive_repos.csv.gz ${node}:/tmp/inactive_repos.csv.gz
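Before uploading, it may be worth sanity-checking the export on your workstation. The sketch below is not part of the original procedure and rests on assumptions inferred from the temporary table used in the change steps: the CSV has a header row followed by five comma-separated columns in the order created_at, updated_at, project_id, key, value, no field contains a quoted comma, and every row carries key hdd_migration and value pending. The helper name `check_inactive_repos_csv` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the exported CSV.
# Assumes: a header row, then 5 columns (created_at,updated_at,project_id,key,value)
# and no embedded (quoted) commas in any field.
check_inactive_repos_csv() {
  local file="$1"
  # gzip -dc behaves the same on Linux and macOS (macOS zcat expects .Z files).
  gzip -dc "$file" | awk -F',' '
    { sub(/\r$/, "") }                                 # tolerate CRLF line endings
    NR == 1 { next }                                   # skip the header row
    NF != 5 { bad++; next }                            # wrong column count
    $3 !~ /^[0-9]+$/ { bad++; next }                   # project_id must be an integer
    $4 != "hdd_migration" || $5 != "pending" { bad++; next }
    { ok++ }
    END {
      printf "valid rows: %d, invalid rows: %d\n", ok, bad
      exit (bad > 0)                                   # non-zero status if anything is off
    }'
}
```

Usage: `check_inactive_repos_csv inactive_repos.csv.gz`. A non-zero exit status means at least one row does not match the assumed layout, which is cheaper to find here than halfway through the `\copy`.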

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - 30

  • SSH into the leader and run the following command in a tmux session:
sudo gitlab-psql <<'EOF'
CREATE TEMPORARY TABLE gitaly_attributes (created_at timestamp with time zone, updated_at timestamp with time zone, project_id integer, key varchar, value varchar);

\copy gitaly_attributes FROM PROGRAM 'zcat /tmp/inactive_repos.csv.gz' WITH CSV HEADER

INSERT INTO project_custom_attributes (created_at, updated_at, project_id, key, value)
SELECT ga.created_at, ga.updated_at, ga.project_id, ga.key, ga.value FROM gitaly_attributes ga
JOIN projects ON projects.id = ga.project_id;
EOF
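The INSERT above is a single statement, so PostgreSQL applies it atomically: if any row violates a constraint, no rows are inserted. If project_custom_attributes enforces uniqueness on (project_id, key), which we believe GitLab's schema does but have not verified here, a project_id that appears twice in the CSV would abort the whole INSERT. The hypothetical helper below lists repeated project IDs in the export so they can be spotted before the change window; it assumes a header row and project_id in the third comma-separated column with no quoted commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print project IDs that occur more than once in the export.
# Assumes a header row and project_id in the 3rd comma-separated column.
find_duplicate_project_ids() {
  gzip -dc "$1" \
    | awk -F',' 'NR > 1 { print $3 }' \
    | sort \
    | uniq -d    # only print values that appear on more than one line
}
```

Usage: `find_duplicate_project_ids inactive_repos.csv.gz`. Empty output means every project_id is unique in the file.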

Post-Change Steps - steps to take to verify the change

Estimated Time to Complete (mins) - 10

  • Verify the number of records created:
sudo gitlab-psql <<EOF
SELECT COUNT(*) FROM project_custom_attributes WHERE key = 'hdd_migration' AND value = 'pending';
EOF
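Because the INSERT joins on projects, CSV rows whose project was deleted after the export are silently skipped, so the count returned by the verification query should be at most the number of data rows in the CSV (assuming no hdd_migration/pending attributes existed before the change). The hypothetical helper below computes the CSV side of that comparison on your workstation and reports the gap; it assumes a single header line and a trailing newline at the end of the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the database count with the number of CSV data rows.
# Usage: compare_hdd_migration_counts inactive_repos.csv.gz <count from the query>
compare_hdd_migration_counts() {
  local file="$1" db_count="$2" csv_rows
  # Data rows = total lines minus the header (assumes the file ends with a newline).
  csv_rows=$(( $(gzip -dc "$file" | wc -l) - 1 ))
  if (( db_count > csv_rows )); then
    echo "unexpected: ${db_count} rows in DB but only ${csv_rows} in CSV" >&2
    return 1
  fi
  # "skipped" should correspond to projects deleted since the export.
  echo "csv_rows=${csv_rows} db_rows=${db_count} skipped=$(( csv_rows - db_count ))"
}
```

A large "skipped" value would be worth a second look, since only recently deleted projects are expected to fall out of the JOIN.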

Rollback

Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - 5

  • Delete the records created:
sudo gitlab-psql <<EOF
DELETE FROM project_custom_attributes WHERE key = 'hdd_migration' AND value = 'pending';
EOF
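Since the concern with the rollback is running one large DELETE on the Patroni leader, one option (not part of the original procedure) is to delete in small batches with a pause in between, so each transaction stays short. The sketch below assumes that project_custom_attributes has an `id` primary key and that `gitlab-psql` forwards standard psql flags such as `-At`; `run_psql`, `batched_rollback`, `BATCH_SIZE` and `SLEEP_SECONDS` are hypothetical names, and the default values are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical batched rollback: delete the tagged rows in small chunks.
# Assumptions: project_custom_attributes has an "id" primary key, and
# gitlab-psql passes -At through to psql (unaligned, tuples-only output).
BATCH_SIZE="${BATCH_SIZE:-1000}"     # illustrative value, tune as needed
SLEEP_SECONDS="${SLEEP_SECONDS:-1}"  # pause between batches to limit load

run_psql() {
  sudo gitlab-psql -At
}

batched_rollback() {
  local deleted total=0
  while true; do
    # Data-modifying CTE: delete one batch and return how many rows it removed.
    deleted=$(run_psql <<SQL
WITH batch AS (
  DELETE FROM project_custom_attributes
  WHERE id IN (
    SELECT id FROM project_custom_attributes
    WHERE key = 'hdd_migration' AND value = 'pending'
    LIMIT ${BATCH_SIZE}
  )
  RETURNING 1
)
SELECT count(*) FROM batch;
SQL
    )
    # Stop if psql returned anything other than a plain row count.
    if [[ ! "$deleted" =~ ^[0-9]+$ ]]; then
      echo "unexpected psql output: $deleted" >&2
      return 1
    fi
    total=$(( total + deleted ))
    echo "deleted ${deleted} rows (total ${total})"
    if (( deleted == 0 )); then
      break
    fi
    sleep "$SLEEP_SECONDS"
  done
}
```

The loop ends on the first batch that deletes nothing, so it is safe to rerun after an interruption.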

Because the rollback means running a potentially large DELETE on the Patroni leader, we should avoid it whenever possible. The INSERT is a single statement, so it persists either all of its rows or none of them; if the procedure is interrupted at any stage, leaving behind whatever rows were already persisted has no side effects.

Monitoring

Key metrics to observe

Summary of infrastructure changes

  • Does this change introduce new compute instances? No
  • Does this change re-size any existing compute instances? No
  • Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc? No

Summary of the above

Changes checklist

  • This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. change::unscheduled, change::scheduled) based on the Change Management Criticalities.
  • This issue has the change technician as the assignee.
  • Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
  • Necessary approvals have been completed based on the Change Management Workflow.
  • Change has been tested in staging and results noted in a comment on this issue.
  • A dry-run has been conducted and results noted in a comment on this issue.
  • SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
  • There are currently no active incidents.

Closes https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12162

Edited by Alejandro Rodríguez