2021-09-17: Set redis-01 to have a higher failover priority

Production Change

Change Summary

We want to give redis-01 a higher failover priority because we recently performed maintenance on this node that gave it more memory.

See also: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5547#note_680354939.

Change Details

  1. Services Impacted - Service::Redis
  2. Change Technician - @igorwwwwwwwwwwwwwwwwwwww
  3. Change Reviewer - @jarv
  4. Time tracking - 5m
  5. Downtime Component - none

Detailed steps for the change

Pre-Change Steps - steps to be completed before execution of the change

Estimated Time to Complete (mins) - 1m

  • Set label change::in-progress on this issue
  • Get current replica-priority values
    export redis_cli='REDISCLI_AUTH="$(sudo grep -m1 ^requirepass /var/opt/gitlab/redis/redis.conf|cut -d" " -f2|tr -d \")" /opt/gitlab/embedded/bin/redis-cli'
    
    parallel -j1 --tag 'ssh redis-{}-db-gprd.c.gitlab-production.internal "$redis_cli config get replica-priority"' ::: 01 02 03
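
To record the baseline in a comment on this issue, the tagged output can be condensed to one line per node. The following is a minimal sketch and not part of the original procedure: summarize_priorities is a hypothetical helper, and it assumes that redis-cli prints the parameter name and its value on consecutive lines and that parallel --tag prefixes every output line with the node number followed by a tab.

```shell
# Hypothetical helper: condense `parallel --tag` output of
# `config get replica-priority` into one "redis-NN VALUE" line per node.
# Assumed input shape (tab-separated), two lines per node:
#   01<TAB>replica-priority
#   01<TAB>100
summarize_priorities() {
  awk -F'\t' '
    $2 == "replica-priority" {
      tag = $1
      # The value is expected on the next line, carrying the same tag.
      if ((getline) > 0 && $1 == tag) print "redis-" tag, $2
    }
  '
}
```

To use it, append it to the command above:

    parallel -j1 --tag 'ssh redis-{}-db-gprd.c.gitlab-production.internal "$redis_cli config get replica-priority"' ::: 01 02 03 | summarize_priorities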

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - 1m

  • Set replica-priority on redis-01 to 200
    ssh redis-01-db-gprd.c.gitlab-production.internal "$redis_cli config set replica-priority 200"
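
Note that CONFIG SET changes only the running server and does not rewrite redis.conf, so the new value lasts until the process restarts unless the configuration file carries the same setting. A minimal sketch for checking what the file currently says follows. conf_directive is a hypothetical helper, not part of the original procedure, and it assumes the file uses the plain "name value" form; on older configurations the directive may appear as slave-priority instead.

```shell
# Hypothetical helper: print the value of a directive from redis.conf-style
# text on stdin (first match wins; prints nothing if the directive is absent).
conf_directive() {
  awk -v key="$1" '$1 == key { print $2; exit }'
}
```

Run on the node itself, this could look like:

    sudo cat /var/opt/gitlab/redis/redis.conf | conf_directive replica-priority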

Post-Change Steps - steps to take to verify the change

Estimated Time to Complete (mins) - 1m

  • Get current replica-priority values and confirm that redis-01 now reports 200
    parallel -j1 --tag 'ssh redis-{}-db-gprd.c.gitlab-production.internal "$redis_cli config get replica-priority"' ::: 01 02 03
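
The check for redis-01 can also be made explicit instead of read by eye. This is a minimal sketch under the assumption that redis-cli prints the parameter name on one line and the value on the next when its output is piped; expect_priority is a hypothetical helper and not part of the original procedure.

```shell
# Hypothetical helper: read `config get replica-priority` output on stdin and
# succeed only if the reported value (the last line) equals the expected one.
expect_priority() {
  expected="$1"
  actual="$(tail -n 1 | tr -d '\r')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: replica-priority is $actual"
  else
    echo "MISMATCH: expected $expected, got ${actual:-<empty>}" >&2
    return 1
  fi
}
```

For example:

    ssh redis-01-db-gprd.c.gitlab-production.internal "$redis_cli config get replica-priority" | expect_priority 200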

Rollback

Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - 1m

  • Revert replica-priority on redis-01 back to 0 (with a priority of 0, Sentinel never promotes this node, so we avoid failing over to it)
    ssh redis-01-db-gprd.c.gitlab-production.internal "$redis_cli config set replica-priority 0"

Monitoring

Key metrics to observe

Summary of infrastructure changes

  • Does this change introduce new compute instances?
  • Does this change re-size any existing compute instances?
  • Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc?

Summary of the above

Changes checklist

  • This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. change::unscheduled, change::scheduled) based on the Change Management Criticalities.
  • This issue has the change technician as the assignee.
  • Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
  • This Change Issue is linked to the appropriate Issue and/or Epic.
  • Necessary approvals have been completed based on the Change Management Workflow.
  • Change has been tested in staging and results noted in a comment on this issue.
  • A dry-run has been conducted and results noted in a comment on this issue.
  • SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
  • Release managers have been informed (if needed, for example for a DB change) prior to change being rolled out. (In #production channel, mention @release-managers and this issue and await their acknowledgement.)
  • There are currently no active incidents.
Edited by Igor