# 2021-09-17: Set redis-01 to have a higher failover priority

Production Change

## Change Summary
We want to give redis-01 a higher failover priority as we've recently done some maintenance on this node, giving it more memory.
See also: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5547#note_680354939.
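In Redis, a replica's `replica-priority` controls which node Sentinel promotes on failover: lower non-zero values are preferred, and `0` means the replica is never promoted. A minimal sketch of that selection rule, with illustrative priorities (not the live values):

```shell
#!/usr/bin/env bash
# Simplified Sentinel candidate selection (a sketch, not the actual Sentinel
# source): among healthy replicas, the lowest non-zero replica-priority wins;
# 0 excludes a replica from promotion. Priorities below are illustrative.
declare -A prio=( [redis-01]=200 [redis-02]=100 [redis-03]=100 )
preferred=$(
  for node in "${!prio[@]}"; do
    # skip replicas with priority 0 (never promoted)
    [ "${prio[$node]}" -ne 0 ] && echo "${prio[$node]} $node"
  done | sort -n | head -1 | awk '{print $2}'
)
echo "preferred candidate: $preferred"
```

Under this rule, moving redis-01 from `0` to `200` makes it promotable again, while still ranking it behind replicas at the default priority of `100`.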
## Change Details

- Services Impacted - ~"Service::Redis"
- Change Technician - @igorwwwwwwwwwwwwwwwwwwww
- Change Reviewer - @jarv
- Time tracking - 5m
- Downtime Component - none
## Detailed steps for the change

### Pre-Change Steps - steps to be completed before execution of the change

Estimated Time to Complete (mins) - 1m

- [ ] Set label ~"change::in-progress" on this issue
- [ ] Get current `replica-priority` values:

  ```shell
  export redis_cli='REDISCLI_AUTH="$(sudo grep -m1 ^requirepass /var/opt/gitlab/redis/redis.conf|cut -d" " -f2|tr -d \")" /opt/gitlab/embedded/bin/redis-cli'
  parallel -j1 --tag 'ssh redis-{}-db-gprd.c.gitlab-production.internal "$redis_cli config get replica-priority"' ::: 01 02 03
  ```
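The `parallel --tag` invocation above prefixes each output line with the host suffix, and `config get` emits the parameter name followed by its value. A hypothetical helper (not part of the change itself) showing how those tagged lines can be reduced to `host value` pairs, using illustrative sample output:

```shell
#!/usr/bin/env bash
# Hypothetical parser for the tagged output of the replica-priority check:
# each host contributes two lines ("replica-priority", then the value);
# keep only "<host> <value>" pairs.
parse_priorities() {
  awk '$2 != "replica-priority" { print $1, $2 }'
}

# Sample input is illustrative, not captured from production.
sample=$'01\treplica-priority\n01\t200\n02\treplica-priority\n02\t100\n03\treplica-priority\n03\t100'
result=$(printf '%s\n' "$sample" | parse_priorities)
echo "$result"
```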
### Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - 1m

- [ ] Set `replica-priority` on `redis-01` to `200`:

  ```shell
  ssh redis-01-db-gprd.c.gitlab-production.internal "$redis_cli config set replica-priority 200"
  ```
### Post-Change Steps - steps to take to verify the change

Estimated Time to Complete (mins) - 1m

- [ ] Get current `replica-priority` values:

  ```shell
  parallel -j1 --tag 'ssh redis-{}-db-gprd.c.gitlab-production.internal "$redis_cli config get replica-priority"' ::: 01 02 03
  ```
## Rollback

### Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - 1m

- [ ] Revert `replica-priority` on `redis-01` back to `0` (avoid failing over to this node):

  ```shell
  ssh redis-01-db-gprd.c.gitlab-production.internal "$redis_cli config set replica-priority 0"
  ```
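The rollback value of `0` has a special meaning: it removes redis-01 from Sentinel's promotion pool entirely, rather than merely deprioritizing it. A sketch of that effect, with illustrative priorities:

```shell
#!/usr/bin/env bash
# Rollback semantics sketch: with replica-priority 0, redis-01 is never
# considered for promotion, leaving only the other replicas as candidates.
# Priorities below are illustrative, not the live values.
declare -A prio=( [redis-01]=0 [redis-02]=100 [redis-03]=100 )
promotable=$(
  for node in "${!prio[@]}"; do
    # priority 0 means "never promote this replica"
    [ "${prio[$node]}" -ne 0 ] && echo "$node"
  done | sort | xargs
)
echo "promotable after rollback: $promotable"
```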
## Monitoring

### Key metrics to observe

- Metric: Redis SLOs
- Location: https://dashboards.gitlab.net/d/redis-main/redis-overview?orgId=1&from=now-1h&to=now
- What changes to this metric should prompt a rollback: significant increases in latency, error rates, or saturation.
## Summary of infrastructure changes

- [ ] Does this change introduce new compute instances?
- [ ] Does this change re-size any existing compute instances?
- [ ] Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc?

Summary of the above:
## Changes checklist

- [ ] This issue has a criticality label (e.g. ~C1, ~C2, ~C3, ~C4) and a change-type label (e.g. ~"change::unscheduled", ~"change::scheduled") based on the Change Management Criticalities.
- [ ] This issue has the change technician as the assignee.
- [ ] Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
- [ ] This Change Issue is linked to the appropriate Issue and/or Epic.
- [ ] Necessary approvals have been completed based on the Change Management Workflow.
- [ ] Change has been tested in staging and results noted in a comment on this issue.
- [ ] A dry-run has been conducted and results noted in a comment on this issue.
- [ ] SRE on-call has been informed prior to change being rolled out. (In the #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
- [ ] Release managers have been informed (if needed; cases include DB changes) prior to change being rolled out. (In the #production channel, mention @release-managers and this issue and await their acknowledgment.)
- [ ] There are currently no active incidents.