
[Production] Switchover to new patroni leader and refresh old one

Production Change

Change Summary

After #4721 (closed) we will have refreshed all of our Patroni cluster nodes except the leader. In this CR we'll perform a switchover to a new leader in order to refresh the old one.

Change Details

  1. Services Impacted - ServicePatroni
  2. Change Technician - @ahmadsherif
  3. Change Criticality - C1
  4. Change Type - changeunscheduled
  5. Change Reviewer - @alejandro
  6. Due Date - 2021-07-10 9:00 UTC
  7. Time tracking - ~1 hour
  8. Downtime Component - up to 5 mins

Detailed steps for the change

Pre-Change Steps - steps to be completed before execution of the change

Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes

  • Disable automatic database reindexing via Slack chatops: /chatops run feature set database_reindexing false
  • Set label changein-progress on this issue
  • Verify that patroni-v12-02 is still the current leader. If not, adjust the procedure below accordingly (a minimal leader-check sketch follows this list)
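
A minimal leader-check sketch, assuming gitlab-patronictl is available on the Patroni node you are logged into; the expected hostname is taken from this CR:

    # Confirm patroni-v12-02 is still the Patroni leader before proceeding.
    EXPECTED_LEADER="patroni-v12-02-db-gprd.c.gitlab-production.internal"
    if sudo gitlab-patronictl list | grep Leader | grep -q "${EXPECTED_LEADER}"; then
      echo "OK: ${EXPECTED_LEADER} is still the leader"
    else
      echo "Leader has changed; adjust the switchover commands below" >&2
    fi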

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes

  • Create an alert silence with the following matcher(s):
    • fqdn = patroni-v12-(02|05)-db-gprd.c.gitlab-production.internal (check Regex)
  • Create an alert silence with the following matcher(s):
    • alertname = PostgreSQL_UnusedReplicationSlot
    • slot_name = patroni_v12_02_db_gprd_c_gitlab_production_internal
  • Switchover to a new primary:
    • ssh patroni-v12-02-db-gprd.c.gitlab-production.internal
    • sudo gitlab-patronictl switchover --master patroni-v12-02-db-gprd.c.gitlab-production.internal --candidate patroni-v12-05-db-gprd.c.gitlab-production.internal
  • Make sure a new leader has been elected and all replicas are following it:
    • sudo gitlab-patronictl list | grep Leader | grep running
  • Take the now-replica out of Rails load-balancing (a small helper wrapping these Consul maintenance calls is sketched after this list):
    • a=("" "-1" "-2"); for i in "${a[@]}"; do consul maint -enable -service=db-replica$i -reason="CR #4781"; done
  • Wait until all clients have been disconnected from the replica (an auto-exiting variant of this check is sketched after this list):
    • while true; do for c in /usr/local/bin/pgb-console*; do sudo $c -c 'SHOW CLIENTS;'; done | grep gitlabhq_production | cut -d '|' -f 2 | awk '{$1=$1};1' | grep -v gitlab-monitor | wc -l; sleep 5; done
    • Wait until the output is zero
  • Disable Chef
    • sudo chef-client-disable "CR #4781"
  • Make sure chef-client is not currently running; if it is, wait until it finishes.
  • Remove pgbouncer Consul services:
    • sudo rm /etc/consul/conf.d/db-replica*
  • Reload Consul:
    • sudo systemctl reload consul
  • Make sure there are no clients connected to the replica:
    • while true; do for c in /usr/local/bin/pgb-console*; do sudo $c -c 'SHOW CLIENTS;'; done | grep gitlabhq_production | cut -d '|' -f 2 | awk '{$1=$1};1' | grep -v gitlab-monitor | wc -l; sleep 5; done
  • In gitlab-com-infrastructure:
    • cd environments/gprd
    • tf taint "module.patroni-v12.google_compute_instance.instance_with_attached_disk[1]" && tf taint "module.patroni-v12.google_compute_disk.data_disk[1]" && tf taint "module.patroni-v12.google_compute_disk.log_disk[1]"
    • tf apply -target=module.patroni-v12
  • If td-agent is refusing to start, run: sudo /opt/td-agent/bin/gem uninstall google-protobuf -v 3.17.3
  • Take the replica out of Rails load-balancing:
    • a=("" "-1" "-2"); for i in "${a[@]}"; do consul maint -enable -service=db-replica$i -reason="CR #4781"; done
    • We want to control when to add the replica back to Rails load-balancing; otherwise it would be added automatically once it has processed enough WAL segments from GCS.
  • Start Patroni: sudo systemctl start patroni
  • Wait until the replica has caught up with the primary:
    • sudo gitlab-patronictl list | grep $(hostname -I) | grep running
  • Wait until the replication lag between the replica and the primary has diminished (an auto-exiting lag watch is sketched after this list):
    • while true; do sudo gitlab-patronictl list | grep $(hostname -I) | cut -d'|' -f 7; sleep 180; done
  • Add the replica to Rails
    • a=("" "-1" "-2"); for i in "${a[@]}"; do consul maint -disable -service=db-replica$i; done
  • Expire the two alert silences created at the beginning
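
The client-drain checks above print the count indefinitely; as a convenience, this hedged variant of the same pipeline exits on its own once no gitlabhq_production clients other than gitlab-monitor remain connected through the local pgbouncer consoles:

    # Poll the pgbouncer consoles until the production client count reaches zero.
    while true; do
      count=$(for c in /usr/local/bin/pgb-console*; do sudo "$c" -c 'SHOW CLIENTS;'; done \
                | grep gitlabhq_production \
                | cut -d '|' -f 2 \
                | awk '{$1=$1};1' \
                | grep -v gitlab-monitor \
                | wc -l)
      echo "remaining clients: ${count}"
      [ "${count}" -eq 0 ] && break
      sleep 5
    done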
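
The Consul maintenance toggle appears three times in the steps above; a small illustrative helper (not part of the original procedure) keeps the enable/disable calls consistent:

    # Toggle Consul maintenance mode for db-replica, db-replica-1 and db-replica-2.
    # Usage: db_replica_maint enable "CR #4781"   or   db_replica_maint disable
    db_replica_maint() {
      local action="$1" reason="${2:-}"
      local suffixes=("" "-1" "-2")
      for s in "${suffixes[@]}"; do
        if [ "${action}" = "enable" ]; then
          consul maint -enable -service="db-replica${s}" -reason="${reason}"
        else
          consul maint -disable -service="db-replica${s}"
        fi
      done
    }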
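
Similarly, the lag watch can be left to exit on its own. The sketch below assumes, as in the command above, that column 7 of gitlab-patronictl list is the lag in MB; the 1 MB threshold is purely illustrative:

    # Watch replication lag for this node and stop once it drops to the threshold.
    THRESHOLD_MB=1
    while true; do
      lag=$(sudo gitlab-patronictl list | grep "$(hostname -I | awk '{print $1}')" | cut -d'|' -f 7 | tr -d ' ')
      echo "current lag: ${lag:-unknown} MB"
      if [ -n "${lag}" ] && [ "${lag}" -le "${THRESHOLD_MB}" ] 2>/dev/null; then
        break
      fi
      sleep 180
    done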

Post-Change Steps - steps to take to verify the change

Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes

  • Run while true; do for c in /usr/local/bin/pgb-console*; do sudo $c -c 'SHOW CLIENTS;'; done | grep gitlabhq_production | cut -d '|' -f 2 | awk '{$1=$1};1' | grep -v gitlab-monitor | wc -l; sleep 5; done. The number should increase gradually (a variant that exits once clients reconnect is sketched after this list).
  • Re-enable automatic database reindexing via Slack chatops: /chatops run feature set database_reindexing true
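
A hedged variant of the check above that exits once production clients start reconnecting through pgbouncer (the count should then keep climbing):

    # Wait for production clients to reappear on the refreshed replica.
    while true; do
      count=$(for c in /usr/local/bin/pgb-console*; do sudo "$c" -c 'SHOW CLIENTS;'; done \
                | grep gitlabhq_production \
                | cut -d '|' -f 2 \
                | awk '{$1=$1};1' \
                | grep -v gitlab-monitor \
                | wc -l)
      echo "connected clients: ${count}"
      [ "${count}" -gt 0 ] && break
      sleep 5
    done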

Rollback

Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes

Scenario 1: Before running tf apply:

  • Add the replica to Rails load-balancing:
    • a=("" "-1" "-2"); for i in "${a[@]}"; do consul maint -disable -service=db-replica$i; done
  • Do the verification step(s) above

Scenario 2: After running tf apply:

No viable rollback steps; we have to roll forward with the change.

Monitoring

Key metrics to observe

  • Metric: Metric Name
    • Location: Dashboard URL
    • What changes to this metric should prompt a rollback: Describe Changes

Summary of infrastructure changes

  • Does this change introduce new compute instances?
  • Does this change re-size any existing compute instances?
  • Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc?

Summary of the above

Changes checklist

  • This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. changeunscheduled, changescheduled) based on the Change Management Criticalities.
  • This issue has the change technician as the assignee.
  • Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
  • Necessary approvals have been completed based on the Change Management Workflow.
  • Change has been tested in staging and results noted in a comment on this issue.
  • A dry-run has been conducted and results noted in a comment on this issue.
  • SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
  • There are currently no active incidents.