Resize log volumes on all patroni nodes
Production Change - Criticality 4

Change Objective | After an outage caused by log volumes filling up, we are increasing the size of the log volumes on all patroni nodes so that there is more headroom if a similar event occurs in the future |
---|---|
Change Type | ConfigurationChange |
Services Impacted | ~"Service:Postgres" |
Change Team Members | @craig |
Change Severity | C4 |
Change Reviewer or tested in staging | A colleague who will review the change or evidence the change was tested on staging environment |
Dry-run output | If the change is done through a script, the script must have a dry-run capability; run the change in dry-run mode and output the result |
Due Date | Date and time in UTC for the execution of the change; if possible, also add the local timezone of the engineer executing the change |
Time tracking | To estimate and record times associated with changes (including a possible rollback) |
Detailed steps for the change
For each patroni node
- Resize log volume in GCP console
- Resize filesystem from the host (`lvresize`)
- Update disk size values to match updated infrastructure in terraform and apply
Nodes
- patroni-01
- patroni-02
- patroni-03
- patroni-04
- patroni-05
- patroni-06
- patroni-07
- patroni-08
- patroni-09
- patroni-10
- patroni-11
- patroni-12
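The per-node steps above could be scripted with a dry-run mode, as the change template requires for scripted changes. The following is only a sketch under stated assumptions: it swaps the GCP console step for the equivalent `gcloud compute disks resize` command, and the zone, target size, disk naming scheme, device path, and logical volume path are all hypothetical placeholders rather than values from this change. The `pvresize` step is also an assumption; it applies only if the LVM physical volume sits directly on the resized disk. Passing `-r` to `lvresize` grows the filesystem together with the logical volume.

```python
import shlex
import subprocess

# Node names come from the list in this change. Every other value below is a
# hypothetical placeholder and must be replaced with the real one.
NODES = [f"patroni-{i:02d}" for i in range(1, 13)]
ZONE = "us-east1-c"             # assumption: zone of the node's log disk
NEW_SIZE_GB = 100               # assumption: target log disk size
PV_DEVICE = "/dev/sdc"          # assumption: block device backing the log volume
LV_PATH = "/dev/vg-log/lv-log"  # assumption: logical volume path on the host


def plan_for_node(node):
    """Return the ordered commands that grow the log volume for one node."""
    disk = f"{node}-log"  # assumption: naming convention for the log disk
    return [
        # Grow the persistent disk (replaces the manual GCP console step).
        ["gcloud", "compute", "disks", "resize", disk,
         f"--size={NEW_SIZE_GB}GB", f"--zone={ZONE}"],
        # Let LVM see the new disk size (only if the PV is the raw disk).
        ["ssh", node, "sudo", "pvresize", PV_DEVICE],
        # Extend the logical volume and, via -r, the filesystem on it.
        ["ssh", node, "sudo", "lvresize", "-r", "-l", "+100%FREE", LV_PATH],
    ]


def run(nodes, dry_run=True):
    """Print every command in dry-run mode; execute them otherwise."""
    for node in nodes:
        for cmd in plan_for_node(node):
            if dry_run:
                print(f"[dry-run] {shlex.join(cmd)}")
            else:
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(NODES, dry_run=True)
```

Running the script as written only prints the planned commands, which can be attached as the dry-run output. The terraform update remains a separate manual step.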
Rollback steps
For non-leader nodes
- Remove the node from the patroni cluster to drain traffic
- Taint the terraform resources for the patroni node and log disk
- Re-provision the node from terraform
For leader nodes
- Trigger a failover to a replica node
- Remove the old leader node from the patroni cluster to drain traffic
- Taint the terraform resources for the patroni node and log disk
- Re-provision the node from terraform
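The two rollback paths differ only in the initial failover, which a small helper can make explicit. This is a minimal sketch, not a tested runbook: the function name and the terraform resource addresses are hypothetical, and the failover and drain steps are left as descriptions because the change does not specify the commands used for them. Note that recent Terraform versions deprecate `terraform taint` in favor of `terraform apply -replace=ADDRESS`.

```python
def rollback_plan(node, is_leader, instance_addr, log_disk_addr):
    """Return the ordered rollback steps for one patroni node.

    instance_addr and log_disk_addr are terraform resource addresses;
    the values used in the example below are hypothetical.
    """
    steps = []
    if is_leader:
        # A leader must hand over its role before it can be drained.
        steps.append("Trigger a failover to a replica node")
    steps.append(f"Remove {node} from the patroni cluster to drain traffic")
    steps.append(f"terraform taint {instance_addr}")
    steps.append(f"terraform taint {log_disk_addr}")
    steps.append("terraform apply")
    return steps


if __name__ == "__main__":
    # Hypothetical resource addresses, for illustration only.
    for step in rollback_plan(
        "patroni-01",
        is_leader=True,
        instance_addr="google_compute_instance.patroni[0]",
        log_disk_addr="google_compute_disk.patroni_log[0]",
    ):
        print(step)
```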