Draft: add pg-cluster-switchover playbook
issue: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16360
This playbook switches traffic over to a new Patroni cluster running in logical replication mode.
- Switch over read-only traffic (replicas) without downtime: done
- Switch over write traffic (leader) with minimal downtime: in progress
playbook: switchover.yml
play #1 (source_cluster:target_cluster): Switchover the traffic to a new Patroni Cluster (logical replication mode) TAGS: []
tasks:
Set the variable: source_pgbouncer_consul_service_name TAGS: [replica]
Set the variable: source_patroni_consul_service_name TAGS: [leader]
Get Patroni Cluster Leader Node TAGS: [leader, replica]
Set the variable: patroni_cluster_leader: true TAGS: [leader, replica]
Set the variable: patroni_cluster_leader: false TAGS: [leader, replica]
(SOURCE) Patroni Cluster Leader TAGS: [leader, replica]
(TARGET) Patroni Cluster Leader TAGS: [leader, replica]
(SOURCE) Consul Service Name TAGS: [leader, replica]
[Pre-Check] (SOURCE) Make sure that logical replication is active TAGS: [leader, replica]
(SOURCE) Print logical replication state TAGS: [leader, replica]
(SOURCE) Print logical replication state TAGS: [leader, replica]
[Pre-Check] (SOURCE) Make sure there is no high logical replication lag TAGS: [leader, replica]
(SOURCE) Print logical replication lag TAGS: [leader, replica]
Start read-only traffic to replicas of the new cluster? TAGS: [replica]
Set the variable: ask_result TAGS: [replica]
(TARGET) Start read-only traffic to replicas of the new cluster: rename the Consul service TAGS: [replica]
(TARGET) Reload consul.service TAGS: [replica]
Get list of replica service nodes TAGS: [replica]
Print list of replica service nodes TAGS: [replica]
Stop read-only traffic to replicas of the old cluster? TAGS: [replica]
Set the variable: ask_result TAGS: [replica]
(SOURCE) Stop read-only traffic to replicas of the old cluster: rename the Consul service TAGS: [replica]
(TARGET) Reload consul.service TAGS: [replica]
Get list of replica service nodes TAGS: [replica]
Print list of replica service nodes TAGS: [replica]
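The lag pre-check and the confirmation-gated rename listed above could look roughly like the tasks below. This is a minimal sketch, not the tasks from this MR. The SQL query, the `max_logical_lag_bytes` threshold, the `consul_service_file` path and the `target_pgbouncer_consul_service_name` variable are assumptions introduced for illustration; only `patroni_cluster_leader`, `ask_result` and `source_pgbouncer_consul_service_name` appear in the task list. It also assumes that "rename" means registering the target replicas under the service name that clients already resolve.

```yaml
# Hypothetical sketch only; the real tasks in switchover.yml may differ.

- name: "[Pre-Check] (SOURCE) Make sure there is no high logical replication lag"
  # Assumption: lag is measured in bytes from the logical slots on the source leader.
  ansible.builtin.command: >-
    psql -tAXc "select coalesce(max(pg_wal_lsn_diff(pg_current_wal_lsn(),
    confirmed_flush_lsn)), 0)::bigint from pg_replication_slots
    where slot_type = 'logical'"
  become: true
  become_user: postgres
  register: logical_lag
  changed_when: false
  # failed_when replaces the default rc check, so the rc is tested explicitly.
  failed_when: logical_lag.rc != 0 or (logical_lag.stdout | int) > (max_logical_lag_bytes | int)
  when:
    - "'source_cluster' in group_names"
    - patroni_cluster_leader | bool

- name: "Start read-only traffic to replicas of the new cluster?"
  ansible.builtin.pause:
    prompt: "Type 'yes' to continue"
  register: pause_result
  run_once: true

- name: "Set the variable: ask_result"
  ansible.builtin.set_fact:
    ask_result: "{{ pause_result.user_input | default('') | trim | lower }}"

- name: "(TARGET) Start read-only traffic to replicas of the new cluster: rename the Consul service"
  # consul_service_file and target_pgbouncer_consul_service_name are hypothetical.
  ansible.builtin.replace:
    path: "{{ consul_service_file }}"
    regexp: '"{{ target_pgbouncer_consul_service_name | regex_escape }}"'
    replace: '"{{ source_pgbouncer_consul_service_name }}"'
  become: true
  when:
    - "'target_cluster' in group_names"
    - not (patroni_cluster_leader | bool)
    - ask_result == 'yes'

- name: "(TARGET) Reload consul.service"
  ansible.builtin.systemd:
    name: consul
    state: reloaded
  become: true
  when:
    - "'target_cluster' in group_names"
    - ask_result == 'yes'
```

Gating every mutating task on `ask_result == 'yes'` keeps the play safe to abort at the prompt: the pre-checks still run, but no Consul definition is touched.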
playbook: switchover_rollback.yml
play #1 (source_cluster:target_cluster): Rollback the traffic to a old Patroni Cluster (logical replication mode) TAGS: []
tasks:
Get list of replica service nodes TAGS: [replica]
Print list of replica service nodes TAGS: [replica]
Rollback traffic to the old cluster? TAGS: [replica]
Set the variable: ask_result TAGS: [replica]
(SOURCE) Start read-only traffic to replicas of the old cluster: rename the Consul service TAGS: [replica]
(TARGET) Reload consul.service TAGS: [replica]
(TARGET) Stop read-only traffic to replicas of the new cluster: rename the Consul service TAGS: [replica]
(TARGET) Reload consul.service TAGS: [replica]
Get list of replica service nodes TAGS: [replica]
Print list of replica service nodes TAGS: [replica]
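Both playbooks list the replica service nodes before and after the rename, so the operator can confirm the effect of each step. One way to do that is to query the local Consul agent's catalog endpoint. A minimal sketch, assuming the agent's HTTP API listens on `127.0.0.1:8500` without ACL tokens; the playbook itself may use DNS or the `consul` CLI instead.

```yaml
# Hypothetical sketch only; address, port and the absence of ACL tokens are assumptions.

- name: "Get list of replica service nodes"
  ansible.builtin.uri:
    url: "http://127.0.0.1:8500/v1/catalog/service/{{ source_pgbouncer_consul_service_name }}"
    return_content: true
  register: replica_service
  run_once: true

- name: "Print list of replica service nodes"
  # The catalog endpoint returns a JSON array; each entry carries a "Node" field.
  ansible.builtin.debug:
    msg: "{{ replica_service.json | map(attribute='Node') | list }}"
  run_once: true
```

Comparing the printed node names before and after the rollback shows whether the old cluster's replicas are registered under the service name again.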
Requirements:
Patroni Standby Cluster in logical replication mode. See physical-to-logical
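Since the play targets `source_cluster:target_cluster`, the inventory has to define both groups. A minimal YAML inventory sketch follows; the hostnames are placeholders and are not taken from this MR.

```yaml
# Hypothetical inventory; replace the placeholder hostnames with real nodes.
all:
  children:
    source_cluster:        # old Patroni cluster (publisher)
      hosts:
        patroni-old-01.example.internal:
        patroni-old-02.example.internal:
    target_cluster:        # new Patroni cluster (logical replication subscriber)
      hosts:
        patroni-new-01.example.internal:
        patroni-new-02.example.internal:
```

With such an inventory saved as, say, `inventory.yml`, the replica-only steps could be selected with `ansible-playbook -i inventory.yml switchover.yml --tags replica`, and the task list shown above reproduced with `--list-tasks`.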
Synthetic tests:
postgres-ai/postgresql-consulting/tests-and-benchmarks#34 (comment 1229286594)
Edited by Vitaliy Kukharik