add pg-upgrade-logical
What's new
- Added a new playbook
upgrade.yml - The "switchover.yml" and "switchover_rollback.yml" playbooks has been moved to the "pg-upgrade-logical" directory (as part of the upgrade procedure).
- Now the playbook "physical_to_logical.yml" is part of "upgrade.yml" (the old playbook has been deleted).
This playbook can be used to upgrade to any version of PostgreSQL (12->13, 12->14, 12->15, or 13->14, 14->15, etc).
-
TODO: Add README.mdwith a description of the plan for each playbook.
Plan
Step 1: PRE-UPGRADE: Perform Pre-Checks
- (TARGET) Make sure that is Standby leader
- (ALL) Test PostgreSQL DB Access
- (TARGET) Check the current version of PostgreSQL
- Stop if, an incorrect version of PostgreSQL may have been specified
- if the
pg_old_versionvariable does not match the current version of PostgreSQL
- if the
- Stop if, an incorrect target version of PostgreSQL may have been specified
- if the
pg_new_versionvariable is equal to or less than the current version of PostgreSQL (no update required)
- if the
- Stop if, an incorrect version of PostgreSQL may have been specified
- (SOURCE) Make sure there are no tables with replica identity "nothing"
- Stop, if replica identity "nothing" is present
- (SOURCE) Make sure that tables with replica identity “default” have primary key
- Stop, if replica identity "default" without PK is present
- (ALL) Make sure that the password file ".pgpass" exists
- Stop, if .pgpass is not exists
- (SOURCE) Make sure that the user is specified in the password file
- Stop, if the user (pg_source_replica_user variable) is not present
- (SOURCE) Get the password for the user from the password file
- (TARGET) Configure the password file for the user
- Test the access to the source database
- Note: User from the pg_source_replica_user variable to database from the pg_source_dbname variable
- (TARGET) Make sure that physical replication is active
- Stop, if there are no active replicas
- if pg_stat_replication returned 0 entries (except pg_basebackup)
- (TARGET) Make sure there is no high physical replication lag
- No more than
max_replication_lag_bytes - Stop, if physical replication lag is high
- No more than
- (TARGET) Rsync Checks
- Create /tmp/testrsync file on Primary
- Test rsync and ssh key access
- Cleanup testrsync file
- (TARGET) Stop Chef client to prevent unexpected changes during the process
Step 2: PRE-UPGRADE: Install new PostgreSQL packages
- Update apt cache
- Make sure that the new PostgreSQL packages are installed (install if missing)
- Make sure that the rsync package are installed
Step 3: PRE-UPGRADE: Initialize new db, schema compatibility check, and pg_upgrade --check
- Make sure new PostgreSQL data directory exists (create directories)
- Get the current encodig and data_checksums settings
- Initialize a new PostgreSQL data directory on the Primary
- with the same encoding
- with enabled checksums (if they are enabled in the current version of PostgreSQL)
- Check the compatibility of the database schema with the new PostgreSQL
- using pg_dumpall --schema-only
- Clear a new PostgreSQL data directory and re-initdb (after checking the scheme)
- Verify the two clusters are compatible
- using pg_upgrade --check (check clusters only, don't change any data)
Step 4: Create a publication not the Source and reach recovery_target_lsn on the Target leader
- Execute CHECKPOINT before stopping PostgreSQL
- Stop Patroni service on the Cluster Replica
- Stop Patroni service on the Cluster Leader
- Check if old PostgreSQL is stopped
- Check if new PostgreSQL is stopped
- additional checks using pg_ctl
- if running, stop PostgreSQL using pg_ctl
- (SOURCE) Execute checkpoint and create a publication for logical replication
- (SOURCE) Create a slot for logical replication, and save 'lsn'
- (TARGET) Specify recovery parameters on the Standby Cluster Leader
- recovery_target_lsn = '{{ pg_slot_lsn }}'
- recovery_target_action = 'promote'
- recovery_target_timeline = 'latest'
- (TARGET) Start PostgreSQL on the Standby Cluster Leader to reach recovery_target_lsn
- (TARGET) Wait until the recovery is complete
- (TARGET) Get the current PostgreSQL log file
- (TARGET) Check the PostgreSQL log file
- Stop, if target LSN not reached
- Print the result of checking the PostgreSQL log
- (TARGET) Execute CHECKPOINT before stopping postgresql
- (TARGET) Stop PostgreSQL after reaching the recovery target
- (TARGET) Remove the Patroni
standby_clusterparameters from the DCS
Step 5: PRE-UPGRADE: Prepare the Patroni configuration
- Edit patroni.yml | update parameters: data_dir, bin_dir, config_dir
- Edit patroni.yml | remove parameters: standby_cluster (if exists)
- Copy the pg_hba.conf file to a new PostgreSQL (to save pg_hba rules)
Step 6: UPGRADE: Upgrade PostgreSQL
- Upgrade the PostgreSQL to new version on the Primary
- using
pg_upgrade --link(hard-links)
- using
- Upgrade the PostgreSQL on the Replica
- using
rsync(hard-links)- Note: before rsync is executed, the owner of the PostgreSQL data directories changes to the user from under whom the playbook is executed, to use ssh agent forwarding. After rsync is executed, the owner of the directory returns to the user defined in the os_pg_user variable (gitlab-psql)
- using
- Remove existing cluster "target-postgres-cluster" from DCS
- Start Patroni service on the Cluster Leader
- Wait for Patroni port to become open on the host
- Check Patroni is healthy on the Leader
- Start Patroni service on the Cluster Replica
- Wait for Patroni port to become open on the host
- Check Patroni is healthy on the Replica
Step 7: POST-UPGRADE: Check Replication and Update extensions
- Make sure that physical replication is active
- Check pg_stat_replication (count records, except pg_basebackup)
- Create a table "test_replication" with 10000 rows on the Primary
- Wait until the PostgreSQL replica is synchronized
- Check pg_stat_replication (count records, except pg_basebackup)
- We wait (no more than 10 minutes) until the replication lag is 0 bytes
- Verify table record on Replicas
- The PostgreSQL Replication is Ok. If the number of records in the test_replication table the same as the Primary.
- Error, if the number of records does not match the Primary.
- Drop a table "test_replication"
- Update old extensions
- Get a list of old PostgreSQL extensions
- Update all old PostgreSQL extensions
- excluding 'pg_repack' as it requires re-creation to update
- Recreate old PostgreSQL pg_repack extension
- if an update is required
Step 8: POST-UPGRADE: Create a subscription for logical replication
- (TARGET) Create a subscription for logical replication using a previously created slot
- (SOURCE) Make sure that logical replication is active
- wait until the logical replication slot is active (max wait time: 1 minute)
- (SOURCE) Check the logical replication lag
- Print the result of setting up logical replication
- example: "The logical replication setup is completed. Current replication lag: XX MB"
Step 9: POST-UPGRADE: Analyze a PostgreSQL database (update optimizer statistics)
- Run vacuumdb to analyze
- for a database defined in the pg_target_dbname variable
- All CPU cores of the server are used
Requirements
- Prepare the Patroni Standby Cluster from a backup or disk snapshot (not part of this playbook).
- Make sure your ssh-agent has your identity loaded (
ssh-add -L), since we use agent forwarding by default (configured in ansible.cfg), to upgrade replicas usingrsync.
Example of a playbook execution
ansible-playbook -i inventory/db-benchmarking-main.yml upgrade.yml -e "pg_old_version=12 pg_new_version=13"
Edited by Vitaliy Kukharik