How to Set Up PostgreSQL Logical Replication from a Physical Standby
This method avoids using pg_dump and copying all data from the main cluster: instead, we prepare a standby cluster from a backup (wal-g, pgbackrest, etc.), synchronize the remaining data (delta) using physical replication, and then switch to logical replication.
With logical replication, the standby cluster receives all changes from the main cluster, and at the same time we can perform pg_upgrade (or reindex indexes corrupted after an OS/glibc update). Only when the new cluster is fully ready do we switch traffic, with minimal downtime: ideally no more than 10 seconds (the downtime of the traffic switch itself).
Requirements:

- PostgreSQL version: 10 or newer (to use the native logical replication)
  - To replicate partitioned tables with native logical replication, PostgreSQL 13 (or newer) is required.
  - Although logical replication support for partitioned tables appeared only in PostgreSQL 13, on earlier versions (since PostgreSQL 10) we can replicate each partition separately. See test - #34 (comment 1086837095)
- Parameters: set `wal_level` to `logical` on the source cluster.
- Permissions: in order to be able to copy the initial table data, the role used for the replication connection must have the `SELECT` privilege on the published tables and the `USAGE` privilege on their schema. To create a publication that publishes all tables automatically, the user must be a superuser. https://www.postgresql.org/docs/current/logical-replication-security.html
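For example, we can check and, if necessary, change `wal_level` on the source like this (a sketch; setting it via `ALTER SYSTEM` assumes superuser access, and the change requires a postgres restart to take effect):

```sql
-- check the current setting on the source cluster
SHOW wal_level;

-- set it to 'logical'; takes effect only after a restart
ALTER SYSTEM SET wal_level = 'logical';
```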
Steps:

- create a replica (next, "target") using physical replication (with `pg_basebackup`, or (recommended) from a backup made with `wal-g`/`pgbackrest`)
- on the primary (next, "source"), wait for replication to complete, and make sure there is no high replication lag

  Click here for more details...

  ```sql
  select pid, client_addr, application_name,
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))   as pending,
         pg_size_pretty(pg_wal_lsn_diff(sent_lsn, write_lsn))              as write,
         pg_size_pretty(pg_wal_lsn_diff(write_lsn, flush_lsn))             as flush,
         pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn))            as replay,
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as total_lag
  from pg_stat_replication;
  ```
- on target: stop postgres
- on source: create a publication

  Click here for more details...

  ```sql
  -- Marks the publication as one that replicates changes for all tables in the database,
  -- including tables created in the future.
  -- Doc: https://www.postgresql.org/docs/current/sql-createpublication.html
  CREATE PUBLICATION logical_replication FOR ALL TABLES;
  ```
- on source: create a slot for logical replication, and save the "lsn"

  Click here for more details...

  ```sql
  SELECT * FROM pg_create_logical_replication_slot('logical_replication_slot', 'pgoutput');
  -- save "lsn" (example '0/3000758')
  ```
- on target: specify recovery target parameters in the `postgresql.conf`

  Click here for more details...

  ```
  recovery_target_lsn = '0/3000758'   # lsn that was received when creating the logical replication slot
  recovery_target_action = 'promote'  # promote means the recovery process will finish
                                      # and the server will start to accept connections
  ```
- on target: start postgres to reach the recovery target. After reaching the recovery target, postgres will start to accept connections.
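One way to verify on the target that the recovery target has been reached (a sketch; `pg_is_in_recovery()` returns false once promotion is complete):

```sql
-- false once the server has been promoted and accepts read-write connections
SELECT pg_is_in_recovery();

-- the last replayed position; it should be at (or just past) the saved slot lsn
SELECT pg_last_wal_replay_lsn();
```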
- on target: create a subscription using the previously created slot for logical replication

  Click here for more details...

  ```sql
  CREATE SUBSCRIPTION logical_subscription
      CONNECTION 'host=XXX.XX.XX.XX port=5432 user=replicator password=replicator-pass dbname=test'
      PUBLICATION logical_replication
      WITH (copy_data = false, create_slot = false, slot_name = 'logical_replication_slot');
  ```
- on source: wait for replication to complete

  Click here for more details...

  ```sql
  select pid, client_addr, application_name,
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))   as pending,
         pg_size_pretty(pg_wal_lsn_diff(sent_lsn, write_lsn))              as write,
         pg_size_pretty(pg_wal_lsn_diff(write_lsn, flush_lsn))             as flush,
         pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn))            as replay,
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as total_lag
  from pg_stat_replication
  where application_name = 'logical_subscription';
  ```
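The same progress can also be checked from the target side, for example with `pg_stat_subscription`:

```sql
-- received_lsn / latest_end_lsn show how far the apply worker has progressed
SELECT subname, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;
```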
- additional actions (if necessary): `pg_upgrade`, reindex corrupted indexes, etc.
- make sure there is no replication lag and switch traffic to the target
- Finish. Remove the subscription on the target, and the replication slot and publication on the source

  Click here for more details...

  ```sql
  -- on target
  DROP SUBSCRIPTION logical_subscription;

  -- on source
  DROP PUBLICATION logical_replication;
  SELECT pg_drop_replication_slot('logical_replication_slot')
    FROM pg_replication_slots
   WHERE slot_name = 'logical_replication_slot';
  ```
It is important to know:
- DDL:
  - The database schema and DDL commands (CREATE, ALTER, DROP) are not replicated.
  - Subsequent schema changes need to be kept in sync manually.
  - Apply schema changes to the subscriber first, and only then to the publisher.
  - Try not to schedule schema changes during the maintenance window in which you temporarily configure logical replication (for example, for a major upgrade of the PostgreSQL version).
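A sketch of the subscriber-first order for a schema change (the table `orders` and column `note` here are hypothetical examples):

```sql
-- 1) first on the target (subscriber):
ALTER TABLE orders ADD COLUMN note text;

-- 2) only then on the source (publisher):
ALTER TABLE orders ADD COLUMN note text;
```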
- New tables:
  - After you have added a new table to the publication on the source, execute the following command on the target to start replication for the new table:
    `ALTER SUBSCRIPTION logical_subscription REFRESH PUBLICATION;`
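If the new table's data does not yet exist on the target, the refresh can also copy it (a sketch; `copy_data = true` is the default):

```sql
-- copy_data = true also copies the existing rows of newly added tables;
-- use copy_data = false if the data is already present on the target
ALTER SUBSCRIPTION logical_subscription REFRESH PUBLICATION WITH (copy_data = true);
```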
- Primary Key:
  - For DELETE and UPDATE operations to be replicated, the published table must have a REPLICA IDENTITY, which is typically its primary key. This makes it possible to identify the modified rows on the subscriber. INSERT operations work without a replica identity.
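For a table without a primary key, a replica identity can be set explicitly (a sketch; the table `events` and index `events_ux` are hypothetical):

```sql
-- use an existing unique index (non-partial, on NOT NULL columns):
ALTER TABLE events REPLICA IDENTITY USING INDEX events_ux;

-- or, as a last resort, log the whole old row (larger WAL, slower):
ALTER TABLE events REPLICA IDENTITY FULL;
```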
- Sequence:
  - Sequence data is not replicated.
  - The data in serial or identity columns backed by sequences will of course be replicated as part of the table, but the sequence itself will still show the start value on the subscriber. If the subscriber is used as a read-only database, this is typically not a problem. If, however, some kind of switchover or failover to the subscriber database is intended, then the sequences need to be updated to the latest values, either by copying the current data from the publisher (perhaps using pg_dump) or by determining a sufficiently high value from the tables themselves.
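A minimal sketch of updating a sequence from the table itself before failover (the names `my_table`, `id`, and `my_table_id_seq` are hypothetical):

```sql
-- repeat for each sequence on the subscriber
SELECT setval('my_table_id_seq', (SELECT coalesce(max(id), 1) FROM my_table));
```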
- Triggers:
  - Triggers can be executed on the subscriber, but a trigger created with the defaults will not fire for replicated changes.
  - For the trigger to fire during replication, alter the table:
    `ALTER TABLE <table_name> ENABLE REPLICA TRIGGER <trigger_name>;`
  - We can also make a trigger fire not only for replication but also for local changes (`ENABLE ALWAYS TRIGGER`).
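For example (the table `orders` and trigger `audit_trg` are hypothetical):

```sql
-- fire only for changes applied by logical replication:
ALTER TABLE orders ENABLE REPLICA TRIGGER audit_trg;

-- fire for both replicated and local changes:
ALTER TABLE orders ENABLE ALWAYS TRIGGER audit_trg;
```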
- TRUNCATE commands:
  - Replication of TRUNCATE commands is supported, but some care must be taken when truncating groups of tables connected by foreign keys. When replicating a truncate action, the subscriber will truncate the same group of tables that was truncated on the publisher, either explicitly specified or implicitly collected via CASCADE, minus tables that are not part of the subscription. This works correctly if all affected tables are part of the same subscription. But if some tables to be truncated on the subscriber have foreign-key links to tables that are not part of the same (or any) subscription, then the application of the truncate action on the subscriber will fail.
- Conflicts:
  - PostgreSQL currently has limited support for resolving conflicts when the data between servers diverges. If there is a conflict, replication will stop, and PostgreSQL will wait until the issue is manually fixed by the database administrator.
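On PostgreSQL 15 and newer, one way to resolve such a conflict is to skip the failing transaction by its finish LSN, taken from the error message in the subscriber's log (a sketch; the LSN value here is a hypothetical example):

```sql
ALTER SUBSCRIPTION logical_subscription SKIP (lsn = '0/3000810');
```

On older versions, the usual alternatives are fixing the conflicting row on the subscriber, or advancing the replication origin with `pg_replication_origin_advance()` (riskier, as it can silently skip other changes).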
- Publisher (source) server restarts:
  - The current position of each slot is persisted only at checkpoint, so in the case of a crash the slot may return to an earlier LSN, which will cause recent changes to be sent again when the server restarts.
  - Logical decoding clients are responsible for avoiding ill effects from handling the same message more than once.
- Failover of the logical replication slot:
  - The problem, in a nutshell, is that the replication slot is maintained only on the primary node. If there is a switchover/failover that promotes one of the standbys, the new primary will not know about the replication slot maintained by the previous primary. This breaks logical replication to the downstream systems, and if a new slot is created, it becomes unsafe to use.
  - Patroni (since version 2.1.0) provides a working solution without any invasive methods/extensions: we can configure a permanent logical replication slot, and Patroni copies the replication slot information from the primary to all eligible standby nodes, so the slot is preserved during switchover/failover. Details: https://www.percona.com/blog/how-patroni-addresses-the-problem-of-the-logical-replication-slot-failover-in-a-postgresql-cluster/
- Risks:
  - If there is a high replication lag, there is a risk of disk space overflow due to the accumulation of WAL files.
    - Recommendation: monitor the replication lag and disk space usage.
    - Mitigation: the max_slot_wal_keep_size parameter (added in PostgreSQL 13).
  - CPU utilization. The risk is minimal, although it is possible to saturate one CPU core with the `walsender` process.
    - Recommendation: monitor CPU utilization, including per-core monitoring.
    - Mitigation: remove the subscription on the target, and the replication slot and publication on the production database.
  - Memory and Disk I/O utilization.
    - PostgreSQL used to keep only 4096 changes (max_changes_in_memory) for each transaction in memory.
    - If there is a very lengthy transaction, the rest of the changes are spilled to disk as spill files.
    - Details: here
    - Recommendation: monitor Memory and Disk I/O.
    - Mitigation:
      - the logical_decoding_work_mem parameter (added in PostgreSQL 13)
      - remove the subscription on the target, and the replication slot and publication on the production database
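The WAL retained by the slot can be monitored like this (the `wal_status` column is available since PostgreSQL 13):

```sql
SELECT slot_name, active, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```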