For secondary Postgres clusters (physical replication only), use restore_command with explicit --walg-gs-prefix
Our gprd-main test last Sunday was blocked by a large (1d+) lag that started when a switchover happened in the source cluster on Friday. The old primary, patroni-main-2004-04, became a physical standby and was working well, with lag ~0, but our "new" cluster could not follow.
For such cases (failover, switchover), replication would be more resilient if we also had restore_command configured on the standby leader while it is running on physical replication. The problem is that this is a completely different cluster, with its own separate WAL-G backup location, and it would not be wise to point it at the original cluster's location: that would carry risks we don't want (e.g., two clusters writing WALs to the same directory).
@vitabaks came up with a good idea: use an explicit --walg-gs-prefix in restore_command, pointing to the original cluster's backup directory. Then, if streaming replication cannot continue, the standby leader is still able to replay WALs and catch up. The lag would be slightly higher, but still low enough not to block subsequent processes (such as the logical replication test in gprd-main, a PG upgrade, or any other process that involves a secondary cluster).
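To make the idea concrete, here is a minimal sketch of how such a restore_command string could be assembled. The bucket path `gs://example-bucket/source-cluster`, the helper name `build_restore_command`, and the placement of the flag after `wal-fetch` are assumptions for illustration only; the real command in our environment may also need a wrapper for credentials and other WAL-G settings.

```python
import shlex


def build_restore_command(source_prefix: str, wal_g_bin: str = "wal-g") -> str:
    """Build a restore_command that reads WALs from the SOURCE cluster's archive.

    Only the read path is overridden here. The standby cluster's own
    archiving configuration is not touched, so it keeps writing WALs
    to its own separate location.
    """
    if not source_prefix.startswith("gs://"):
        raise ValueError("expected a gs:// prefix, got: " + source_prefix)
    # %f (WAL file name) and %p (destination path) are placeholders that
    # PostgreSQL substitutes when it runs restore_command.
    return (
        f"{wal_g_bin} wal-fetch "
        f"--walg-gs-prefix={shlex.quote(source_prefix)} "
        '"%f" "%p"'
    )


if __name__ == "__main__":
    # Hypothetical prefix, not the real gprd-main backup location.
    print(build_restore_command("gs://example-bucket/source-cluster"))
```

The resulting string would go into the standby leader's restore_command setting, so the override is used only when WALs are fetched and never when they are archived.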
cc @alexander-sosna @bshah11 @anganga @rhenchen.gitlab
If we implement this approach for pg-upgrade-logical, then we should remove this special value from restore_command when we switch to logical replication. (cc @vitabaks)
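A rough sketch of what the cleanup step could look like, assuming the override appears in restore_command as a single unquoted `--walg-gs-prefix=<value>` (or `--walg-gs-prefix <value>`) token. The function name `strip_prefix_override` and the sample command are hypothetical; in pg-upgrade-logical this would more likely be a templating change than string surgery.

```python
import re

# Matches the override flag with its value, in either "=value" or " value" form.
# Assumes the value contains no whitespace and is not wrapped in quotes.
_OVERRIDE = re.compile(r"\s+--walg-gs-prefix(?:=|\s+)\S+")


def strip_prefix_override(restore_command: str) -> str:
    """Return restore_command without the explicit --walg-gs-prefix override."""
    return _OVERRIDE.sub("", restore_command)


if __name__ == "__main__":
    # Hypothetical command, not the real configuration.
    before = 'wal-g wal-fetch --walg-gs-prefix=gs://example-bucket/source-cluster "%f" "%p"'
    print(strip_prefix_override(before))
```

A command that does not contain the flag is returned unchanged, so the step is safe to run more than once.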