PostgreSQL fails to start during Geo recovery due to leftover replication slots

GET version: 3.8.0
Cloud Provider: All providers (GCP, AWS, Azure, Other)
Environment configuration: Omnibus Geo deployments with PostgreSQL

Summary

PostgreSQL fails to start when an old Primary is recovered as a new Secondary, because leftover replication slots conflict with the max_replication_slots = 0 setting applied by the gitlab_geo_recovery playbook.

Problem Statement

When recovering a Geo deployment (typically after a failover scenario):

The old Primary is demoted and will become a new Secondary
The recovery playbook sets max_replication_slots = 0 on the new Secondary (since Secondary sites don't need replication slots)
If replication slots from the old Primary configuration still exist in PostgreSQL, PostgreSQL refuses to start with the error:

2025-09-02_06:06:53.58305 FATAL:  too many replication slots active before shutdown

This prevents the recovery process from completing
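
The failure condition described above can be sketched as a simplified model. This is not PostgreSQL's actual startup code; it only captures the rule implied by the error message, and the function name and slot name are hypothetical.

```python
def postgres_can_start(slots_on_disk, max_replication_slots):
    """Simplified model of the startup check.

    PostgreSQL restores every replication slot it finds on disk at startup.
    If there are more slots than max_replication_slots allows, startup aborts
    with "too many replication slots active before shutdown".
    """
    return len(slots_on_disk) <= max_replication_slots


# Hypothetical slot left over from when this node was the Primary.
leftover_slots = ["geo_secondary_site"]

print(postgres_can_start(leftover_slots, 0))  # False: one slot, zero allowed
print(postgres_can_start([], 0))              # True: nothing left to restore
```

With max_replication_slots = 0, any single leftover slot is enough to block startup, which is why the slots have to be removed while PostgreSQL is still running under the old setting.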

Current Behavior

Recovery playbook reconfigures PostgreSQL with max_replication_slots = 0
Leftover replication slots from the Primary configuration cause PostgreSQL to fail at startup
Manual intervention is required to drop the replication slots before recovery can proceed

Expected Behavior

Recovery playbook should automatically drop all replication slots on the demoted Primary before reconfiguring PostgreSQL
PostgreSQL should start successfully with max_replication_slots = 0
Recovery process should complete without manual intervention

Reproduction Steps

Set up a Geo deployment with Primary and Secondary sites
Trigger a failover scenario requiring Primary demotion
Run the Geo recovery playbook to set up the old Primary as a new Secondary
Observe PostgreSQL failing to start on the demoted Primary (new Secondary)

Proposed Solution

Add a task in the recovery playbook to drop all replication slots before the main recovery process reconfigures PostgreSQL. This ensures no conflicts when max_replication_slots is set to 0.
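
As a rough illustration of what such a task would need to execute, the sketch below builds the SQL statements for dropping every slot. It is not the content of the linked MR: the helper name, slot names, and PID are hypothetical, and it assumes the input rows come from `SELECT slot_name, active_pid FROM pg_replication_slots;` run while PostgreSQL is still up, before max_replication_slots is changed.

```python
import re

# Replication slot names may only contain lowercase letters, digits and underscores.
_SLOT_NAME = re.compile(r"[a-z0-9_]+")


def build_drop_slot_statements(slots):
    """Return SQL statements that drop every slot in `slots`.

    `slots` is a list of (slot_name, active_pid) tuples; active_pid is None
    when no walsender is currently attached to the slot.
    """
    statements = []
    for slot_name, active_pid in slots:
        # Refuse anything that is not a valid slot name before embedding it in SQL.
        if not _SLOT_NAME.fullmatch(slot_name):
            raise ValueError(f"unexpected slot name: {slot_name!r}")
        # pg_drop_replication_slot() fails on a slot that is in use, so the
        # attached backend is terminated first.
        if active_pid is not None:
            statements.append(f"SELECT pg_terminate_backend({int(active_pid)});")
        statements.append(f"SELECT pg_drop_replication_slot('{slot_name}');")
    return statements


# Hypothetical rows: one idle slot and one slot still held by a walsender.
for statement in build_drop_slot_statements(
    [("geo_secondary_site", None), ("old_standby", 4242)]
):
    print(statement)
```

On an Omnibus node the resulting statements would typically be run through gitlab-psql. Terminating a backend is not instantaneous, so a real task may need to retry the drop for slots that were active.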

See MR#1754: Add task to drop replication from secondary sites before proceeding with recovery

Internal Ticket: ZD#650634

Is there a risk of dropping slots that might be needed during the recovery transition period?