Maintenance mode setting fails during failover
- GET version: 3.5.2
- Cloud Provider: AWS
- Environment configuration: Dedicated (Cloud Hybrid)
When running the failover playbook, maintenance mode consistently fails on the secondary. I believe the reason is that the toolbox pod terminates / refreshes between when the pod name is gathered and when the rails command is attempted to run on that pod. When enabling maintenance mode the failure is ignored, but when attempting to disable it, this is the cause of the consistent failure once the failover is complete.
A retry has been a consistent fix for the playbook, and adding a rescue loop to retry a similar command on the Dedicated post-failover playbook to update configurations has been a robust fix.
We propose adding a rescue block to the Set Maintenance Mode for Cloud Hybrid task that will fetch the latest running toolbox pod name and retry the rails command.