Geo: Add rescue when disabling maintenance mode after a Geo failover for cloud native deployments
What does this MR do?
This change makes disabling maintenance mode after a Geo failover more reliable in GitLab cloud native deployments. Previously, if the command targeted a specific pod and that pod was no longer available (for example, because it was being restarted or replaced), the operation failed outright. The code now rescues that failure: it waits for a replacement pod to become available and retries the command against it.
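The title's "rescue" suggests a try/rescue style handler. The Python sketch below only illustrates that control flow under stated assumptions and is not the MR's actual code. `run_with_pod_rescue`, `PodUnavailableError`, `find_ready_pod`, the pod names, and the retry parameters are all hypothetical. The command that runs inside the pod and the lookup of a ready pod are passed in as callables, so no Kubernetes client is assumed.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class PodUnavailableError(Exception):
    """Hypothetical error raised when the targeted pod no longer exists."""


def run_with_pod_rescue(
    command: Callable[[str], T],
    find_ready_pod: Callable[[], Optional[str]],
    pod: str,
    max_wait_attempts: int = 10,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `command` on `pod`; if the pod is gone, wait for a replacement and retry once."""
    try:
        return command(pod)
    except PodUnavailableError:
        # Rescue path: poll until a replacement pod is reported ready.
        for _ in range(max_wait_attempts):
            new_pod = find_ready_pod()
            if new_pod is not None:
                # Retry exactly once against the replacement pod.
                return command(new_pod)
            sleep(delay_seconds)
        raise TimeoutError("no replacement pod became ready in time")


# Usage with fake callables (hypothetical pod names, no real cluster).
calls = []


def fake_command(pod: str) -> str:
    calls.append(pod)
    if pod == "toolbox-old":
        raise PodUnavailableError(pod)
    return f"maintenance mode disabled via {pod}"


ready_pods = iter([None, None, "toolbox-new"])
result = run_with_pod_rescue(
    fake_command, lambda: next(ready_pods), "toolbox-old", sleep=lambda s: None
)
```

In this sketch, the retry against the replacement pod happens only once. If that attempt also fails, the error propagates instead of looping indefinitely. The `sleep` parameter is injected so the wait can be skipped in tests.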
Related issues
Author's checklist
When ready for review, the Author applies the workflow::ready for review label and mentions @gitlab-org/software-delivery/get-maintainers:
- Merge request:
  - Corresponding Issue raised and reviewed by the GET maintainers team.
  - Merge Request Title and Description are up-to-date, accurate, and descriptive
  - MR targets the appropriate branch
  - MR has a green pipeline
  - MR has no new security alerts in the widget from the Secret Detection and IaC Scan (SAST) jobs.
- Code:
  - Check the area changed works as expected across all expected permutations.
  - Check that the changes work across upgrades.
  - Documentation created/updated in the same MR if applicable