Geo: Add rescue when disabling maintenance mode after a Geo failover for cloud native deployments

What does this MR do?

This change improves the reliability of setting maintenance mode in GitLab's cloud-native environments. Previously, if a command was executed on a specific pod (a Kubernetes group of one or more containers) and that pod was no longer available, the operation failed outright. The code now includes error handling that waits for a new pod to become available and retries the operation, which makes the maintenance mode step more robust while pods are being restarted or replaced.
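The rescue-and-retry behaviour described above can be illustrated with a minimal Python sketch. This is not the code in this MR: the function `exec_with_pod_rescue`, the exception `PodUnavailableError`, and the two callables passed in are hypothetical names chosen for illustration, and the retry count and delay are assumed values.

```python
import time
from typing import Callable, Optional


class PodUnavailableError(Exception):
    """Hypothetical error: the target pod is gone or cannot run commands."""


def exec_with_pod_rescue(
    find_ready_pod: Callable[[], Optional[str]],
    run_on_pod: Callable[[str], str],
    retries: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run a command on a pod; if the pod vanished, wait for a new one and retry.

    find_ready_pod returns the name of a ready pod, or None if none is ready yet.
    run_on_pod executes the command and raises PodUnavailableError on failure.
    """
    last_error: Optional[Exception] = None
    for _ in range(retries):
        # Look the pod up again on every attempt: the previous one may have
        # been replaced during the failover.
        pod = find_ready_pod()
        if pod is None:
            sleep(delay_seconds)
            continue
        try:
            return run_on_pod(pod)
        except PodUnavailableError as err:
            # The "rescue" path: remember the error, wait, then try again.
            last_error = err
            sleep(delay_seconds)
    raise PodUnavailableError(
        f"no pod accepted the command after {retries} attempts"
    ) from last_error
```

The key design point in this sketch is that the pod is resolved inside the loop rather than once up front, so a retry targets whichever pod is currently ready instead of the one that disappeared.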

#1192 (closed)

Author's checklist

When ready for review, the Author applies the workflow::ready for review label and mentions @gitlab-org/software-delivery/get-maintainers:

  • Merge request:
    • Corresponding Issue raised and reviewed by the GET maintainers team.
    • Merge Request Title and Description are up-to-date, accurate, and descriptive.
    • MR targets the appropriate branch.
    • MR has a green pipeline.
    • MR has no new security alerts in the widget from the Secret Detection and IaC Scan (SAST) jobs.
  • Code:
    • Check the area changed works as expected across all expected permutations.
    • Check that the changes work across upgrades.
    • Documentation created/updated in the same MR if applicable.
