Improving the change_zonal_recovery template
What
After executing the official gameday in gstg, the following improvements were proposed:
- Clarify what to do when a deployment is in progress.
- Include "tail-serial-port-output" under troubleshooting tips.
- Number the tasks.
- Add the commands to enable and start Patroni: sudo systemctl enable patroni && sudo systemctl start patroni
- Add a Thanos query to validate the storage used on the /var/opt/gitlab mount point on the old and new servers.
- Restore weights as part of the cleanup.
- Add time estimates for tasks.
- Instead of reconfiguring the regional cluster, we might consider presenting options such as:
  - do nothing
  - cordon the nodes
  - optionally reconfigure the cluster (as we do now)
- Add a cleanup step to "reset" Staging to its original state for Gitaly storages using snapshot restores.
Related issue: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24212
Why
Edited by Maina Ng'ang'a