Improve the change_zonal_recovery template

Maina Ng'ang'a requested to merge improve-template/change-zonal-recovery into master

What

After executing the official gameday in gstg, the following improvements were proposed:

  • Clarify what to do when there is a deployment in-progress.
  • Include "tail-serial-port-output" under troubleshooting tips.
  • Number the tasks.
  • Add the command "sudo systemctl enable patroni && sudo systemctl start patroni".
  • Add a Thanos query to validate the storage used on the /var/opt/gitlab mount point on the old and new servers.
  • Restore weights as part of cleanup.
  • Add time estimates for tasks.
  • Instead of reconfiguring the regional cluster, we might consider presenting options such as:
    • do nothing
    • cordon the nodes
    • optionally reconfigure the cluster (like we do now)
  • Add a cleanup to "reset" Staging to its original state for Gitaly storages using snapshot restores.
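
As a rough illustration of the Thanos query item above, the sketch below builds a PromQL expression for the bytes used on the /var/opt/gitlab mount point. It assumes the standard node_exporter metrics node_filesystem_size_bytes and node_filesystem_avail_bytes are available in Thanos and that hosts can be selected by an instance label; the helper name used_bytes_query and the example host pattern are hypothetical, and the query that ends up in the template may differ.

```python
def used_bytes_query(instance_regex: str, mountpoint: str = "/var/opt/gitlab") -> str:
    """Build a PromQL expression for bytes used on a mount point.

    Assumption: node_exporter filesystem metrics with `mountpoint` and
    `instance` labels; adjust the label names to match the real setup.
    """
    # Shared label selector, e.g. {mountpoint="/var/opt/gitlab", instance=~"..."}
    selector = f'{{mountpoint="{mountpoint}", instance=~"{instance_regex}"}}'
    # Used space is approximated as total size minus space available to users.
    return (
        f"node_filesystem_size_bytes{selector}"
        f" - node_filesystem_avail_bytes{selector}"
    )


if __name__ == "__main__":
    # Hypothetical host pattern; run once for the old server and once for the new one.
    print(used_bytes_query("gitaly-01.*"))
```

Running the resulting expression once for the old server and once for the new one gives two figures that should be roughly equal if the data was carried over in full.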

Related Issue: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24212

Why

Edited by Maina Ng'ang'a
