Improving the change_zonal_recovery template
What and why
After executing the gameday in gprd, we found that the template needs some improvements. They include:
- Adding example MRs for `gprd`
- Adding notes on key items to watch out for that may lead to incidents, e.g. QA failures that, if left unchecked, can lead to auto-deploy failures
- In `gprd`, Ansible uses the labels `type`, `env`, and `main` for deployments. Once the MR to provision new Gitaly VMs is merged, the VMs are added to the deployment inventory cache, which is cleared every hour. If the VMs are deleted and a deployment runs before the cache is cleared, the deployment jobs will fail. We added a reminder to check whether the labels are necessary: if we will destroy the VMs, we should override the labels or remove them altogether.
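To make the failure window concrete, here is a minimal Python sketch. It assumes the inventory cache is rebuilt on a fixed hourly schedule, that each rebuild snapshots the VMs that exist at that moment, and that the VM was created before the first rebuild. The function names and the fixed-schedule model are hypothetical illustrations and are not part of the actual Ansible inventory tooling.

```python
# Hypothetical model of the stale-inventory window (not the real tooling).
# Assumptions: the cache is rebuilt every CACHE_PERIOD seconds starting at
# first_refresh, each rebuild snapshots the VMs alive at that moment, and
# the VM existed before first_refresh.

CACHE_PERIOD = 3600  # "cleared every hour", in seconds


def last_refresh_before(t: int, first_refresh: int, period: int = CACHE_PERIOD) -> int:
    """Return the time of the most recent cache rebuild at or before t."""
    if t < first_refresh:
        raise ValueError("t precedes the first cache rebuild")
    return first_refresh + ((t - first_refresh) // period) * period


def deploy_hits_deleted_vm(deleted_at: int, deploy_at: int, first_refresh: int,
                           period: int = CACHE_PERIOD) -> bool:
    """True if a deploy at deploy_at still targets a VM deleted at deleted_at.

    The cached inventory still lists the VM when the snapshot used by the
    deploy was taken before the deletion, and the deploy runs after it.
    """
    snapshot = last_refresh_before(deploy_at, first_refresh, period)
    return snapshot < deleted_at <= deploy_at
```

For example, with a rebuild at t=0, a VM deleted at t=1800, and a deploy at t=2400, `deploy_hits_deleted_vm(1800, 2400, 0)` returns `True`. Once the next rebuild at t=3600 has run, a deploy at t=4000 returns `False`. Removing or overriding the labels before deleting the VMs avoids relying on this timing at all.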
Related Issue: https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/issues/25072