Identify intermediate points in the deployment pipeline to check production monitoring status
With the introduction of #1092 (closed) we want to verify the status of production not only at the beginning of the deployment.
This issue is to identify other relevant points in the pipeline where we can run the check again and, if necessary, stop the deployment.
The idea is that when we stop an ongoing deployment, release managers are notified (maybe also the EOC?) and the failure notification provides complete information on how to recover: rollback instructions, or how to move on with the deployment.
Implementation details
Every GitLab deployment job can trigger a production check after the current batch of hosts is completed.
This feature is activated by the presence of the TRIGGER_INTERMEDIATE_PRODUCTION_CHECKS variable in the deployer pipeline.
By default it is a fire-and-forget trigger: no action is taken if the check fails. Setting WAIT_PRODUCTION_CHECKS_TRIGGER to true makes the ansible playbook wait for the result and fail if the triggered pipeline fails.
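To make the interaction of the two variables concrete, here is a minimal Python sketch of the decision logic described above. It is an illustration only, not the deployer's actual code: the names `CheckMode`, `check_mode`, and `batch_should_fail` are hypothetical, and the assumption that only the literal value `true` (case-insensitive) enables waiting is ours.

```python
from enum import Enum


class CheckMode(Enum):
    DISABLED = "disabled"                # no intermediate check is triggered
    FIRE_AND_FORGET = "fire_and_forget"  # check is triggered, result is ignored
    WAIT = "wait"                        # check is triggered, result gates the deploy


def check_mode(env):
    """Map the two pipeline variables onto a check mode (hypothetical helper)."""
    # The feature is activated by the mere presence of the variable;
    # its value is not inspected.
    if "TRIGGER_INTERMEDIATE_PRODUCTION_CHECKS" not in env:
        return CheckMode.DISABLED
    # Assumption: only the literal "true" (case-insensitive) enables waiting.
    wait = env.get("WAIT_PRODUCTION_CHECKS_TRIGGER", "").strip().lower()
    if wait == "true":
        return CheckMode.WAIT
    return CheckMode.FIRE_AND_FORGET


def batch_should_fail(mode, check_pipeline_succeeded):
    """Return True if the deployment should stop after the current batch."""
    # Only the waiting mode turns a failed check into a failed deployment job.
    return mode is CheckMode.WAIT and not check_pipeline_succeeded
```

Under these assumptions, a failed production check stops the deployment only when both variables are set; with the trigger variable alone, the check runs but its outcome does not affect the job.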