Reassess production checks in the middle of a deployment
As part of #2951 (closed), the production checks sent during deployments were accidentally removed in https://ops.gitlab.net/gitlab-com/gl-infra/deploy-tooling/-/merge_requests/494.
Those checks sent Slack messages to the #f_upcoming_release channel to notify release managers about the progress of the deployment and the current health state. The code that sends those messages is still available in release-tools:
- https://gitlab.com/gitlab-org/release-tools/-/blob/master/lib/release_tools/promotion/deployment_check_foreword.rb#L26
- https://gitlab.com/gitlab-org/release-tools/-/blob/master/lib/tasks/auto_deploy.rake#L40
The messages were created by triggering a pipeline on deployer (example) that calls release-tools, which then sends the Slack message: https://ops.gitlab.net/gitlab-org/release/tools/-/jobs/13675072
The usefulness of those messages is uncertain: they provided insight into the deployment, but they were not actionable. The purpose of this issue is to discuss the next steps:
- Option A: The messages are restored
- Option B: The code on release-tools is deleted
- Option C: The messages are restored and improved
- Option D:?
Previous content
While a deployment is in progress, release-tools sends notifications about its progress. If an incident is created during the deployment, the notifications include the failed production checks.
So far, this has been an informational notification: release managers are not pinged about it, and the deployment is not stopped. We don't have a process that dictates what actions we should take in response to this announcement. Let's use this issue to discuss which actions we should take. Some examples: