Create Prometheus silenced alerts for deployment go/no-go
We want to encode, as a binary signal, our view of whether the production environment is in a state where we can deploy.
Right now we rely on tribal knowledge, and each release manager has a different view on the production environment.
This should be part of our observability infrastructure, and exposed to release-tools/deployer as a set of simple Prometheus alerts (i.e. deployment_system_unhealty, deployment_should_stop, deployment_should_rollback).
As a first iteration, we can start with a simple alert (i.e. deployment_system_unhealty) that will consider the overall status of web, sidekiq, and git.
We want to have this broken down by stage:
- when the Canary stage is unhealthy, promoting the build will break the main stage as well.
- when the main stage is unhealthy, it is not wise to promote a build automatically (a human should make that decision)
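
To make the per-stage breakdown concrete, here is a minimal sketch of how a consumer such as release-tools or deployer might turn these alerts into a go/no-go decision. It assumes the alerts are read from the Prometheus HTTP API (`GET /api/v1/alerts`), whose response lists each alert with its `labels` and `state`. The `stage` label, its values (`canary`, `main`), and the decision names are hypothetical and not part of the proposal.

```python
from typing import Iterable, Mapping

# Alert name taken from the proposal above. The "stage" label and its values
# ("canary", "main") are assumptions made for this sketch.
UNHEALTHY_ALERT = "deployment_system_unhealty"


def firing_stages(alerts: Iterable[Mapping]) -> set:
    """Return the stages for which the unhealthy alert is currently firing.

    `alerts` is expected to look like the `data.alerts` list returned by the
    Prometheus `/api/v1/alerts` endpoint.
    """
    stages = set()
    for alert in alerts:
        labels = alert.get("labels", {})
        if labels.get("alertname") == UNHEALTHY_ALERT and alert.get("state") == "firing":
            # Alerts without a stage label are recorded as "unknown".
            stages.add(labels.get("stage", "unknown"))
    return stages


def deployment_decision(alerts: Iterable[Mapping]) -> str:
    """Map firing alerts to a hypothetical decision: "go", "stop" or "needs_human"."""
    stages = firing_stages(alerts)
    if "canary" in stages:
        # Promoting from an unhealthy canary would break the main stage too.
        return "stop"
    if stages:
        # Main (or an unlabelled stage) is unhealthy: no automatic promotion.
        return "needs_human"
    return "go"
```

Treating an alert without a `stage` label the same way as an unhealthy main stage is a conservative choice made for this sketch. Pending alerts are ignored here; whether they should block a deployment is left open.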