Create silenced Prometheus alerts for deployment go/no-go

We want to encode, as a binary signal, our assessment of whether the production environment is fit to deploy to.

Right now we rely on tribal knowledge, and each release manager has a different view of the production environment.

This should be part of our observability infrastructure and exposed to release-tools/deployer as a set of simple Prometheus alerts (e.g. deployment_system_unhealthy, deployment_should_stop, deployment_should_rollback).

As a first iteration, we can start with a single alert (e.g. deployment_system_unhealthy) that considers the overall status of web, sidekiq, and git.
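As a rough sketch of how release-tools/deployer might consume such an alert: Prometheus lists active alerts at its `/api/v1/alerts` HTTP endpoint, and silencing happens in Alertmanager, so a silenced alert should still appear there as firing. The helper names below (`firing_alerts`, `is_firing`) are hypothetical, and the code only parses a response body that has already been fetched; how the deployer authenticates and reaches Prometheus is left out.

```python
import json


def firing_alerts(payload: str) -> set[str]:
    """Return the names of alerts in state 'firing'.

    `payload` is assumed to be the JSON body returned by the
    Prometheus /api/v1/alerts endpoint.
    """
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise ValueError("Prometheus did not report success")
    names = set()
    for alert in doc["data"]["alerts"]:
        # Pending alerts have not yet met their `for:` duration; skip them.
        if alert.get("state") != "firing":
            continue
        name = alert.get("labels", {}).get("alertname")
        if name is not None:
            names.add(name)
    return names


def is_firing(payload: str, alert_name: str) -> bool:
    """True if the alert with the given name is currently firing."""
    return alert_name in firing_alerts(payload)
```

A deployer could call `is_firing(body, name)` with one of the alert names proposed above and refuse to continue when it returns `True`.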

We want to have this broken down by stage:

  • when the Canary stage is unhealthy, promoting the build will break the main stage as well.
  • when the main stage is unhealthy, it is unwise to promote a build automatically (a human should make that decision).
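The per-stage rules above can be sketched as a small decision function. This is one possible reading, not a settled design: the `Decision` values, the function names, and the choice to treat an unhealthy Canary as a hard block are assumptions, as is treating a stage as healthy only when web, sidekiq, and git are all healthy.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"          # safe to promote automatically
    BLOCK = "block"              # do not promote
    NEEDS_HUMAN = "needs_human"  # a release manager must decide


# Services considered in the first iteration.
SERVICES = ("web", "sidekiq", "git")


def stage_healthy(service_health: dict[str, bool]) -> bool:
    """A stage is healthy only if every tracked service is healthy.

    A service missing from the mapping is treated as unhealthy.
    """
    return all(service_health.get(name, False) for name in SERVICES)


def promotion_decision(
    canary: dict[str, bool], main: dict[str, bool]
) -> Decision:
    """Decide whether a build may be promoted from Canary to main."""
    if not stage_healthy(canary):
        # Promoting from an unhealthy Canary would break main as well.
        return Decision.BLOCK
    if not stage_healthy(main):
        # Main is already unhealthy: leave the call to a human.
        return Decision.NEEDS_HUMAN
    return Decision.PROCEED
```

Keeping the outcome as a three-valued enum rather than a boolean leaves room for the "a human should decide" case without overloading "unhealthy".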
Edited by Alessio Caiazza