Annotate post-deployment migrations in Grafana
Context
When dealing with incidents associated with migrations, there's no straightforward way to know when a post-deploy migration started or ended. This information would help EOCs and engineers detect root causes and define a mitigation strategy.
For context, post-deploy migrations are executed through the post-deploy migration (PDM) pipeline, which is independent of the auto-deploys and is run manually by release managers at least once a day.
Original context
When production#1433 (closed) occurred today, we had no idea that a migration affecting 9 million rows was starting until we stumbled on the information while poking around in the PostgreSQL logs.
I think it would have been helpful to know when regular and post-deployment migrations started and ended. It is probably even more important to annotate the Grafana graphs with this information as well.
Proposal
Similar to how we correlate deployments in Grafana (example), let's do the same for post-deploy migrations.
- Annotate the start and end of each migration execution.
- The annotation should include the name of the migration and the environment affected (gstg or gprd).
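As a rough illustration of the two points above, here is a minimal sketch of what the PDM pipeline could send to Grafana's annotations HTTP API (`POST /api/annotations`, with `time` and `timeEnd` in epoch milliseconds). The function names, the tag names, the annotation text, and the way the URL and token are passed in are all assumptions made for illustration; they are not taken from the existing deployment annotation tooling.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Environments named in the proposal.
ENVIRONMENTS = ("gstg", "gprd")


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return int(moment.timestamp() * 1000)


def build_annotation(migration, environment, started_at, ended_at=None):
    """Build the JSON body for one migration annotation.

    Tag names and text format are hypothetical. When ended_at is given,
    the annotation becomes a region covering the whole execution.
    """
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    payload = {
        "time": to_epoch_ms(started_at),
        "tags": ["post-deploy-migration", environment],
        "text": f"Post-deploy migration {migration} on {environment}",
    }
    if ended_at is not None:
        payload["timeEnd"] = to_epoch_ms(ended_at)
    return payload


def post_annotation(base_url, token, payload):
    """Send the annotation to Grafana; base_url and token are supplied by the caller."""
    request = urllib.request.Request(
        f"{base_url}/api/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Because the payload carries no dashboard or panel identifier, Grafana would treat it as an organization-level annotation that any dashboard can query by tag. That seems like the natural fit if the goal is to overlay migrations on many graphs, but it is a design choice to confirm.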