Implement a Rollbacks dashboard in Grafana
Summary
Implement a rollback information dashboard to help with decision-making and to build confidence in the rollback process and tools. The goals of this dashboard are:
- To improve visibility into the rollbacks that we perform.
- To increase confidence in our ability to perform rollbacks.
Proposal
Introduce a new dashboard in dashboards/delivery called rollback_info.dashboard.jsonnet.
Improve visibility into 2 pieces of information:
- Number of rollbacks performed in gstg and gprd.
  This can be used to show how often we perform rollbacks, and can also be used to ensure that we are performing rollbacks at a regular cadence.
  Metric required (already added): delivery_deployment_rollbacks_started_total
- Number of deployed packages that could have been rolled back.
  This will be useful to see how often rollback of a package is possible, to increase confidence that when required, packages are likely to be "rollbackable".
  Metric required (already added): delivery_deployment_can_rollback_total
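Both metrics carry the `_total` suffix and are queried with `increase()`, which suggests they are Prometheus counters: the dashboard reads how much they grew over a window, not their raw value. The Python sketch below illustrates that idea under simplifying assumptions. The sample values are hypothetical, and the function ignores the extrapolation that Prometheus's real `increase()` applies at window boundaries.

```python
def counter_increase(samples):
    """Approximate the increase of a counter over a window.

    `samples` is a list of raw counter values in time order. A drop between
    two consecutive samples is treated as a counter reset (for example after
    a process restart), so the new value is counted from zero.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # Counter reset: everything in `current` accrued after the reset.
            total += current
    return total


# Hypothetical raw values of delivery_deployment_rollbacks_started_total.
rollbacks_started = [3, 3, 4, 0, 1, 2]
print(counter_increase(rollbacks_started))
```

Counting growth rather than raw values is what lets the rollback count survive a restart of the process that exports the metric.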
Implementation
See #2022 (comment 707546678) for an idea of what the dashboard will look like.
- Iteration 1 - gitlab-com/runbooks!4004 (merged)
  Create a dashboard with the 1st row of the image in #2022 (comment 707546678). The default time range for the dashboard can be 1 month.
  - A panel displaying a single-stat of the percentage of deployments that could have been rolled back in gstg and gprd over the selected time period.
    sum( increase( delivery_deployment_can_rollback_total{target_env="%(env)s"}[$__range] ) ) / sum( increase( delivery_deployment_started_total{target_env="%(env)s"}[$__range] ) )
  - A panel displaying a graph of the percentage of deployments to gstg and gprd that could have been rolled back over the selected time range.
    sum( increase( delivery_deployment_can_rollback_total{target_env=~"%(env)s"}[1d] ) ) by (target_env) / sum( increase( delivery_deployment_started_total{target_env=~"%(env)s"}[1d] ) ) by (target_env)
- Iteration 2 - gitlab-com/runbooks!4011 (merged)
  - 2 panels displaying a single-stat of the number of rollbacks performed in each environment (gstg, gprd).
    sum( increase( delivery_deployment_rollbacks_started_total{target_env="%(env)s"}[$__range] ) )
  - A panel displaying a graph of the rollbacks performed in all environments (gstg, gprd) over the selected time range.
    sum( increase( delivery_deployment_rollbacks_started_total{target_env=~"%(env)s"}[$__range] ) ) by (target_env)
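The `%(env)s` placeholder in the queries above looks like a Python-style format field, which Jsonnet's `%` operator also accepts. As a rough illustration only, the Python sketch below shows how such a template could expand into a concrete query per environment, and how the "could have been rolled back" ratio is derived from two counter increases. The names `render` and `rollback_ratio` and the sample numbers are hypothetical, not part of the runbooks repository.

```python
# Query templates copied from the panels above (whitespace tidied).
CAN_ROLLBACK_RATIO = (
    'sum(increase(delivery_deployment_can_rollback_total'
    '{target_env="%(env)s"}[$__range]))'
    ' / '
    'sum(increase(delivery_deployment_started_total'
    '{target_env="%(env)s"}[$__range]))'
)
ROLLBACKS_BY_ENV = (
    'sum(increase(delivery_deployment_rollbacks_started_total'
    '{target_env=~"%(env)s"}[$__range]))'
    ' by (target_env)'
)


def render(template, env):
    """Fill the %(env)s placeholder; $__range is left for Grafana to resolve."""
    return template % {"env": env}


def rollback_ratio(can_rollback_increase, started_increase):
    """Fraction of started deployments that could have been rolled back."""
    if started_increase == 0:
        # No deployments in the window: the ratio is undefined.
        return None
    return can_rollback_increase / started_increase


# Exact-match label for a single-stat panel, regex match for a graph panel.
print(render(CAN_ROLLBACK_RATIO, "gprd"))
print(render(ROLLBACKS_BY_ENV, "gstg|gprd"))

# Hypothetical window: 18 of 20 deployments were rollbackable.
print(rollback_ratio(18, 20))
```

With the `=~` matcher, a single rendered query such as the second one can cover both environments, while the `=` matcher needs one rendered query per environment.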

