Measure the number of manual job retries in a given timeframe
Problem
Now that the number of manual job retries is available in our metrics system, we want to measure how many manual job retries happen in a given timeframe. This can provide valuable insights into how retries affect lead time and/or other Release Management actions.
This could be achieved with a chart in Grafana where we can select the timeframe (e.g. a week or a month).
Goal: Ability to count the number of job retries in a given timeframe
Proposal
We can create charts in Grafana using the delivery_webhooks_auto_deploy_job_retries
metric. We could add these new charts to a new dashboard called "Release manager toil",
since these metrics will help evaluate the toil that release managers (RMs) go through.
Example queries for daily retry counts, grouped either by project or by project and job:
sum(increase(delivery_webhooks_auto_deploy_job_retries[1d])) by (project)
sum(increase(delivery_webhooks_auto_deploy_job_retries[1d])) by (project, job_name)
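The queries above use a fixed 1-day window. To make a chart follow the timeframe picked in Grafana's time selector (for example a week or a month), one option is to replace the window with Grafana's built-in `$__range` variable and run the query as an instant query, e.g. in a Stat or Bar chart panel. This is a minimal sketch, assuming the metric is a Prometheus counter and the panel uses a Prometheus data source; the `topk(10, ...)` cut-off is an arbitrary illustrative choice.

```promql
# Total manual retries per project over the dashboard's selected time range
# (assumes delivery_webhooks_auto_deploy_job_retries is a counter)
sum by (project) (increase(delivery_webhooks_auto_deploy_job_retries[$__range]))

# Same window, broken down per job, keeping only the 10 most-retried jobs
topk(10, sum by (project, job_name) (increase(delivery_webhooks_auto_deploy_job_retries[$__range])))

# Weekly retries per project as a time series; PromQL has no month unit,
# so a monthly view would need an approximation such as [30d]
sum by (project) (increase(delivery_webhooks_auto_deploy_job_retries[1w]))
```

Evaluated at every step of a time-series panel, the `[1w]` query is a rolling 7-day count; setting the panel's Min interval to 1w gives roughly non-overlapping weekly buckets instead. Note that `increase()` extrapolates to the window boundaries, so results can be fractional; wrapping a query in `round()` is one way to display whole retry counts.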
Edited by Reuben Pereira