Measure the number of manual job retries per pipeline
Problem
We want to measure the number of jobs that release managers retry per deployment pipeline.
We also want to measure the number of retries in downstream pipelines, especially QA pipelines.
If possible, we could also keep track of which job is being retried, so that we can see which jobs are retried the most.
Proposal
- Since Delivery metrics will receive webhook events from projects that have a lot of different pipelines, we need to filter out and use only events from pipelines triggered by deployment pipelines.
- Check whether each webhook event is a retry and, if so, increment the metric.
- Follow the Prometheus naming convention when naming metrics: https://prometheus.io/docs/practices/naming/#metric-names
- Decide which labels to add to the metric. Do we want to add the job name?
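The filtering and counting steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not the Delivery metrics implementation: the payload field names (`object_kind`, `build_status`, `build_name`, `project_name`, `retries_count`) are assumed to match GitLab job events, `DEPLOYMENT_PROJECTS` is a hypothetical allow-list standing in for whatever check identifies pipelines triggered by deployment pipelines, and a plain `Counter` stands in for a labelled Prometheus counter. The sketch also does not distinguish manual retries from automatic ones.

```python
from collections import Counter

# Hypothetical allow-list of projects whose pipelines are triggered by
# deployment pipelines (assumption; the real filter may look different).
DEPLOYMENT_PROJECTS = {"example/deployer", "example/qa"}

# Stand-in for a labelled Prometheus counter, keyed by (project, job_name).
job_retries = Counter()


def is_retry(event):
    """Return True if the event looks like the start of a retried job.

    Assumes job events carry `retries_count`, and counts only the
    "created" status so that each retry is counted once rather than
    once per status change.
    """
    return (
        event.get("object_kind") == "build"
        and event.get("build_status") == "created"
        and (event.get("retries_count") or 0) > 0
    )


def record_event(event):
    """Increment the retry counter for a relevant event.

    Returns True if the event was counted, False if it was filtered out.
    """
    project = event.get("project_name")
    if project not in DEPLOYMENT_PROJECTS:
        return False  # not a pipeline we care about
    if not is_retry(event):
        return False  # first run of the job, or a later status update
    job_retries[(project, event.get("build_name"))] += 1
    return True
```

Keying the counter by project and job name is what would later allow a dashboard to show which jobs are retried the most.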
Exit Criteria
- Ability to count the number of job retries
- Ability to use this data on Dashboards
We have a metric called delivery_webhooks_auto_deploy_job_retries with project and job_name labels: Thanos link
Edited by Reuben Pereira