Measure job durations in deployment pipeline
Summary
We would like to measure the durations of jobs in the deployment pipeline and its downstream pipelines.
Questions to be answered:
- Has the duration of any job/pipeline been increasing slowly over time?
- Which job/pipeline takes the most time?
- Has a job/pipeline had any outliers with increased/decreased duration?
- What are the P50, P80, P95, and P99 of a job/pipeline duration?
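As an illustration of the percentile question, here is a minimal sketch that computes P50, P80, P95, and P99 from a list of recorded durations using the nearest-rank method. The `percentile` helper and the sample durations are hypothetical and not part of the proposal; they only show what the numbers would mean.

```python
import math


def percentile(durations, p):
    """Nearest-rank percentile: the smallest recorded value such that
    at least p percent of the samples are less than or equal to it."""
    if not durations:
        raise ValueError("no durations recorded")
    ordered = sorted(durations)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical durations (seconds) of one job across ten pipeline runs.
# The single 120.3 value plays the role of an outlier.
durations = [42.0, 45.5, 44.1, 43.0, 120.3, 41.8, 46.2, 44.9, 43.7, 45.0]

summary = {f"P{p}": percentile(durations, p) for p in (50, 80, 95, 99)}
print(summary)
```

With so few samples, the high percentiles collapse onto the single outlier, which is one reason to look at the median and the tail percentiles together.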
Proposal
Add a gauge named deployment_job_duration_seconds with the following labels:
- job_name: Contains the job name.
- version: Contains the DEPLOY_VERSION value.
- project: Possible values are release-tools, deployer, qa-staging, qa-staging-canary, k8s-workloads, omnibus, cng.
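To make the proposed gauge concrete, the sketch below renders one sample in the Prometheus text exposition format using only the Python standard library. The helper names and the duration value of 12.5 seconds are hypothetical; the job name and version are borrowed from the example metric in this issue, and how the value would actually be pushed or scraped is not assumed here.

```python
def escape_label_value(value):
    # The text exposition format escapes backslash, double quote,
    # and newline inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge_sample(name, labels, value):
    """Render a single 'name{label="value",...} value' line."""
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


# Labels follow the proposal: job_name, version, project.
labels = {
    "job_name": "notify_start:gprd-cny",
    "version": "15.11.202304201120-040246fea8b.0c2f103500c",
    "project": "release-tools",
}

lines = [
    "# TYPE deployment_job_duration_seconds gauge",
    render_gauge_sample("deployment_job_duration_seconds", labels, 12.5),  # hypothetical value
]
print("\n".join(lines))
```

A gauge fits here because each job run reports one final duration rather than a continuously increasing count.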
Implemented
Example metric without labels added by scrapers:
delivery_deployment_job_duration_seconds{deploy_version="15.11.202304201120-040246fea8b.0c2f103500c", job_id="9831498", job_name="notify_start:gprd-cny", job_status="success", project_name="gitlab-org/release/tools", short_job_name="notify_start:", target_env="gprd", target_stage="cny"}
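For anyone inspecting the implemented metric by hand, here is a small sketch that splits the example series above into its metric name and labels. The `parse_series` helper is hypothetical and deliberately simple: it does not unescape label values and is not a full parser for the exposition format.

```python
import re

# The example series from this issue, copied verbatim.
SAMPLE = (
    'delivery_deployment_job_duration_seconds{'
    'deploy_version="15.11.202304201120-040246fea8b.0c2f103500c", '
    'job_id="9831498", job_name="notify_start:gprd-cny", '
    'job_status="success", project_name="gitlab-org/release/tools", '
    'short_job_name="notify_start:", target_env="gprd", target_stage="cny"}'
)

# Matches label="value" pairs, tolerating escaped characters inside the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_series(series):
    """Return (metric_name, labels_dict) for a 'name{...}' series string."""
    name, _, rest = series.partition("{")
    return name, dict(LABEL_RE.findall(rest))


name, labels = parse_series(SAMPLE)
print(name)
for key in sorted(labels):
    print(f"  {key} = {labels[key]}")
```

Note that the implemented metric carries more labels than the original proposal, including job_id, job_status, target_env, and target_stage.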
Edited by Reuben Pereira