Expose a duration histogram of the runner prepare stage

Overview

This feature adds a new Prometheus histogram metric that records how long it takes to prepare the CI/CD job environment (the prepare stage).

A customer that uses Kubernetes to host the CI/CD build environment (runners) and runs ~200k CI/CD jobs per day has found that the pod provisioning step (prepare environment) can take more than 3 minutes. The estimate is that this affects ~10% of the daily CI/CD jobs. The customer therefore needs visibility into duration trends for the prepare stage in order to decide how to adjust the compute resources allocated to the Kubernetes cluster(s).

Add a histogram metric that records the duration of preparing the CI/CD job environment.
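To illustrate what such a metric would capture, here is a minimal Go sketch of Prometheus-style histogram semantics: cumulative buckets plus a running sum and count. It is not the runner's implementation; a real change would presumably use the histogram type from the Prometheus client library. The type and function names, the bucket boundaries, and the sample durations are all hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// prepareHistogram is a hypothetical, simplified stand-in for a Prometheus
// histogram: cumulative bucket counters plus a running sum and count.
type prepareHistogram struct {
	upperBounds []float64 // bucket upper bounds in seconds, ascending
	counts      []uint64  // counts[i] = observations <= upperBounds[i]
	sum         float64   // total observed seconds
	count       uint64    // total observations (the implicit +Inf bucket)
}

func newPrepareHistogram(upperBounds []float64) *prepareHistogram {
	return &prepareHistogram{
		upperBounds: upperBounds,
		counts:      make([]uint64, len(upperBounds)),
	}
}

// Observe records one prepare-stage duration. Buckets are cumulative, so
// every bucket whose upper bound is >= the observation is incremented.
func (h *prepareHistogram) Observe(d time.Duration) {
	seconds := d.Seconds()
	for i, ub := range h.upperBounds {
		if seconds <= ub {
			h.counts[i]++
		}
	}
	h.sum += seconds
	h.count++
}

// FractionAbove returns the share of observations that took longer than
// upperBounds[i], or 0 if nothing has been observed yet.
func (h *prepareHistogram) FractionAbove(i int) float64 {
	if h.count == 0 {
		return 0
	}
	return float64(h.count-h.counts[i]) / float64(h.count)
}

// sampleHistogram fills a histogram with made-up prepare durations.
// The bucket boundaries (30s .. 300s) are assumptions, not a proposal.
func sampleHistogram() *prepareHistogram {
	h := newPrepareHistogram([]float64{30, 60, 120, 180, 300})
	for _, s := range []int{20, 25, 40, 45, 50, 70, 90, 100, 150, 200} {
		h.Observe(time.Duration(s) * time.Second)
	}
	return h
}

func main() {
	h := sampleHistogram()
	for i, ub := range h.upperBounds {
		fmt.Printf("le=%vs count=%d\n", ub, h.counts[i])
	}
	// Index 3 is the 180s bucket, i.e. the 3-minute threshold.
	fmt.Printf("share of jobs over 3 minutes: %.0f%%\n", h.FractionAbove(3)*100)
}
```

With buckets around the 3-minute mark, the share of slow prepare stages can be read directly from the bucket counters, which is the kind of trend visibility described above.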

Note:

We already partition the number of jobs by execution step and executor stage, and the time differences between the pre-defined steps can, to some extent, be extrapolated from that.

Edited Apr 08, 2024 by Darren Eastman