Add telemetry to capture runner job step timings
Right now we capture how long an entire job takes to complete, but we don't have much visibility into how long each phase of a job takes.
For example:
- How long it takes for a runner to pick up the job
- How long it takes for a runner-manager to start up the job-specific runner
- How long the container / VM download takes
- Cache hit rate (container already pulled)
- How long it takes to do the git clone
- How long to pull artifacts/cache
- How long the pre/script/post scripts take to execute
- How long to post artifacts/cache
- How long to tear down and recycle
We should also capture the type of executor, so we can distinguish the performance characteristics of each one.
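To make the list above concrete, here is a minimal sketch of how per-phase timings could be collected for a single job and tagged with the executor type. It is written in Python purely for illustration and is not the runner's actual code; the class `JobStepTimings`, the phase names, and the executor value are all hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class JobStepTimings:
    """Collects per-phase durations for one job, tagged with the executor type.

    Hypothetical helper for illustration only.
    """

    def __init__(self, executor, clock=time.monotonic):
        self.executor = executor             # e.g. "docker" (illustrative value)
        self._clock = clock                  # injectable so a fake clock can be used
        self.durations = defaultdict(float)  # phase name -> total seconds
        self.image_pulls = 0                 # number of image requests seen
        self.image_cache_hits = 0            # requests where the image was already present

    @contextmanager
    def phase(self, name):
        """Time the enclosed block and add it to the named phase."""
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the phase raises.
            self.durations[name] += self._clock() - start

    def record_image_pull(self, already_present):
        """Track whether the container / VM image was already pulled."""
        self.image_pulls += 1
        if already_present:
            self.image_cache_hits += 1

    def cache_hit_rate(self):
        """Return hits / pulls, or None if no pull has been recorded yet."""
        if self.image_pulls == 0:
            return None
        return self.image_cache_hits / self.image_pulls
```

Each phase would then be wrapped as `with timings.phase("git_clone"): ...`, and the `executor` attribute travels with the timings so results can be split by executor later.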
As an MVC, this data could be exposed as Prometheus metrics so it can be easily aggregated and sifted through. As a further iteration, we could also export it into GitLab to be rendered in various ways.
Edited by Joshua Lambert