Add instrumentation to address pipeline creation performance
Follow-up from @grzesiek's comment:
Take a look at this pipeline creation duration histogram.
We see that at the 99th percentile it sometimes takes 20 seconds to create a pipeline. This uses the `rate` function, which averages over a 1-minute range vector, and we exclude the worst 1% of cases, so the slowest pipelines may take much longer than that. PromQL used here:
histogram_quantile(0.99, sum(rate(gitlab_ci_pipeline_creation_duration_seconds_bucket[1m])) by (le))
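To make clear why the p99 value is only an estimate, here is a minimal Ruby sketch of the bucket interpolation that `histogram_quantile` performs on classic histograms. The method name `estimate_quantile` and all bucket bounds and counts are made up for illustration. The sketch also skips the `rate` and `sum ... by (le)` steps and assumes the bucket counts are already aggregated.

```ruby
# Estimate a quantile from cumulative histogram buckets, in the style of
# PromQL's histogram_quantile: find the bucket that contains the target rank
# and interpolate linearly inside it.
#
# buckets: array of [upper_bound, cumulative_count] pairs sorted by
#          upper_bound, ending with [Float::INFINITY, total_count].
def estimate_quantile(q, buckets)
  total = buckets.last[1]
  return Float::NAN if total.zero?

  rank = q * total
  index = buckets.index { |(_, count)| count >= rank }
  upper, count = buckets[index]

  # Observations beyond the highest finite bucket cannot be located, so the
  # estimate is capped at the highest finite upper bound.
  return buckets[-2][0] if upper.infinite?

  lower, below = index.zero? ? [0.0, 0] : buckets[index - 1]
  lower + (upper - lower) * (rank - below) / (count - below).to_f
end

# Illustrative data only: 100 pipelines, 2 of them between 10 and 30 seconds.
typical = [[1.0, 90], [10.0, 98], [30.0, 100], [Float::INFINITY, 100]]
p99_typical = estimate_quantile(0.99, typical)

# Same shape, but the 2 slowest pipelines exceed the last finite bucket.
capped = [[1.0, 90], [10.0, 98], [Float::INFINITY, 100]]
p99_capped = estimate_quantile(0.99, capped)
```

With this made-up data the first estimate is 20 seconds, even though the two slow pipelines could each have taken anywhere between 10 and 30 seconds. The second estimate is capped at 10 seconds regardless of how slow those pipelines really were, which is one way the real worst case can hide behind the chart.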
That histogram shows data from the last 6 hours. The last 12 hours look much worse:
Next steps suggested
The next step I would suggest is instrumenting the pipeline creation chain (including fetching and merging includes) with additional histograms that could help identify possible causes of slowness. The slowness might be related to calculating variables, to fetching external includes, or to something else. Additional instrumentation will help to uncover the mystery.
Proposal
- Add instrumentation for each step of the chain
- Add charts to Verify:PE Grafana Dashboard
- Add more granular instrumentation (e.g. `includes` processing or `Seed::Build`) if necessary
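The first proposal item could look roughly like the sketch below. It is self-contained Ruby and deliberately does not use the real metrics client or the real chain classes: `StepHistogram`, `run_instrumented_chain`, the bucket bounds, and the injectable `clock` are all hypothetical names chosen for illustration.

```ruby
# Minimal in-memory histogram keyed by step name. Hypothetical stand-in for a
# real metrics client; bucket bounds are illustrative, in seconds.
class StepHistogram
  BUCKETS = [0.1, 0.5, 1.0, 5.0, 20.0, Float::INFINITY].freeze

  attr_reader :counts

  def initialize
    # counts[step] holds one cumulative counter per bucket upper bound.
    @counts = Hash.new { |hash, step| hash[step] = Array.new(BUCKETS.size, 0) }
  end

  # Increment every bucket whose upper bound covers the observed duration.
  def observe(step, seconds)
    BUCKETS.each_with_index do |upper, i|
      @counts[step][i] += 1 if seconds <= upper
    end
  end
end

# Run each step of a chain in order and record how long each one took.
# steps: array of [name, callable] pairs.
# clock: injectable so the timing can be faked in tests (an assumption of
#        this sketch, not an existing API).
def run_instrumented_chain(steps, histogram,
                           clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
  steps.each do |name, step|
    started = clock.call
    begin
      step.call
    ensure
      # Record the duration even when the step raises.
      histogram.observe(name, clock.call - started)
    end
  end
end
```

Keying one histogram by step name, rather than defining a separate metric per step, keeps a single query able to break the p99 down by step, which is what the proposed dashboard charts would need.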