More-efficient calculation of job variables when building a CI pipeline
Summary
When calling Ci::CreatePipelineService, N jobs are created, as specified in the .gitlab-ci.yml file. Each of these jobs has a number of environment variables set, and whether the job should be created or not can sometimes depend on the value of these variables. Many of the variables are identical for every job in the pipeline; a small number vary between jobs in the same pipeline.
I'm not too familiar with the CI areas of the codebase, but to me it looks like we build the list of variables at least once per job. I put a few debugging puts statements into Project#predefined_variables and elsewhere, created a new pipeline based on GitLab's own .gitlab-ci.yml, and got this output:
SEQUENCE#BUILD! 2020-04-06 17:10:54: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Build: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Build: END
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Build::Associations: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Build::Associations: END
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Validate::Abilities: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Validate::Abilities: END
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Validate::Repository: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Validate::Repository: END
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Config::Content: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Config::Content: END
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Config::Process: START
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::Config::Process: END
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::RemoveUnwantedChatJobs: START
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::RemoveUnwantedChatJobs: END
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::Skip: START
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::Skip: END
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::EvaluateWorkflowRules: START
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::EvaluateWorkflowRules: END
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::Seed: START
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? START
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? START
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Seed: END
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Limit::Size: START
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Limit::Size: END
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Validate::External: START
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Validate::External: END
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Populate: START
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Populate: END
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Create: START
SEQUENCE#BUILD! 2020-04-06 17:10:59: Step Gitlab::Ci::Pipeline::Chain::Create: END
SEQUENCE#BUILD! 2020-04-06 17:10:59: Step Gitlab::Ci::Pipeline::Chain::Limit::Activity: START
SEQUENCE#BUILD! 2020-04-06 17:10:59: Step Gitlab::Ci::Pipeline::Chain::Limit::Activity: END
SEQUENCE#BUILD! 2020-04-06 17:10:59: Step Gitlab::Ci::Pipeline::Chain::Limit::JobActivity: START
SEQUENCE#BUILD! 2020-04-06 17:10:59: Step Gitlab::Ci::Pipeline::Chain::Limit::JobActivity: END
SEQUENCE#BUILD! 2020-04-06 17:10:59: END
(Note that this is with RequestStore enabled.)
The cost of generating these variables is quite, um, variable. Some are backed by database columns in the same model that currently holds the code; some go to associated records (which can be expensive with repeated calls, e.g. !28688 (merged)); others make calls to Gitaly, the results of which may or may not be cached in Redis, RequestStore, or instance variables. We see that creating a pipeline can be very slow on GitLab.com, and I think this is at least part of why: repeatedly generating the CI variables is not cheap.
Improvements
First, I think we should move all the variable definition code out of the models and into a separate builder of some sort. Centralising the code will make setting expectations around its behaviour much easier.
Then, we can separate the variables out by those that will be the same for all jobs, and those that vary between jobs. There are probably other subsets too, like "these variables are the same for all jobs in this environment".
Once we have these separate categories, we can memoize by them, so we generate the variables the minimum number of times required to service every job in the pipeline.
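As a rough illustration of the idea above, here is a minimal Ruby sketch of a builder that computes the pipeline-wide variables once and only rebuilds the per-job ones. The class PipelineVariablesBuilder, its methods, and the shared_builds counter are all hypothetical and do not exist in GitLab; the variable names are just a handful of the predefined CI variables, chosen for illustration.

```ruby
# Hypothetical sketch: none of these class or method names exist in GitLab.
class PipelineVariablesBuilder
  attr_reader :shared_builds

  def initialize(project_name:, ref:)
    @project_name = project_name
    @ref = ref
    @shared_builds = 0 # counts how often the shared set is actually computed
  end

  # Variables that are identical for every job: computed once, then reused.
  def shared_variables
    @shared_variables ||= begin
      @shared_builds += 1
      {
        'CI_PROJECT_NAME' => @project_name,
        'CI_COMMIT_REF_NAME' => @ref
      }.freeze
    end
  end

  # Variables that vary between jobs: rebuilt for each job.
  def job_variables(job_name:, stage:)
    { 'CI_JOB_NAME' => job_name, 'CI_JOB_STAGE' => stage }
  end

  # Full variable set for one job: shared set merged with the per-job set.
  def variables_for(job_name:, stage:)
    shared_variables.merge(job_variables(job_name: job_name, stage: stage))
  end
end

builder = PipelineVariablesBuilder.new(project_name: 'gitlab', ref: 'master')
jobs = [%w[rspec test], %w[rubocop test], %w[deploy production]]
all_variables = jobs.map { |name, stage| builder.variables_for(job_name: name, stage: stage) }
```

With three jobs, the shared set is built once rather than three times. Further categories (for example, "same for all jobs in this environment") could be memoized the same way, keyed by environment.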
Risks
Refactoring always carries a risk of breaking what we're working on, and the variables code isn't particularly well-tested at the moment; for example, MergeRequest#predefined_variables had no tests at all when I came to it. The most likely regressions will be some variables not getting set, or miscategorising some variables so they end up getting the wrong values. I think it's worth it, though.
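One way to guard against both kinds of regression would be to compare the variables produced for the same job before and after the refactor. The sketch below is only an illustration: variable_regressions is a hypothetical helper, and the two hashes are made-up sample data rather than real pipeline output.

```ruby
# Hypothetical helper: compares the variables produced by the existing code
# path with those produced by a refactored one, for the same job.
def variable_regressions(before, after)
  missing = before.keys - after.keys # variables that are no longer set
  added = after.keys - before.keys   # variables that newly appeared
  changed = (before.keys & after.keys).reject { |key| before[key] == after[key] }
  { missing: missing, added: added, changed: changed }
end

# Made-up sample data for illustration only.
before = { 'CI_JOB_NAME' => 'rspec', 'CI_JOB_STAGE' => 'test', 'CI_PROJECT_NAME' => 'gitlab' }
after  = { 'CI_JOB_NAME' => 'rspec', 'CI_JOB_STAGE' => 'build' }

report = variable_regressions(before, after)
```

Here report[:missing] flags CI_PROJECT_NAME as no longer set, and report[:changed] flags CI_JOB_STAGE as having a different value, which correspond to the two regressions described above.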
Involved components
Every model in app/models that currently defines variables would be affected by this, along with the code in lib/gitlab/ci that is responsible for gathering and evaluating them, both at the seed and build steps.
Optional: Intended side effects
Improved performance of generating variables