Extract variables builder from project/pipeline/build classes
Description
We want to optimize the workflow of calculating build variables, as per #213560 (closed) and #22520 (closed).
To make it easier to optimize calculating build variables, we might want to extract the variables builder into a separate class / module. This will make it easier to profile this code and find deficiencies.
Problem 1 - from #213560 (closed)
When calling Ci::CreatePipelineService, N jobs are created, as specified in the .gitlab-ci.yml file. Each of these jobs has a number of environment variables set, and whether the job should be created or not can sometimes depend on the value of these variables. Many of the variables are identical for every job in the pipeline; a small number vary between jobs in the same pipeline.
I'm not too familiar with the CI areas of the codebase, but to me it looks like we build the list of variables at least once per job. I put a few debugging puts statements into Project#predefined_variables and elsewhere, created a new pipeline based on GitLab's own .gitlab-ci.yml, and got this output:
Additional Context
SEQUENCE#BUILD! 2020-04-06 17:10:54: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Build: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Build: END
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Build::Associations: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Build::Associations: END
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Validate::Abilities: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Validate::Abilities: END
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Validate::Repository: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Validate::Repository: END
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Config::Content: START
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Config::Content: END
SEQUENCE#BUILD! 2020-04-06 17:10:54: Step Gitlab::Ci::Pipeline::Chain::Config::Process: START
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::Config::Process: END
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::RemoveUnwantedChatJobs: START
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::RemoveUnwantedChatJobs: END
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::Skip: START
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::Skip: END
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::EvaluateWorkflowRules: START
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::EvaluateWorkflowRules: END
SEQUENCE#BUILD! 2020-04-06 17:10:55: Step Gitlab::Ci::Pipeline::Chain::Seed: START
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? START
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:55 Seed::Stage#included? START
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
Project#predefined_variables!!!
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? START
SEED#STAGE_SEEDS: 2020-04-06 17:10:56 Seed::Stage#included? END
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Seed: END
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Limit::Size: START
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Limit::Size: END
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Validate::External: START
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Validate::External: END
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Populate: START
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Populate: END
SEQUENCE#BUILD! 2020-04-06 17:10:56: Step Gitlab::Ci::Pipeline::Chain::Create: START
SEQUENCE#BUILD! 2020-04-06 17:10:59: Step Gitlab::Ci::Pipeline::Chain::Create: END
SEQUENCE#BUILD! 2020-04-06 17:10:59: Step Gitlab::Ci::Pipeline::Chain::Limit::Activity: START
SEQUENCE#BUILD! 2020-04-06 17:10:59: Step Gitlab::Ci::Pipeline::Chain::Limit::Activity: END
SEQUENCE#BUILD! 2020-04-06 17:10:59: Step Gitlab::Ci::Pipeline::Chain::Limit::JobActivity: START
SEQUENCE#BUILD! 2020-04-06 17:10:59: Step Gitlab::Ci::Pipeline::Chain::Limit::JobActivity: END
SEQUENCE#BUILD! 2020-04-06 17:10:59: END
(Note that this is with RequestStore enabled.)
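For reference, instrumentation of this kind can be sketched without editing the method body. The following is a minimal illustration, not the code used to produce the output above: the `Project` class here is a stand-in for the real model, and `PredefinedVariablesCounter` is a hypothetical helper name.

```ruby
# Stand-in for the real Project model; only the method name comes from this issue.
class Project
  def predefined_variables
    [{ key: 'CI_PROJECT_ID', value: '1' }]
  end
end

# Hypothetical helper: counts calls to predefined_variables via Module#prepend.
module PredefinedVariablesCounter
  def self.calls
    @calls ||= 0
  end

  def self.increment!
    @calls = calls + 1
  end

  # Runs before the original method, then delegates to it with super.
  def predefined_variables
    PredefinedVariablesCounter.increment!
    super
  end
end

Project.prepend(PredefinedVariablesCounter)

project = Project.new
3.times { project.predefined_variables } # simulate three jobs being seeded
puts "Project#predefined_variables called #{PredefinedVariablesCounter.calls} times"
```

Counting calls this way gives a single number per pipeline creation instead of one log line per call, which may be easier to compare before and after an optimization.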
The cost of generating these variables is quite, um, variable. Some are backed by database columns in the same model currently holding the code; some go to associated records (which can be expensive with repeated calls, e.g. !28688 (merged)); others make calls to Gitaly, the results of which may or may not be cached in Redis, RequestStore, or instance variables. We see that creating a pipeline can be very slow for GitLab.com, and I think this is at least part of why: repeatedly generating the CI variables is not cheap.
Problem 2 - from #22520 (closed)
Ci::Build#scoped_variables is responsible for returning all predefined variables that are going to be set in a runner environment.
It is quite slow: it takes around 2s on my local machine, and a little more on production currently.
We are calling Gitaly three times there, and there is probably some room for optimization.
Proposal (done in 1st iteration - see next steps)
Extract a Ci::Variables::Builder class or module. Use dependency injection to manage dependencies on the objects required to calculate variables, such as project / pipeline / build.
Ensure that we can remove duplicate computations with memoization. For example, we should be able to use the same builder for multiple builds in a pipeline to make the most of memoization.
Expected improvements:
- we should have the logic to compute CI variables in one place
- easy to memoize variables (e.g. group, project, pipeline variables) across multiple builds
- easy to profile and make further improvements when needed
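A rough sketch of the idea, assuming a builder that receives its dependencies through the constructor and memoizes the variables shared by all builds. Only the name Ci::Variables::Builder comes from this proposal; the method names, the `FakeSource` stand-in, and the variable values are hypothetical and do not reflect the actual implementation.

```ruby
module Ci
  module Variables
    class Builder
      # Dependencies are injected, so the builder can be profiled or tested
      # with simple stand-ins instead of real models.
      def initialize(project:, pipeline:)
        @project = project
        @pipeline = pipeline
      end

      # Variables for one build: the shared part is memoized, only the
      # per-build part is computed on each call.
      def scoped_variables(build)
        shared_variables + build.predefined_variables
      end

      private

      # Computed once per builder and reused for every build in the pipeline.
      def shared_variables
        @shared_variables ||= @project.predefined_variables + @pipeline.predefined_variables
      end
    end
  end
end

# Hypothetical stand-in that records how often its variables are requested.
FakeSource = Struct.new(:variables) do
  def calls
    @calls || 0
  end

  def predefined_variables
    @calls = calls + 1
    variables
  end
end

project  = FakeSource.new([{ key: 'CI_PROJECT_ID', value: '1' }])
pipeline = FakeSource.new([{ key: 'CI_PIPELINE_ID', value: '10' }])
builds   = %w[rspec rubocop].map { |name| FakeSource.new([{ key: 'CI_JOB_NAME', value: name }]) }

# One builder is shared by all builds, so project and pipeline variables
# are computed a single time.
builder = Ci::Variables::Builder.new(project: project, pipeline: pipeline)
results = builds.map { |build| builder.scoped_variables(build) }
puts "project variables computed #{project.calls} time(s) for #{builds.size} builds"
```

The memoization only pays off if the same builder instance is reused across the builds of a pipeline; creating a new builder per build would recompute the shared variables each time.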
Merge Request in progress: !52800 (closed)
Next steps
- Enable the FF introduced in !72348 (merged) & measure results
- Add more variables into the builder, with feature flags, repeating what we did in !72348 (merged). The plan is pretty much what is described in !71439 (comment 702509482)
- Over time we should see a decline in the time taken to build these scoped_variables, as measured by the gitlab_ci_pipeline_builder_scoped_variables_duration Prometheus metric.
- Move more variables into the builder, such that Build#variables can be fully moved into a builder object.
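To illustrate the kind of measurement such a duration metric relies on, here is a minimal timing sketch using a monotonic clock. It does not reproduce how the Prometheus metric is actually recorded; `measure_duration` is a hypothetical helper and the block body is a stand-in for the real computation.

```ruby
# Hypothetical helper: returns the block's result and the elapsed seconds.
# A monotonic clock is used so wall-clock adjustments do not skew the result.
def measure_duration
  started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at]
end

variables, duration = measure_duration do
  # Stand-in for the real scoped_variables computation.
  [{ key: 'CI_JOB_NAME', value: 'rspec' }]
end

puts format('scoped_variables took %.6f seconds', duration)
```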
Testing
Please make sure package-and-qa passes in the implementation MR.