Use direct relationship between Pipeline Job and Environment instead of expanded_environment_name
Problem
GitLab has many features that fetch the environments related to a particular object, and one of the biggest users is CI/CD Pipelines. Specifically, Ci::Build is tightly related to the Environment model. For example, when you visit a job detail page, it shows which environment the job will deploy to. Another example: when rendering the play button on a manual job, it performs an authorization check on whether the user has access to the target environment (for the ProtectedEnvironment optimization, please see this issue). Whatever the process is, we fetch the related environment in the following way:
build.persisted_environment
The internal process flow is:
- Fetch a corresponding ci_build_metadata row.
- Read the ci_build_metadata.expanded_environment_name attribute.
- Fetch a matching environments row for the environment name.
- Return the AR object.
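The flow above can be sketched in plain Ruby. This is only an illustration, not GitLab's implementation: FakeStore, its method names, and the scoping of the name lookup by project_id are assumptions made so that the two lookups can be counted without a database.

```ruby
# In-memory stand-ins for the two tables; these are not GitLab's real models.
BuildMetadata = Struct.new(:build_id, :expanded_environment_name)
EnvironmentRow = Struct.new(:id, :project_id, :name)

# Hypothetical store that counts how many lookups ("queries") it serves.
class FakeStore
  attr_reader :query_count

  def initialize(metadata_rows, environment_rows)
    @metadata_rows = metadata_rows
    @environment_rows = environment_rows
    @query_count = 0
  end

  # Stands in for a SELECT on ci_build_metadata by build id.
  def find_metadata(build_id)
    @query_count += 1
    @metadata_rows.find { |row| row.build_id == build_id }
  end

  # Stands in for a SELECT on environments by name.
  # Scoping by project_id is an assumption of this sketch.
  def find_environment_by_name(project_id, name)
    @query_count += 1
    @environment_rows.find { |row| row.project_id == project_id && row.name == name }
  end
end

# Mirrors the listed steps: metadata row -> name -> environments row -> object.
def persisted_environment(store, build_id:, project_id:)
  metadata = store.find_metadata(build_id)
  name = metadata&.expanded_environment_name
  return nil if name.nil?

  store.find_environment_by_name(project_id, name)
end

store = FakeStore.new(
  [BuildMetadata.new(1, "production"), BuildMetadata.new(2, nil)],
  [EnvironmentRow.new(10, 7, "production")]
)

deploy_env = persisted_environment(store, build_id: 1, project_id: 7)
queries_for_one_build = store.query_count
test_env = persisted_environment(store, build_id: 2, project_id: 7)
```

In this sketch a job with an environment costs two lookups, and a job without one still costs the metadata lookup before returning nil.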
Because there is no direct relationship between Ci::Build and Environment, this simple process has to execute two queries. The bigger problem with this architecture, however, is that we can't preload the associated environments for multiple builds in a batch or a single query. A temporary solution like BatchLoader (lazy loader) or Gitlab::SafeRequestStore (short-lived caching) might mitigate the issue a little, but it's a fragile approach that will likely require maintenance effort in the future.
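To make the batching concern concrete, here is a rough sketch that contrasts a per-build lookup by name with a single batched lookup by key. Everything in it is hypothetical: CountingDb, the hash-based rows, and in particular the environment_id key on each build are assumptions used only to count lookups, not a description of GitLab's schema.

```ruby
# Hypothetical in-memory "database" that counts lookups; not GitLab code.
class CountingDb
  attr_reader :queries

  def initialize(environments)
    @by_id = environments.to_h { |env| [env[:id], env] }
    @by_name = environments.to_h { |env| [env[:name], env] }
    @queries = 0
  end

  # One lookup per call, as with a per-build lookup by environment name.
  def environment_by_name(name)
    @queries += 1
    @by_name[name]
  end

  # One lookup for many ids, as with a single WHERE id IN (...) query.
  def environments_by_ids(ids)
    @queries += 1
    @by_id.values_at(*ids.uniq).compact
  end
end

environments = [{ id: 1, name: "staging" }, { id: 2, name: "production" }]
builds = [
  { id: 101, environment_name: "staging", environment_id: 1 },
  { id: 102, environment_name: "production", environment_id: 2 },
  { id: 103, environment_name: "production", environment_id: 2 }
]

# Name-based: one environment lookup per build (metadata lookups are omitted).
per_build_db = CountingDb.new(environments)
per_build = builds.map { |build| per_build_db.environment_by_name(build[:environment_name]) }

# Key-based: one batched lookup, then assignment in memory.
batched_db = CountingDb.new(environments)
index = batched_db
  .environments_by_ids(builds.map { |build| build[:environment_id] })
  .to_h { |env| [env[:id], env] }
batched = builds.map { |build| index[build[:environment_id]] }
```

Both approaches resolve the same environments here; the difference is that the per-build lookup count grows with the number of builds while the batched lookup count stays at one.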
Technically, the related environment can be fetched via the Deployment model; however, not all jobs are meant to deploy. Some of them stop an environment, or just prepare artifacts for environments. In such cases, the deployment relationship is insufficient to cover all related environments.
This is a long-standing issue. In the past, this problematic architecture caused performance and scalability issues from time to time, and every time we deferred the optimal solution due to a lack of capacity. Here are a few of the recent discussions with the groupmemory team. We should fix the architectural problem first in order to reduce the feature maintenance cost.
Proposal: Use direct relationship from Environments::Job model instead of looking up environments by name.
As part of #552372 (closed), a direct relationship was added between jobs and environments. These records store environment-specific options values, so that they can be referred to later (for example, deployment_tier or action). These are the same fields that were previously stored on ci_builds_metadata.
Currently, calls to job.persisted_environment still use the expanded_environment_name stored on the Environments::Job model instead of using the direct association. This redundancy exists for two reasons:
- The efficiency improvements made possible by the direct relationship were not required as part of #552372 (closed), which was primarily focused on removing the dependency on CI metadata.
- It allows us to gradually replace usages of expanded_environment_name (and in turn persisted_environment) without having to change everything over at once.
The proposal for this issue is to begin the incremental replacement of expanded_environment_name and persisted_environment. At a high level, this will involve the following:
- Replacing job.persisted_environment with job.job_environment.environment
- Replacing job.expanded_environment_name with job.job_environment.environment.name
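As a minimal sketch of what the two replacements might look like at a call site, the plain-Ruby stand-ins below follow the association chain directly. The helper methods environment and environment_name are hypothetical names for illustration, and the sketch assumes that a job without the environment keyword has no job_environment record, which is why safe navigation is used.

```ruby
# Plain-Ruby stand-ins for the models; the real ones are ActiveRecord classes.
Environment = Struct.new(:id, :name)
JobEnvironment = Struct.new(:environment)

class Job
  attr_reader :job_environment

  def initialize(job_environment = nil)
    @job_environment = job_environment
  end

  # Hypothetical helper standing in for job.persisted_environment.
  def environment
    job_environment&.environment
  end

  # Hypothetical helper standing in for job.expanded_environment_name.
  def environment_name
    environment&.name
  end
end

production = Environment.new(1, "production")
deploy_job = Job.new(JobEnvironment.new(production))
# Assumed: no environment keyword means no job_environment record.
test_job = Job.new
```

Without the safe navigation, job.job_environment.environment would raise NoMethodError for any job whose job_environment is nil, so each replaced call site may need a nil check of this kind.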
Next
Eventually, it should be possible to remove the expanded_environment_name columns from both job_environments and ci_builds_metadata as they will no longer be required.
There may also be some opportunities for optimisation when fetching groups of deployments or environments for a pipeline by using the job_environments.pipeline
association (previously this would require joining all builds for the pipeline, many of which may not use the environment keyword).
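The pipeline-level lookup could look roughly like the sketch below, which filters job_environments rows by pipeline and never touches the builds themselves. The column names pipeline_id, job_id, and environment_id, the record structs, and the environments_for_pipeline helper are all assumptions made for illustration.

```ruby
# In-memory stand-ins for job_environments and environments rows (assumed columns).
JobEnvironmentRecord = Struct.new(:pipeline_id, :job_id, :environment_id)
EnvironmentRecord = Struct.new(:id, :name)

# Hypothetical helper: collect the distinct environments used by one pipeline
# by reading only job_environments rows, without joining every build.
def environments_for_pipeline(job_environments, environments, pipeline_id)
  environment_ids = job_environments
    .select { |record| record.pipeline_id == pipeline_id }
    .map(&:environment_id)
    .uniq

  environments.select { |environment| environment_ids.include?(environment.id) }
end

environments = [
  EnvironmentRecord.new(1, "staging"),
  EnvironmentRecord.new(2, "production"),
  EnvironmentRecord.new(3, "review/feature")
]
job_environments = [
  JobEnvironmentRecord.new(50, 501, 1),
  JobEnvironmentRecord.new(50, 502, 2),
  JobEnvironmentRecord.new(50, 503, 2),
  JobEnvironmentRecord.new(51, 511, 3)
]

pipeline_environment_names =
  environments_for_pipeline(job_environments, environments, 50).map(&:name)
unknown_pipeline_environments =
  environments_for_pipeline(job_environments, environments, 99)
```

Jobs that do not use the environment keyword have no row in this sketch, so they add no work to the lookup.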