Skip to content

Refactor project-level protected environments for group-level [RUN ALL RSPEC] [RUN AS-IF-FOSS]

Shinya Maeda requested to merge refactor-project-level-protected-envs into master

NOTE: This is a blocker against the critical issue for important prospects, thus the priority is relatively high

Problem

While we were working on the Group-level protected environments feature, we realized that the current project-level protected environments had performance/architectural issues:

  • There is a N+1 problem when authorizing a user to access to an protected environment. This is a long standing issue. e.g. https://gitlab.com/gitlab-org/gitlab/-/issues/321788
  • We're lacking of cache mechanizm to reduce the number of authorization processes. Currently, we execute database queries in every single authorization.
  • We don't have SSOT facility to maintain the authorization logic thus it's hard to be extended for group-level feature. Developers have to take care of the performance optimization in each feature domain, which ends up re-invention of the wheels.
  • This issue is also affected by Direct relationship between Pipeline Job and Environment.

The N+1 problems are currently mitigated by a temporary caching, however, since it's tightly related to project context, we can't easily extend it to group-level.

We tried to find a way to build the group-level feature without addressing these issues, however, it'd be quite challenging to pass the performance reviews with the current shape. Especially, group-level feature requires more complex-and-expensive database query execution, meaning there is a risk to be an SLA issue and required to rework in the future.

What does this MR do?

With keeping in mind that Group-level protected environments feature implementation comes at next iteration, we should improve the following things:

  • Introduce SSOT facility for the authorization to make the auth logic container agnostic.
  • Move the project-specific caching to the SSOT facility.
  • Introduce a new caching logic as described below.

Given that there are N jobs, the following process happens per authorization.

  • Can the user executes the job?
  • Will the job interact with environment?
  • Does the user have an access to the environment?
  • Fetch protected environments row. (Cached with RequestStore)
  • Fetch deploy access levels. (Cached with RequestStore)
  • Evaluate (sometimes executes a query to namespaces or check_access)

The point is that we should cache "Does the user have an access to the environment?" result in RequestStore. This works for both project and group levels.

Related #326329 (closed)

Manual QA

Reporter with an explicit access to the protected environment can execute the deployment job.

2021-05-11_19-34

Developer without an access to the protected environment can not execute the deployment job.

2021-05-11_19-37

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Shinya Maeda

Merge request reports