Backend: Refactor Rule::Clause::Exists to improve performance
Summary
In !148356 (merged) we introduced new subkeys to rules:exists: paths, project, and ref. This first iteration is not optimized for performance and could cause problems in the future if rules:exists:project becomes widely used.
The code that supports the new subkeys introduces several new database and Gitaly calls, including ones to fetch the project and sha, and to check user permissions. If a pipeline has many nested includes using rules:exists:project, this could cause a noticeable performance degradation.
Further context:
The following discussion from !148356 (merged) should be addressed:
- @lma-git started a discussion:
@furkanayhan: I don't want to block this MR, but I am scared that we'll face similar performance problems to the ones we had in includes before. In includes, we improved the performance by caching/memoizing/batch-loading project, permissions, etc. Do you think we should improve this area first before introducing this? I am asking this because we are implementing logic where we fetch the project and commit, then check for permission, and then create a context for each rule:exists. This reminded me of #351593 (closed) and #450687.
@lmg-git: We can open a follow-up issue to implement something like what you did with file.preload_context in Mapper::Verifier. I think we can adopt a similar approach in Mapper::Filter, but it definitely requires more consideration because of how the rules/clauses are loaded. I think it would also take a few iterations because I'd like to do refactoring like in https://gitlab.com/gitlab-org/gitlab/-/issues/454384 first to clean up the code.
Proposal
Refactor Rule::Clause::Exists and related classes to improve performance. Use techniques similar to the ones employed for batch-requesting include files, e.g. batching/preloading, memoizing, and caching.
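As a rough illustration of the memoizing idea, the sketch below caches the project lookup, permission check, and sha lookup per (user, project path, ref), so that many rules:exists:project clauses pointing at the same project share one context. It is plain Ruby with no GitLab dependencies; ExistsContextCache, its Context struct, and the three injected callables are hypothetical names for this example and are not existing GitLab classes or APIs.

```ruby
# Hypothetical sketch: none of these names exist in the GitLab codebase.
# The three callables stand in for the database, Gitaly, and policy calls.
class ExistsContextCache
  Context = Struct.new(:project, :sha)

  def initialize(project_finder:, sha_finder:, permission_checker:)
    @project_finder = project_finder         # ->(path) { project or nil }
    @sha_finder = sha_finder                 # ->(project, ref) { sha or nil }
    @permission_checker = permission_checker # ->(user, project) { true/false }
    @contexts = {}
  end

  # Returns a Context, or nil when the project or ref is missing or the
  # user has no access. Negative results are cached too, so a denied or
  # missing project is only looked up once per pipeline.
  def fetch(user, project_path, ref)
    key = [user, project_path, ref]
    return @contexts[key] if @contexts.key?(key)

    @contexts[key] = build(user, project_path, ref)
  end

  private

  def build(user, project_path, ref)
    project = @project_finder.call(project_path)
    return unless project
    return unless @permission_checker.call(user, project)

    sha = @sha_finder.call(project, ref)
    return unless sha

    Context.new(project, sha)
  end
end
```

One instance would be created per pipeline evaluation and shared by every clause, so the number of expensive calls grows with the number of distinct projects and refs rather than with the number of rules. A real implementation would likely also batch the project lookups, which this sketch does not attempt.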