checks: Implement infrastructure for bulk diff checks
What does this MR do?
The diff checks are quite expensive to compute given that we have to
enumerate all new commits for each updated or new branch, which
translates to one call to git rev-list $newrev --not --all
.
Performance linearly grows with the number of existing branches in the
target repository: in the context of repos with hundreds of thousands of
branches, it can easily take multiple seconds up to a few dozens of
seconds. It doesn't help that we do this call once per reference.
This MR creates the infrastructure to do diff checks in bulk, so we can preleoad all new commits once and then pass these commits to the BulkCheck class. This avoids having to call the expensive ListCommits RPC multiple times.
With the bulk load, we do have one problem though: the BulkCheck class expects to get only those new commits which are in fact relevant to the change at hand. Luckily, we can rather extract all relevant commits from the set of new commits by walking through all new commits starting with the change's newrev. When we hit a commit ID which isn't part of our new commits, then we know that the commit must've been a preexisting commit in the repository.
Note that this allows the user to update a branch to a preexisting
commit in the case where the diff between old and new commit would've
included changes prohibited by DiffCheck. This is a preexisting twist
though given that we'd previously have retrieved all new commits via
doing git rev-list $newrev --not --all
, which wouldn't return
reachable old commits. So this is not a change in behaviour.
Part of #330327 (closed)
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team