checks: Speed up retrieving commits via quarantine directory
What does this MR do?
The Git access checks need access to objects which have been newly introduced for a set of given changes. These new objects used to be retrieved via a revision walk, where preexisting references have been excluded and new references included. The resulting set of objects is then the set of objects which haven't previously been referenced by anything else. Depending on the number of references, walking all preexisting ones can be quite expensive and easily take dozens of seconds.
In the past, we have thus converted our LFS pointer checks to use the quarantine directory to list new LFS pointers. The quarantine directory hosts all objects which have been uploaded by a git-push(1), and can thus directly tell us about which objects are new and which aren't. This has sped up LFS pointer checks in case we have a quarantine directory by 3 orders of magnitude in biggish repos with many refs, going from up to 20 seconds down to 20 milliseconds.
Given that commits can be front-loaded in a single batched RPC call
since 156ce433 (checks: Implement infrastructure for bulk diff
checks, 2021-07-29), we are now in a position to do the same for
commits: If we have an object quarantine directory, then we use the new
ListAllCommits()
RPC with all alternate paths unset. This will thus
effectively only list commits of the object directory, which in a
quarantined environment points to the quarantine object directory.
Note that we still have to retain the old code to list commits via a revision walk, given that non-push RPCs currently don't have an object quarantine directory available. While this is slated to change soon by creating manual quarantines, we must continue to support enumeration via revwalks.
Further note that I didn't create a new feature flag for this change. We already have the :changes_batch_commits
feature flag, which is still disabled (#336992 (closed)).
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team