Skip to content

checks: Speed up retrieving commits via quarantine directory

Patrick Steinhardt requested to merge pks-checks-quarantined-commits into master

What does this MR do?

The Git access checks need access to objects which have been newly introduced for a set of given changes. These new objects used to be retrieved via a revision walk, where preexisting references have been excluded and new references included. The resulting set of objects is then the set of objects which haven't previously been referenced by anything else. Depending on the number of references, walking all preexisting ones can be quite expensive and easily take dozens of seconds.

In the past, we have thus converted our LFS pointer checks to use the quarantine directory to list new LFS pointers. The quarantine directory hosts all objects which have been uploaded by a git-push(1), and can thus directly tell us about which objects are new and which aren't. This has sped up LFS pointer checks in case we have a quarantine directory by 3 orders of magnitude in biggish repos with many refs, going from up to 20 seconds down to 20 milliseconds.

Given that commits can be front-loaded in a single batched RPC call since 156ce433 (checks: Implement infrastructure for bulk diff checks, 2021-07-29), we are now in a position to do the same for commits: If we have an object quarantine directory, then we use the new ListAllCommits() RPC with all alternate paths unset. This will thus effectively only list commits of the object directory, which in a quarantined environment points to the quarantine object directory.

Note that we still have to retain the old code to list commits via a revision walk, given that non-push RPCs currently don't have an object quarantine directory available. While this is slated to change soon by creating manual quarantines, we must continue to support enumeration via revwalks.

Further note that I didn't create a new feature flag for this change. We already have the :changes_batch_commits feature flag, which is still disabled (#336992 (closed)).

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Merge request reports