Use `DiffBlobs()` RPC to scan a set of changes
Overview
This issue aims to limit secret push protection scans to a set of changes (delta) instead of entire files. This could likely be achieved by utilising the recently introduced `DiffBlobs()` RPC from Gitaly, but it also requires a number of changes in how we retrieve changes to a repository, as described below.
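To make the "delta instead of entire files" idea concrete, here is a minimal sketch of scanning only the lines a diff adds. It assumes the diff arrives as a unified-diff patch string; `HUNK_HEADER`, `SECRET_PATTERN`, `added_lines()` and `scan_patch()` are hypothetical names for illustration, and the pattern is a made-up rule, not one from the real ruleset.

```python
import re

# Hunk header of a unified diff: @@ -old_start[,old_len] +new_start[,new_len] @@
HUNK_HEADER = re.compile(r"@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")

# Hypothetical detection rule, for illustration only.
SECRET_PATTERN = re.compile(r"\bsecret_[0-9a-f]{16}\b")


def added_lines(patch):
    """Yield (new_line_number, text) for every line the patch adds."""
    new_line_no = None  # stays None until the first hunk header is seen
    for line in patch.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            new_line_no = int(header.group(1))
        elif new_line_no is None or line.startswith("\\"):
            # File headers before the first hunk, or "\ No newline at end of file".
            continue
        elif line.startswith("+"):
            yield new_line_no, line[1:]
            new_line_no += 1
        elif not line.startswith("-"):
            # Context line: present in the new blob, but unchanged.
            new_line_no += 1
        # Removed lines do not exist in the new blob, so the counter stays put.


def scan_patch(patch):
    """Return (line_number, text) for added lines that match the rule."""
    return [(no, text) for no, text in added_lines(patch)
            if SECRET_PATTERN.search(text)]
```

Only added lines are matched, so a secret that already existed in an unchanged part of the file would not be reported again on every push that touches that file.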
Implementation Plan
1. Filter blobs down to those below or equal to the size limit (i.e. 1 MiB) using `ListAllBlobs()`/`ListBlobs()`.
2. Use `ListAllCommits()` to get new commits for the push we are scanning.
3. Get changed paths using the `FindChangedPaths()` RPC for those commits.
4. Associate blobs returned from `ListAllBlobs()`/`ListBlobs()` with changed paths.
5. Use the paths to calculate blob pairs for each blob.
6. Pass those blob pairs to `DiffBlobs()` to get delta/diffs.
7. Pass the diffs to the Ruby Gem or Secret Detection Service for scanning.
Note: The above (especially number 2) aligns closely with the proposal made by group::source code a while ago to replace the calls made to the `GetTreeEntries()` RPC, which retrieve blob metadata (file path and commit SHA) when a secret is detected. It's important to explore that idea while working on this issue, as it will likely come as a side effect of this one.
Before jumping right into this, please also consider looking at gitaly#5682 to see whether it is a better alternative to the proposal above.
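The blob-pairing part of the implementation plan (associating blobs with changed paths, then deriving pairs for `DiffBlobs()`) could be sketched roughly as follows. This is not Gitaly client code: `ChangedPath`, `BlobPair`, `NULL_OID` and `blob_pairs()` are hypothetical stand-ins, the field names are assumptions about what the changed-path data carries, and using an all-zero object ID as the left side of a newly added file is likewise an assumption.

```python
from dataclasses import dataclass

# Assumed placeholder meaning "no previous blob" (40 hex zeros, SHA-1 repositories).
NULL_OID = "0" * 40


@dataclass(frozen=True)
class ChangedPath:
    """Hypothetical stand-in for one changed-path entry."""
    path: str
    status: str  # e.g. "ADDED", "MODIFIED", "DELETED"
    old_blob_id: str
    new_blob_id: str


@dataclass(frozen=True)
class BlobPair:
    """Hypothetical pair of blob IDs to diff, plus the path for reporting."""
    path: str
    left_blob: str
    right_blob: str


def blob_pairs(changed_paths, scannable_blob_ids):
    """Build (left, right) blob pairs for changes worth scanning.

    scannable_blob_ids is the set of blob IDs that passed the size filter.
    """
    pairs = []
    for change in changed_paths:
        if change.status == "DELETED":
            continue  # nothing new to scan
        if change.new_blob_id not in scannable_blob_ids:
            continue  # e.g. blob above the size limit
        left = NULL_OID if change.status == "ADDED" else change.old_blob_id
        pairs.append(BlobPair(change.path, left, change.new_blob_id))
    return pairs
```

Keeping the path on each pair means the file path is already known when a secret is found in the resulting diff, which is the metadata the note above is concerned with.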