
Use `DiffBlobs()` RPC to scan a set of changes

Overview

This issue aims to limit secret push protection scans to the set of changes (delta) in a push instead of entire files. This can likely be achieved using the recently introduced DiffBlobs() RPC from Gitaly, but it also requires a number of changes to how we retrieve changes to a repository, as described below.

Implementation Plan

  1. Select only blobs at or below the size limit (i.e. 1MiB) using ListAllBlobs()/ListBlobs().
  2. Use ListAllCommits() to get new commits for the push we are scanning.
  3. Get changed paths using FindChangedPaths() RPC for those commits.
  4. Associate blobs returned from ListAllBlobs()/ListBlobs() with changed paths.
  5. Use the paths to calculate blob pairs for each blob.
  6. Pass those blob pairs to DiffBlobs() to get delta/diffs.
  7. Pass the diffs to Ruby Gem or Secret Detection Service for scanning.
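
Steps 1, 4 and 5 of the plan can be sketched roughly as follows. This is a minimal illustration in Python, not the actual implementation: the `Blob`, `ChangedPath` and `BlobPair` types, their field names, and the `build_blob_pairs` helper are all hypothetical stand-ins for whatever the Gitaly responses and the `DiffBlobs()` request actually carry. It assumes each changed path exposes the old and new blob IDs, and that an all-zero ID marks an added or deleted file.

```python
from dataclasses import dataclass

MAX_BLOB_SIZE = 1024 * 1024  # 1 MiB size limit from step 1
NULL_OID = "0" * 40          # assumed marker for "no blob" (added/deleted file)


@dataclass(frozen=True)
class Blob:
    """Hypothetical stand-in for a blob returned by ListAllBlobs()/ListBlobs()."""
    oid: str
    size: int


@dataclass(frozen=True)
class ChangedPath:
    """Hypothetical stand-in for an entry returned by FindChangedPaths()."""
    path: str
    old_blob_id: str
    new_blob_id: str


@dataclass(frozen=True)
class BlobPair:
    """Hypothetical stand-in for one pair passed to DiffBlobs()."""
    path: str
    left_blob_id: str
    right_blob_id: str


def build_blob_pairs(blobs, changed_paths, max_size=MAX_BLOB_SIZE):
    # Step 1: keep only blobs at or below the size limit.
    scannable = {blob.oid for blob in blobs if blob.size <= max_size}

    pairs = []
    for changed in changed_paths:
        # Step 4: associate each changed path with a scannable new blob.
        # Deleted files have no new blob, so they are skipped here.
        if changed.new_blob_id in scannable:
            # Step 5: pair the previous blob (left) with the new blob (right).
            pairs.append(
                BlobPair(changed.path, changed.old_blob_id, changed.new_blob_id)
            )
    return pairs
```

The resulting pairs would then be handed to `DiffBlobs()` (step 6), and only the returned diffs would be scanned (step 7). Keeping the path on each pair is a design choice in this sketch so that a detected secret can be reported with its file path without a separate lookup.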

Note: The above (especially number 2) aligns closely with the proposal made by group::source code a while ago to replace the calls to the GetTreeEntries() RPC, which retrieve blob metadata (file path and commit SHA) when a secret is detected. It's important to explore that idea while working on this issue, as it will likely be a side effect of this one.

Before jumping right into this, please also look at gitaly#5682 (closed) to see whether it is a better alternative to the proposal above.

Edited by Ahmed Hemdan