[Backend] Follow-up to handle path exclusions in the gem or the secret detection service
Overview
While working on Phase 1 to add ability to manage and use SD Exclusions when scanning for secrets using secret push protection, we initially had to handle path-based exclusions in a post-scanning step (read more on this here), this was due to a couple of limitations we had when using ListBlobs()
/ListAllBlobs()
RPCs to retrieve committed changes for a new git push operation:
- Having to scan entire files as Gitaly didn't provide RPCs to get only diffs.
- Not having associated file paths available until we call
GetTreeEntries()
RPC later in the process (post-scanning).
In #469161 (closed), we worked to re-implement our scanning mechanism to drop ListBlobs()
/ListAllBlobs()
+ GetTreeEntries()
in favour of a combination of DiffBlobsRequest()
+ ListAllCommits()
/FindChangedPath()
, which allowed us to retrieve associated file paths of committed changes before we start scanning for secrets (pre-scanning). The aim of this issue is to pass retrieved file paths to the scanning interface (whether the gem or the secret detection service) and have it handle the exclusions entirely by itself to improve separation of concerns.
Proposal
-
Update the push check to pass file paths retrieved from
FindChangedPath()
to scanning interfaces (along with exclusions). - Update the gem to handle path exclusions by itself.
- Update the secret detection service to handle path exclusions by itself.
Original Discussion
The following discussion from !166511 (merged) should be addressed:
- @vshushlin started a discussion:
I left a few comments, but my biggest concern is the separation of concerns of the
lib
code and thegem
.Right now:
- some of the scanning exclustions are implemented in the gem, and some in the lib
- so we pass
exclustion
to the gem, as well as have code handling on time directly- we also post-process secrets finding, and alter the result that gem returns
That makes it super hard to understand, write tests for etc. Everything is coupled way too tightly.
I understand that you've done that because you don't have some information (file paths) on the gem level, but I think we should either:
- pass this information to the gem, and handle all exclusions there
- post-process all exclusions on the library level
I would vote for the second option, as it's much more straightforward to implement right now, and you can open a follow-up issue to move this logic to the gem completely later.