Consider using the `list_commits` and `find_changed_paths` methods to retrieve blob metadata
Context
At the time groupsource code were working on delivering GitGuardian-based pre-receive secret detection, they refactored the way blobs are retrieved from Gitaly to make the use of ListAllBlobs()
and ListBlobs()
RPCs possible in other similar access checks.
They had also updated the new ChangedBlobs
service they had introduced as follows:
- Updated the call to
ListBlobs()
to passwith_paths
flag. - Updated the call to
ListAllBlobs()
to use an extra RPC (i.e.FindChangedPaths
) to get changed paths.
In order to be able to retrieve file paths of blobs a quarantine directory exist or not.
For more information on this, please see these two merge requests: 1, 2.
Overview
A proposal was made by @igor.drozdov in this POC merge request to replace:
-
GetTreeEntries()
RPC – which is currently used to retrieve blob metadata inSecretsCheck
class.
With both:
-
FindChangedPaths()
RPC – which is used to load up the file path of a blob inChangedBlobs
service. -
ListAllCommits()
RPC – which is used to load commit objects.
To load a blob metadata (i.e. file path and commit sha) without having to walk the git
tree of all changed commits.
However, such change would likely involve an extra call to Gitaly as can be seen below.
Current Implementation
-
ListAllBlobs()
orListBlobs()
– 1 per scan -
GetTreeEntries()
– 1 per commit
Igor's POC
-
ListAllBlobs()
orListBlobs()
– 1 per scan -
ListAllCommits()
– 1 per scan -
FindChangedPaths()
– 1 per commit
Assistance Needed
This issue might require some assistance from either groupgitaly or groupsource code.
Proposal
We should spend a bit of time to explore the proposal from the aforementioned merge request, with the aim being:
- Confirm the potential performance gain by profiling the change using our automated profiling tests.
- Confirm the technical approach matches our plans and works for our use case.
And if we are happy with the results, we should consider updating the implementation as proposed.