SPP - skip git submodule paths from scanning
Overview
While reviewing !199515 (merged), we noticed that we could skip scanning a path if that path is a git submodule.
This issue tracks the implementation of this small optimization.
What are git submodules?
Git submodules are a way to include other repositories in your own: instead of copying files and code around, you link them via git. A submodule doesn't actually add the other repository's code to yours; it only records a reference (a commit SHA) to a specific commit in that repository. Updating a submodule to a new revision therefore only changes that recorded SHA, while the files themselves live in the other repository. I'm proposing we skip these entries when scanning.
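To make this concrete: in the parent repository, a submodule is stored as a tree entry with file mode 160000 (a "gitlink") whose object ID is a commit SHA in the submodule's repository, not a blob. The minimal sketch below parses lines in the format of `git diff --raw` output and flags gitlink entries. The helper names (`parse_raw_line`, `gitlink_change?`) and the sample SHAs are illustrative only and are not part of GitLab.

```ruby
# Illustrative sketch: detect gitlink (submodule) entries in `git diff --raw` output.
# Mode 160000 marks a gitlink; its object ID is a commit SHA in the submodule repository.
GITLINK_MODE = "160000"

RawEntry = Struct.new(:old_mode, :new_mode, :old_sha, :new_sha, :status, :path)

# Parses one line such as ":160000 160000 1111111 2222222 M\tvendor/lib".
def parse_raw_line(line)
  meta, path = line.chomp.split("\t", 2)
  old_mode, new_mode, old_sha, new_sha, status = meta.delete_prefix(":").split(" ")
  RawEntry.new(old_mode, new_mode, old_sha, new_sha, status, path)
end

# A change touches a submodule if either side of the diff is a gitlink.
def gitlink_change?(entry)
  [entry.old_mode, entry.new_mode].include?(GITLINK_MODE)
end

raw = <<~RAW
  :100644 100644 aaaaaaa bbbbbbb M\tconfig/settings.yml
  :160000 160000 1111111 2222222 M\tvendor/lib
RAW

raw.lines.map { |l| parse_raw_line(l) }.each do |e|
  kind = gitlink_change?(e) ? "submodule pointer #{e.old_sha} -> #{e.new_sha}" : "file content"
  puts "#{e.path}: #{kind}"
end
```

Only the second entry is a submodule change: the diff carries two commit SHAs and no file content, so there is nothing in it for a secret scanner to inspect.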
Implementation Plan
To skip git submodules during scanning, we could use the submodule_change? method of Gitlab::Git::ChangedPath.
To do so, update PayloadProcessor to reject changed paths that are git submodule changes:
```ruby
paths.reject! { |changed_path| exclusions_manager.matches_excluded_path?(changed_path.path) }
paths.reject!(&:submodule_change?)
```
Or, better yet, in a single pass:
```ruby
paths.reject! do |changed_path|
  # reject paths from user-provided exclusions or git submodule changes
  exclusions_manager.matches_excluded_path?(changed_path.path) || changed_path.submodule_change?
end
```
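The combined filter can be tried outside GitLab with a self-contained sketch. ChangedPathStub and ExclusionsManagerStub are hypothetical stand-ins, not the real classes: they only mirror the method names used in the issue (path, submodule_change?, matches_excluded_path?). The mode-based submodule_change? logic and the prefix-based exclusion matching are assumptions made for illustration.

```ruby
# Hypothetical stand-in for Gitlab::Git::ChangedPath (not the real class).
ChangedPathStub = Struct.new(:path, :old_mode, :new_mode) do
  # Assumption: a change is a submodule change if either side is a gitlink (mode 160000).
  def submodule_change?
    [old_mode, new_mode].include?("160000")
  end
end

# Hypothetical stand-in for the exclusions manager; matches by path prefix only.
class ExclusionsManagerStub
  def initialize(prefixes)
    @prefixes = prefixes
  end

  def matches_excluded_path?(path)
    @prefixes.any? { |prefix| path.start_with?(prefix) }
  end
end

exclusions_manager = ExclusionsManagerStub.new(["spec/fixtures/"])

paths = [
  ChangedPathStub.new("app/models/user.rb", "100644", "100644"),
  ChangedPathStub.new("vendor/lib", "160000", "160000"),                   # submodule pointer bump
  ChangedPathStub.new("spec/fixtures/fake_token.txt", "000000", "100644") # excluded by the user
]

paths.reject! do |changed_path|
  # reject paths from user-provided exclusions or git submodule changes
  exclusions_manager.matches_excluded_path?(changed_path.path) || changed_path.submodule_change?
end

puts paths.map(&:path).inspect # prints ["app/models/user.rb"]
```

Folding both conditions into one reject! block walks the array once and keeps every reason for skipping a path in a single, commented place.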
Please also refer to the original thread for more details.
Future Iteration
Skipping git submodule changes (which often just point the submodule to a new revision) seems like a nice, if small, win.
In a future iteration, however, it would likely make sense to use such a change as a signal to run a secret detection scan on the submodule itself (if possible).
This could potentially work as follows:
- Detect a change in a git submodule pointer/revision.
- Perform a temporary, read-only clone of the submodule's repository at the specified commit.
- Scan the contents for secrets.
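The three steps could be sketched roughly as follows, assuming step 1 has already produced the submodule path and its new commit SHA. Everything here is hypothetical: SubmoduleChange, parse_gitmodules, GIT_CLONE, toy_scan and scan_submodule_change do not exist in GitLab. Resolving the clone URL from .gitmodules, cloning with the git CLI (--no-checkout, then checking out the recorded commit), and the glpat- regex standing in for the real secret detection ruleset are all assumptions.

```ruby
require "tmpdir"

# Hypothetical sketch of the proposed future iteration; none of these helpers exist in GitLab.
# Step 1 (detecting the pointer change) is assumed done: we receive the path and new commit SHA.
SubmoduleChange = Struct.new(:path, :new_sha)

# Minimal .gitmodules reader: returns { submodule path => clone URL }.
def parse_gitmodules(text)
  sections = []
  text.each_line do |raw|
    line = raw.strip
    if line.start_with?("[submodule")
      sections << {}
    elsif (m = line.match(/\A(path|url)\s*=\s*(.+)\z/)) && !sections.empty?
      sections.last[m[1]] = m[2]
    end
  end
  sections.each_with_object({}) do |s, urls|
    urls[s["path"]] = s["url"] if s["path"] && s["url"]
  end
end

# Step 2: clone into an empty directory, then check out the recorded commit (detached HEAD).
GIT_CLONE = lambda do |url, sha, dir|
  raise ArgumentError, "unexpected commit SHA" unless sha.match?(/\A\h{40}(\h{24})?\z/)

  system("git", "clone", "--quiet", "--no-checkout", "--", url, dir, exception: true)
  system("git", "-C", dir, "checkout", "--quiet", sha, exception: true)
end

# Step 3 placeholder: flags files containing something shaped like a GitLab PAT.
# A real implementation would reuse the existing secret detection ruleset instead.
TOKEN_PATTERN = /glpat-[0-9A-Za-z_-]{20}/

def toy_scan(dir)
  # "**/*" skips dotfiles by default, so the clone's .git directory is not scanned.
  Dir.glob("**/*", base: dir).sort.filter_map do |rel|
    full = File.join(dir, rel)
    rel if File.file?(full) && File.binread(full).match?(TOKEN_PATTERN)
  end
end

# Ties the steps together; the temporary clone is removed when the block returns.
def scan_submodule_change(change, gitmodules_text, clone: GIT_CLONE, scan: method(:toy_scan))
  url = parse_gitmodules(gitmodules_text)[change.path]
  return [] if url.nil? # unknown submodule: nothing to clone

  Dir.mktmpdir("submodule-scan") do |dir|
    clone.call(url, change.new_sha, dir)
    scan.call(dir)
  end
end
```

Because clone: and scan: are injectable, a test can pass a lambda that writes fixture files into dir instead of cloning, which avoids any network access.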