SPP - skip git submodule paths from scanning


Overview

While reviewing !199515 (merged), we noticed that paths belonging to git submodules could be skipped during scanning.

This issue tracks the implementation of this small optimization.

What are git submodules?

Git submodules are a way to include other repositories in your own. Instead of copying their files and code around, git links them: the parent repository stores only a reference (a commit SHA) to a specific commit in the other repository, recorded as a special "gitlink" tree entry with mode 160000. When a submodule is updated to point to a new revision, the only change in the parent repository is that SHA; the submodule's files themselves are not part of the change. I'm proposing we skip these submodule entries when scanning.
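To illustrate what the parent repository actually records, here is a minimal Ruby sketch that picks submodule entries out of git ls-tree style output ("<mode> <type> <object>", a tab, then the path), where a submodule shows up as a commit object with mode 160000. The helper names and the SHAs are made up for illustration; this is not how the scanner itself reads changed paths.

```ruby
# Mode git uses for "gitlink" tree entries, i.e. submodule pointers.
GITLINK_MODE = "160000"

# Parses one line of `git ls-tree` output: "<mode> <type> <object>\t<path>".
# Hypothetical helper, for illustration only.
def parse_ls_tree_line(line)
  meta, path = line.chomp.split("\t", 2)
  mode, type, object = meta.split(" ")
  { mode: mode, type: type, object: object, path: path }
end

# True when the entry is a submodule pointer rather than a regular file.
def submodule_entry?(entry)
  entry[:mode] == GITLINK_MODE
end

# Illustrative listing with fake SHAs: one regular file, one submodule.
listing = <<~LS
  100644 blob 1111111111111111111111111111111111111111\tREADME.md
  160000 commit 2222222222222222222222222222222222222222\tvendor/lib
LS

entries = listing.each_line.map { |line| parse_ls_tree_line(line) }
submodule_paths = entries.select { |e| submodule_entry?(e) }.map { |e| e[:path] }
puts submodule_paths.inspect # => ["vendor/lib"]
```

The submodule entry carries only a commit SHA, which is why there is no file content to scan on the parent side.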

Implementation Plan

To skip git submodules during scanning, we could use the submodule_change? method of Gitlab::Git::ChangedPath.

To do so, update PayloadProcessor to reject changed paths that are git submodule changes:

  paths.reject! { |changed_path| exclusions_manager.matches_excluded_path?(changed_path.path) }
  paths.reject!(&:submodule_change?)

Or, better yet, in a single pass:

  paths.reject! do |changed_path|
    # reject paths from user-provided exclusions or git submodule changes
    exclusions_manager.matches_excluded_path?(changed_path.path) || changed_path.submodule_change?
  end
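For a quick, self-contained check of the single-pass rejection, the sketch below swaps in hypothetical stand-ins (ChangedPathStub, ExclusionsManagerStub) for Gitlab::Git::ChangedPath and the exclusions manager. Modes are modelled as strings here for readability, and submodule_change? is assumed to mean "either side of the change has the gitlink mode 160000"; the real class may represent modes differently.

```ruby
# Hypothetical stand-in for Gitlab::Git::ChangedPath with only the fields
# this sketch needs. Modes are strings such as "100644" or "160000".
ChangedPathStub = Struct.new(:path, :old_mode, :new_mode, keyword_init: true) do
  # Assumed behaviour: a change touches a submodule when either side of it
  # carries the gitlink mode.
  def submodule_change?
    [old_mode, new_mode].include?("160000")
  end
end

# Hypothetical stand-in for the exclusions manager: exact path matches only.
class ExclusionsManagerStub
  def initialize(excluded_paths)
    @excluded_paths = excluded_paths
  end

  def matches_excluded_path?(path)
    @excluded_paths.include?(path)
  end
end

paths = [
  ChangedPathStub.new(path: "app/config.rb", old_mode: "100644", new_mode: "100644"),
  ChangedPathStub.new(path: "vendor/lib", old_mode: "160000", new_mode: "160000"),
  ChangedPathStub.new(path: "spec/fixtures/keys.txt", old_mode: "100644", new_mode: "100644")
]
exclusions_manager = ExclusionsManagerStub.new(["spec/fixtures/keys.txt"])

# Single pass: drop user-excluded paths and submodule pointer changes.
paths.reject! do |changed_path|
  exclusions_manager.matches_excluded_path?(changed_path.path) || changed_path.submodule_change?
end

puts paths.map(&:path).inspect # => ["app/config.rb"]
```

Combining both conditions in one block walks the list once instead of twice, which is the point of the second variant above.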

Please also refer to the original thread for more details.

Future Iteration

Skipping git submodule changes (which usually just point the submodule to a new revision) is a nice, if small, win.

However, in a future iteration it would likely make sense to use such a change as a signal to run a secret detection scan on the submodule itself (if possible).

This could potentially work as follows:

  1. Detect a change in a git submodule pointer/revision.
  2. Perform a temporary, read-only clone of the submodule's repository at the specified commit.
  3. Scan the contents for secrets.
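The steps above could be sketched roughly as follows, assuming we already know the submodule's repository URL (for example resolved from .gitmodules in the parent repository, which this sketch does not do) and the new commit SHA. SubmoduleChange, submodule_clone_commands, scan_submodule_change and the scanner callable are all hypothetical names. Fetching a single commit by SHA with a shallow fetch only works if the remote server permits it, and the throwaway checkout is deleted once the scan finishes.

```ruby
require "shellwords"
require "tmpdir"

# Hypothetical description of a detected submodule pointer change (step 1).
SubmoduleChange = Struct.new(:path, :url, :new_commit, keyword_init: true)

# Builds (but does not run) the git commands for a temporary, shallow
# checkout of the submodule at the new commit (step 2).
def submodule_clone_commands(change, workdir)
  [
    ["git", "init", "--quiet", workdir],
    ["git", "-C", workdir, "fetch", "--depth", "1", change.url, change.new_commit],
    ["git", "-C", workdir, "checkout", "--quiet", "--detach", "FETCH_HEAD"]
  ]
end

# Runs the commands in a throwaway directory and hands the checkout to a
# hypothetical scanner callable (step 3). The directory is removed afterwards.
def scan_submodule_change(change, scanner)
  Dir.mktmpdir("submodule-scan") do |dir|
    submodule_clone_commands(change, dir).each do |cmd|
      system(*cmd, exception: true) # raises if a git command fails
    end
    scanner.call(dir)
  end
end

# Usage: print the planned commands without touching the network.
change = SubmoduleChange.new(
  path: "vendor/lib",
  url: "https://gitlab.example.com/group/lib.git",
  new_commit: "2222222222222222222222222222222222222222"
)
submodule_clone_commands(change, "/tmp/scan").each { |cmd| puts cmd.shelljoin }
```

Keeping command construction separate from execution makes the plan easy to inspect and test without cloning anything.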