Allow pre-filtering of source files for security scans
Problem to solve
We recently added file exclusion support for SAST and Dependency Scanning using the SAST_EXCLUDED_PATHS and DS_EXCLUDED_PATHS environment variables, respectively. This provides a generic approach to excluding results, but it can be difficult to map to individual scanners: not all of them provide an exclusion option to propagate to, and many that do are not compatible with our exclusion strategy (e.g. regex format, blob types, exact matches). To fix this we decided to filter vulnerabilities in the orchestrator after the scan has occurred, which works generically for all scanners but requires a full scan regardless.
We should look at a more generic approach to pre-filtering, both for performance and to avoid scanning files that could inadvertently break scans (e.g. by ignoring a directory that contains invalid syntax).
Intended users
Further details
We should look at adding support to the common library for generating an ignorelist from EXCLUDED_PATHS. This can be propagated to those scanners that require exact file matches, such as pmd-apex, and prevent unnecessary scans of all project files. As stated by @fcatteau:
"we could build the list using https://gitlab.com/gitlab-org/security-products/analyzers/common/blob/master/pathfilter/match.go#L24 and https://golang.org/pkg/path/filepath/#Walk, and then pass it to whatever CLI can process such list, but passing SAST_EXCLUDED_PATHS to find won't work - the syntax is not compatible"
Note that the Semgrep-based analyzer currently does exclude paths before scanning: gitlab-org/security-products/analyzers/semgrep!47 (merged).
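The ignorelist idea above could be sketched roughly as follows. This is only an illustration: isExcluded is a hypothetical stand-in for the matcher in the common library's pathfilter package (its real rules may differ), buildIgnoreList is an invented name, and the variable is assumed to hold a comma-separated list of globs or file/directory paths. It uses filepath.WalkDir, the newer variant of the filepath.Walk function mentioned in the quote.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path"
	"path/filepath"
	"strings"
)

// isExcluded is a hypothetical stand-in for the pathfilter matcher.
// A pattern matches if it names the file or one of its parent directories,
// or if it is a glob matching the relative path or the file's base name.
func isExcluded(rel string, patterns []string) bool {
	rel = filepath.ToSlash(rel)
	for _, p := range patterns {
		p = strings.TrimSuffix(strings.TrimSpace(p), "/")
		if p == "" {
			continue
		}
		if rel == p || strings.HasPrefix(rel, p+"/") {
			return true
		}
		if ok, _ := path.Match(p, rel); ok {
			return true
		}
		if ok, _ := path.Match(p, path.Base(rel)); ok {
			return true
		}
	}
	return false
}

// buildIgnoreList walks root and returns the slash-separated relative paths
// of every file that matches one of the exclusion patterns.
func buildIgnoreList(root string, patterns []string) ([]string, error) {
	var ignored []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return err
		}
		if isExcluded(rel, patterns) {
			ignored = append(ignored, filepath.ToSlash(rel))
		}
		return nil
	})
	return ignored, err
}

func main() {
	// Assumption: the variable is a comma-separated list of patterns.
	patterns := strings.Split(os.Getenv("SAST_EXCLUDED_PATHS"), ",")
	list, err := buildIgnoreList(".", patterns)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, f := range list {
		fmt.Println(f) // one exact file path per line, ready to pass to a scanner CLI
	}
}
```

The output is a list of exact file paths, which is the form that scanners without glob support would need.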
(Updated) Proposal (based on this thread)
- Determine which analyzers can have entire source files removed without breaking the scan.
- Create a function that moves files found in DAST/SAST_EXCLUDED_DIRS to /tmp.
- Create a removeExcludedPaths helper in the command library. This function should be called before the analyzer runs.
- After the analyzer has run, restore the files that were saved to /tmp.
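The move-and-restore steps above might look something like the sketch below. Only the name removeExcludedPaths comes from the proposal; the stash type, its restore method, and the run demo are hypothetical. The sketch assumes the excluded entries have already been resolved to concrete, non-nested paths relative to the project root, and that the temporary directory is on the same filesystem as the project, since os.Rename does not work across devices (a real helper would likely need a copy fallback).

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// stash records where each excluded path was moved so it can be restored.
type stash struct {
	dir   string            // temporary directory holding the moved paths
	moved map[string]string // original path -> stashed path
}

// removeExcludedPaths moves every existing excluded path (relative to root)
// into a fresh temporary directory. Missing paths are skipped.
func removeExcludedPaths(root string, excluded []string) (*stash, error) {
	dir, err := os.MkdirTemp("", "excluded-paths-")
	if err != nil {
		return nil, err
	}
	s := &stash{dir: dir, moved: map[string]string{}}
	for i, rel := range excluded {
		src := filepath.Join(root, rel)
		if _, err := os.Lstat(src); err != nil {
			if os.IsNotExist(err) {
				continue
			}
			return s, err
		}
		dst := filepath.Join(dir, fmt.Sprintf("%d", i))
		// Assumption: same filesystem; os.Rename fails across devices.
		if err := os.Rename(src, dst); err != nil {
			return s, err
		}
		s.moved[src] = dst
	}
	return s, nil
}

// restore moves every stashed path back and removes the temporary directory.
func (s *stash) restore() error {
	for src, dst := range s.moved {
		if err := os.Rename(dst, src); err != nil {
			return err
		}
		delete(s.moved, src)
	}
	return os.RemoveAll(s.dir)
}

// run demonstrates the flow on a throwaway project with one excluded directory.
func run() error {
	root, err := os.MkdirTemp("", "project-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)
	broken := filepath.Join(root, "vendor", "broken.go")
	if err := os.MkdirAll(filepath.Dir(broken), 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(broken, []byte("not valid go"), 0o644); err != nil {
		return err
	}
	s, err := removeExcludedPaths(root, []string{"vendor", "missing"})
	if err != nil {
		if s != nil {
			s.restore() // best effort: put back whatever was already moved
		}
		return err
	}
	if _, err := os.Stat(broken); !os.IsNotExist(err) {
		return fmt.Errorf("expected %s to be hidden from the analyzer", broken)
	}
	// The analyzer would run here, without seeing the excluded directory.
	if err := s.restore(); err != nil {
		return err
	}
	_, err = os.Stat(broken) // the file should be back in place
	return err
}

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}
```

In an analyzer, the restore call would probably be deferred right after removeExcludedPaths succeeds, so the files are put back even if the scan fails.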
Documentation
This would result in no user-facing change as it would involve internally parsing *_EXCLUDED_PATHS more effectively.
Testing
We should test against a project in which the analyzer will fail if it scans a directory explicitly listed in EXCLUDED_PATHS, e.g. a directory containing a file with a syntax error.
What does success look like, and how can we measure that?
More performant analyzers, and analyzers that correctly skip code that should not be scanned.