Spike: Investigate generating full SAST reports from diff scans to support incremental scanning
Problem
Currently, GitLab Advanced SAST (GLAS) diff-based scanning can only produce partial reports on MR pipelines, because it analyzes only the changed files. We need to explore whether a diff scan can still produce a full report by leveraging artifacts from previous scans.
Proposal
Reference a SAST report from a previous scan and use the code flow file locations of each vulnerability to determine what needs to be rescanned:
For vulnerabilities with code flows:
- If code flow files match removed/changed files → Include all files in the code flow for rescanning
  - This ensures fixed vulnerabilities are not carried over from the previous report, since the affected files will be rescanned
- If code flow files don't overlap with removed/changed files → Keep the vulnerability in the final report (likely still valid)
- If duplicate vulnerabilities are reported → Rely on existing deduplication logic
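The code flow rules above can be sketched roughly as follows. This is a minimal illustration, not the GLAS implementation: the `Finding` class, its fields, and `plan_code_flow_rescan` are hypothetical names, and the sketch assumes the diff is already available as sets of changed and removed file paths. It performs a single pass and does not expand the rescan set transitively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical, simplified view of one vulnerability from the previous report.
    finding_id: str
    code_flow_files: frozenset  # every file path the code flow passes through


def plan_code_flow_rescan(findings, changed, removed):
    """Return (findings to carry over, files to rescan) for one diff."""
    changed, removed = set(changed), set(removed)
    touched = changed | removed
    rescan = set(changed)  # changed files are always part of the diff scan
    kept = []
    for finding in findings:
        if finding.code_flow_files & touched:
            # Overlap: discard the old finding and rescan every file in its
            # code flow that still exists (removed files cannot be scanned).
            rescan |= finding.code_flow_files - removed
        else:
            # No overlap: assume the finding is still valid and carry it over.
            kept.append(finding)
    return kept, rescan
```

Findings produced by the rescan would then be merged with `kept`, leaving any duplicates to the existing deduplication logic.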
For vulnerabilities without code flows:
- If vulnerable file was changed → Rescan it
- If vulnerable file was removed → Drop the vulnerability
- If vulnerable file is unchanged → Keep it in the report
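For findings without code flows, the three cases above reduce to a lookup on the single vulnerable file. The sketch below is illustrative only; `Action` and `triage_finding` are hypothetical names, and it assumes the changed and removed file sets are disjoint.

```python
from enum import Enum


class Action(Enum):
    RESCAN = "rescan"  # file changed: the old finding is replaced by rescan results
    DROP = "drop"      # file removed: the finding can no longer exist
    KEEP = "keep"      # file unchanged: carry the finding into the new report


def triage_finding(vulnerable_file, changed, removed):
    """Decide what to do with a finding that has no code flow."""
    if vulnerable_file in removed:
        return Action.DROP
    if vulnerable_file in changed:
        return Action.RESCAN
    return Action.KEEP
```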
Handling rule changes:
- New rule added → Scan all files with that rule
- Rule removed → Drop vulnerabilities reported by it
- Rule modified → Remove findings from that rule and rescan those files
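One possible way to detect these rule changes is to compare the rule set of the previous scan with the current one. The sketch below assumes, hypothetically, that each rule set is available as a mapping from rule ID to a content hash and that each previous finding is a `(rule_id, file_path)` pair; neither format is confirmed by the current report schema.

```python
def diff_rules(previous, current):
    """Compare two {rule_id: content_hash} mappings (assumed format)."""
    prev_ids, curr_ids = set(previous), set(current)
    added = curr_ids - prev_ids
    removed = prev_ids - curr_ids
    modified = {r for r in prev_ids & curr_ids if previous[r] != current[r]}
    return added, removed, modified


def apply_rule_changes(findings, previous, current):
    """Return (kept findings, (rule, file) pairs to rescan, rules needing a full scan)."""
    added, removed, modified = diff_rules(previous, current)
    stale = removed | modified
    # Findings from removed or modified rules are not carried over.
    kept = [(rule, path) for rule, path in findings if rule not in stale]
    # Files that had findings from a modified rule are rescanned with that rule.
    rescan = {(rule, path) for rule, path in findings if rule in modified}
    # Newly added rules have no history, so they run against all files.
    return kept, rescan, added
```

Note that this follows the proposal literally: a modified rule is rerun only on files where it previously reported findings, so matches it would newly produce in other unchanged files are not covered by this sketch.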
Goals of This Spike
- Validate technical feasibility of the proposed approach
- Identify implementation challenges and edge cases
- Estimate effort required for full implementation
- Assess performance impact compared to full scans with caching
- Determine artifact storage requirements and retrieval mechanisms
Key Questions to Answer
- Can the report generated by this approach match the report produced by a full scan?
- How do we fetch the previous report and ensure it's the relevant one?
- What happens when the previous scan used different rules or engine versions?