feat: add SAST tool semgrep
Close #132
This implementation is inspired by our Node.js, Go, and .NET patterns, with a few key adjustments. Since late 2025, `semgrep ci` requires a login and otherwise fails with this error:
run `semgrep login` before using `semgrep ci` or use `semgrep scan` and set `--config`
To avoid this, I have implemented a local scanning strategy that removes the login requirement.
Scan Logic optimisation (draws inspiration from .NET):
- Reference Branches: Full scans are executed for `$RELEASE_REF`, `$INTEG_REF`, and `$PROD_REF`.
- Feature Branches: Differential scans are performed against `$CI_DEFAULT_BRANCH`, with a fallback to a full scan if the reference does not exist.
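
For reviewers, the branch logic above could be sketched roughly as follows. This is not the job's actual script: it assumes the scan runs through `semgrep scan --config` with a local rules directory, that the `*_REF` variables hold exact branch names rather than patterns, and that the differential scan uses Semgrep's `--baseline-commit` option. The helper names, the `rules` directory, and the report file name are hypothetical, and only the GitLab-SAST output is shown.

```python
import subprocess

# Variable names come from the description above; treating their values as
# exact branch names (not patterns) is an assumption of this sketch.
REFERENCE_VARS = ("RELEASE_REF", "INTEG_REF", "PROD_REF")


def is_reference_branch(branch, env):
    """Return True if `branch` equals one of the configured reference branches."""
    references = {env.get(name) for name in REFERENCE_VARS}
    return bool(branch) and branch in references


def resolve_commit(ref):
    """Return the commit hash for `ref`, or None if it cannot be resolved."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--verify", "--quiet", f"{ref}^{{commit}}"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # git is not installed in the image.
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def build_scan_command(branch, env, baseline_commit, rules_dir="rules"):
    """Build the semgrep command line for a full or differential scan.

    `rules_dir` and the report file name are hypothetical placeholders.
    """
    command = [
        "semgrep", "scan",
        "--config", rules_dir,
        "--gitlab-sast",
        "--output", "gl-sast-report.json",
    ]
    # Differential scan only on feature branches with a resolvable baseline;
    # otherwise the command stays a full scan (the fallback case).
    if not is_reference_branch(branch, env) and baseline_commit:
        command += ["--baseline-commit", baseline_commit]
    return command
```

In a job, `baseline_commit` would come from something like `resolve_commit("origin/" + os.environ["CI_DEFAULT_BRANCH"])`; a `None` result then triggers the full-scan fallback.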
Technical Configuration:
- Reporting: Native and GitLab-SAST formats are active; SARIF is currently disabled, although Semgrep supports it natively.
- Environment: The job runs on a Semgrep Docker image while inheriting from `.python-base`. Since Semgrep is natively Python-based, this ensures compatibility, even if the hybrid image setup looks unconventional.
- Optimisation: I’ve implemented a Python-based rule downloader using ETag caching. This is a natural fit, given that Python is a Semgrep dependency.
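
To illustrate the ETag caching idea, here is a minimal standard-library sketch, not the downloader from this MR. It assumes the rules are served over HTTP with an `ETag` header and that the validator is kept in a sidecar `.etag` file next to the rules file. The function name and the injectable `urlopen` parameter are hypothetical.

```python
import urllib.error
import urllib.request
from pathlib import Path


def download_rules(url, dest, urlopen=urllib.request.urlopen):
    """Download `url` to `dest` unless the cached copy's ETag still matches.

    Returns True if a fresh copy was written, False if the cache was reused.
    """
    dest = Path(dest)
    etag_file = dest.with_name(dest.name + ".etag")
    request = urllib.request.Request(url)

    # Only send the validator when both the rules and their ETag are cached.
    if dest.exists() and etag_file.exists():
        request.add_header("If-None-Match", etag_file.read_text().strip())

    try:
        with urlopen(request, timeout=30) as response:
            body = response.read()
            etag = response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: the cached rules are still current.
            return False
        raise

    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(body)
    if etag:
        etag_file.write_text(etag)
    else:
        # No validator from the server: drop any stale one.
        etag_file.unlink(missing_ok=True)
    return True
```

The `urlopen` parameter exists only so the function can be exercised without network access; a job would call `download_rules(url, dest)` with whatever rules URL and cache path it uses.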
Next Steps:
- I am testing a standalone Semgrep installation on a pure Python image, though I still need to integrate the `maybe_install_packages` logic for Git.
Feedback on this WIP is appreciated.
Edited by Bertrand Goareguer