# Semgrep-based analysis in GitLab SAST
## Context
GitLab SAST has historically been powered by [over a dozen open-source static analysis security analyzers](https://docs.gitlab.com/ee/user/application_security/sast/#supported-languages-and-frameworks).
These analyzers have proactively identified millions of vulnerabilities for GitLab users, but each of these analyzers is language-specific and uses a different scanning approach.
We are currently streamlining the set of [SAST analyzers](https://docs.gitlab.com/ee/user/application_security/sast/#supported-languages-and-frameworks) to provide:
- A simpler operational experience, for example, by not requiring compilation or complicated build configuration steps.
- Faster performance.
- Better rule customization, since rules can be defined in configuration files instead of code.
- A more consistent user experience across languages.
## Semgrep-based scanning
Adding Semgrep-based scanning is a key part of this effort, though we are also working on [other efforts in this area](https://about.gitlab.com/direction/secure/static-analysis/sast/#next-generation-scanning).
The GitLab Static Analysis and Vulnerability Research teams have worked together to transition coverage from a number of existing open-source analyzers to Semgrep-based scanning.
We plan to continue migrating existing scanner coverage to Semgrep-based scanning, as described in this epic.
Semgrep-based scanning in GitLab SAST includes:
- The [Semgrep](https://semgrep.dev/) scanning engine, maintained by [r2c](https://r2c.dev). GitLab and r2c have partnered on areas of mutual interest.
- Detection rules that are created, maintained, and supported by GitLab.
- GitLab Ultimate features like [Advanced Vulnerability Tracking](https://docs.gitlab.com/ee/user/application_security/sast/#advanced-vulnerability-tracking).
- Integration with GitLab [Vulnerability Management](https://docs.gitlab.com/ee/user/application_security/vulnerabilities/index.html).
## Functional requirements
- Semgrep analyzer(s) are enabled by the existing vendored SAST template.
- Semgrep analyzer(s) run at the same license tier as other SAST analyzers.
- Support for custom rulesets matches the existing support.
- The new Semgrep analyzer(s) can run alongside existing SAST analyzers.
- Findings are deduplicated when multiple SAST analyzers report the same vulnerability.
  - Example: if the `bandit` and `semgrep` analyzers both report the same CWE finding, show only one finding in the MR widget and create only one vulnerability if merged.
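
The deduplication requirement above can be illustrated with a minimal Python sketch. It is not the GitLab implementation: the `Finding` type, the `deduplicate` helper, and the choice to treat two findings as identical when they share a CWE, file path, and line are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical, simplified finding record (not the GitLab report schema)."""
    analyzer: str
    cwe: str
    path: str
    line: int


def deduplicate(findings, preferred=("semgrep",)):
    """Keep one finding per (cwe, path, line) key.

    Assumption: when two analyzers report the same key, the finding from an
    analyzer listed in `preferred` wins; otherwise the first one seen is kept.
    """
    kept = {}
    for finding in findings:
        key = (finding.cwe, finding.path, finding.line)
        current = kept.get(key)
        if current is None:
            kept[key] = finding
        elif finding.analyzer in preferred and current.analyzer not in preferred:
            kept[key] = finding
    return list(kept.values())


findings = [
    Finding("bandit", "CWE-78", "app.py", 10),
    Finding("semgrep", "CWE-78", "app.py", 10),
    Finding("bandit", "CWE-89", "db.py", 3),
]
unique = deduplicate(findings)
```

In this sketch the `bandit` and `semgrep` reports for `app.py` line 10 collapse into a single finding, while the unrelated `db.py` finding is kept. A real implementation would likely need a more robust identity than the raw line number, since lines shift between commits.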
### Comparison criteria
1. Rule type coverage: total number of rules to be checked and a comparison of classification capabilities.
1. Field mappings: severity, location, and field descriptions.
1. GitLab feature support: Ultimate licensing, custom rulesets, directory and path exclusions, build and compilation requirements, and `SEARCH_MAX_DEPTH`.
1. Benchmarking: scan wall time, memory usage, and CPU usage.
1. Logging.
1. Whether to unify analyzers or keep them separate (for example, one analyzer with both Python and JavaScript rules).
1. Offline requirements.
1. OSS licensing.
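
To make the field-mapping criterion concrete, here is a hedged Python sketch that converts Semgrep JSON results into a simplified finding record. The input keys (`results`, `check_id`, `path`, `start`, `end`, `extra.message`, `extra.severity`) follow Semgrep's JSON output. The output field names and the `SEVERITY_MAP` values are assumptions for illustration only; they are not the GitLab SAST report schema or the mapping the analyzer actually uses.

```python
import json

# Hypothetical severity mapping; the real analyzer's mapping may differ.
SEVERITY_MAP = {"ERROR": "High", "WARNING": "Medium", "INFO": "Info"}


def to_finding(result):
    """Map one Semgrep result to a simplified, illustrative finding dict."""
    extra = result.get("extra", {})
    return {
        "rule_id": result["check_id"],
        # Unrecognized or missing severities fall back to "Unknown".
        "severity": SEVERITY_MAP.get(extra.get("severity", ""), "Unknown"),
        "description": extra.get("message", ""),
        "location": {
            "file": result["path"],
            "start_line": result["start"]["line"],
            "end_line": result["end"]["line"],
        },
    }


def convert(semgrep_json):
    """Parse Semgrep JSON output text and map every result."""
    report = json.loads(semgrep_json)
    return [to_finding(r) for r in report.get("results", [])]


sample = json.dumps({
    "results": [{
        "check_id": "example.rule-id",
        "path": "app.py",
        "start": {"line": 10, "col": 1},
        "end": {"line": 12, "col": 5},
        "extra": {"message": "Possible command injection", "severity": "ERROR"},
    }]
})
converted = convert(sample)
```

Comparing the output of a mapping like this against the fields produced by an existing analyzer is one way to check severity, location, and description parity before switching analyzers.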
## Language priorities
These languages are high priority because converting them resolves customer issues with existing analyzers:
- Scala (https://gitlab.com/gitlab-org/gitlab/-/issues/362958)
- Node.js (https://gitlab.com/gitlab-org/gitlab/-/issues/395487), which also has the benefit of aligning Node.js scanning with general JavaScript scanning
These languages have specific recorded customer interest because of capabilities in the Semgrep-based analyzer, such as rule customization:
- https://gitlab.com/gitlab-org/gitlab/-/issues/364060+
These languages are targeted for conversion to streamline ongoing maintenance effort:
- https://gitlab.com/gitlab-org/gitlab/-/issues/329712+
- Kotlin, for simplicity and because of issues with SpotBugs (e.g. https://gitlab.com/gitlab-org/gitlab/-/issues/350801). Note that Kotlin is a [Beta-maturity language](https://semgrep.dev/docs/supported-languages/) in Semgrep as of 2023-04-04.
We've completed a number of previous conversions:
- https://gitlab.com/groups/gitlab-org/-/epics/5440+
- https://gitlab.com/groups/gitlab-org/-/epics/5688+
- https://gitlab.com/gitlab-org/gitlab/-/issues/352666+
- Go
- C
- C# (https://gitlab.com/gitlab-org/gitlab/-/issues/347258)