Semgrep-based analysis in GitLab SAST
## Context

GitLab SAST historically has been powered by [over a dozen open-source static analysis security analyzers](https://docs.gitlab.com/ee/user/application_security/sast/#supported-languages-and-frameworks). These analyzers have proactively identified millions of vulnerabilities for GitLab users, but each of these analyzers is language-specific and uses a different scanning approach.

We are currently streamlining the set of [SAST analyzers](https://docs.gitlab.com/ee/user/application_security/sast/#supported-languages-and-frameworks) to provide:

- A simpler operational experience, for example, by not requiring compilation or complicated build configuration steps.
- Faster performance.
- Better rule customization, since rules can be defined in configuration files instead of code.
- A more consistent user experience across languages.

## Semgrep-based scanning

Adding Semgrep-based scanning is a key part of this effort, though we are also working on [other efforts in this area](https://about.gitlab.com/direction/secure/static-analysis/sast/#next-generation-scanning).

The GitLab Static Analysis and Vulnerability Research teams have worked together to transition coverage from a number of existing open-source analyzers to Semgrep-based scanning. We plan to continue to migrate existing scanner coverage to Semgrep-based scanning, as described in this epic.

Semgrep-based scanning in GitLab SAST includes:

- The [Semgrep](https://semgrep.dev/) scanning engine, maintained by [r2c](https://r2c.dev). GitLab and r2c have partnered on areas of mutual interest.
- Detection rules that are created, maintained, and supported by GitLab.
- GitLab Ultimate features like [Advanced Vulnerability Tracking](https://docs.gitlab.com/ee/user/application_security/sast/#advanced-vulnerability-tracking).
- Integration with GitLab [Vulnerability Management](https://docs.gitlab.com/ee/user/application_security/vulnerabilities/index.html).
## Functional requirements

* Semgrep analyzer(s) enabled by the existing vendored SAST template.
* Semgrep analyzer(s) run at the same license tier as other SAST analyzers.
* Match existing support for custom rulesets.
* Ability to run new Semgrep analyzer(s) alongside existing SAST analyzers.
* Ability to deduplicate findings when multiple SAST analyzers report the same vulnerability.
    * Example: if the `bandit` and `semgrep` analyzers report the same CWE finding, only show one finding in the MR widget and create one vulnerability if merged.

#### Comparison Criteria

1. rule type coverage (total number of rules to be checked, comparison of classification capabilities)
1. field mappings - severity, location, field descriptions
1. GitLab feature support - Ultimate licensing, custom rulesets, directory/path exclusions, build/compilation requirements, `SEARCH_MAX_DEPTH`
1. benchmarking - scan wall time, memory usage, CPU usage
1. logging
1. unifying analyzers or keeping them separate (one analyzer with both Python and JavaScript rules?)
1. offline requirements
1. OSS licensing

## Language priorities

These languages are high priority to resolve customer issues with existing analyzers:

- Scala (https://gitlab.com/gitlab-org/gitlab/-/issues/362958)
- NodeJS (https://gitlab.com/gitlab-org/gitlab/-/issues/395487), which also has the benefit of aligning NodeJS with general JavaScript scanning

These languages have specific recorded customer interest due to capabilities in the Semgrep-based analyzer (like rule customization):

- https://gitlab.com/gitlab-org/gitlab/-/issues/364060+

These languages are targeted for conversion to streamline ongoing maintenance effort:

- https://gitlab.com/gitlab-org/gitlab/-/issues/329712+
- Kotlin, for simplicity and because of issues with SpotBugs (e.g. https://gitlab.com/gitlab-org/gitlab/-/issues/350801). Note that Kotlin is a [Beta-maturity language](https://semgrep.dev/docs/supported-languages/) in Semgrep as of 2023-04-04.

We've completed a number of previous conversions:

- https://gitlab.com/groups/gitlab-org/-/epics/5440+
- https://gitlab.com/groups/gitlab-org/-/epics/5688+
- https://gitlab.com/gitlab-org/gitlab/-/issues/352666+
- Go
- C
- C# (https://gitlab.com/gitlab-org/gitlab/-/issues/347258)
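The deduplication requirement listed under Functional requirements can be illustrated with a minimal sketch. It is not GitLab's implementation. The `Finding` type, the `(cwe, path, line)` matching key, and the `preferred` parameter are all hypothetical names chosen for illustration, and a real comparison would likely need a more robust notion of "same vulnerability" than an exact line match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical, simplified finding record (not the GitLab report schema).
    analyzer: str
    cwe: str
    path: str
    line: int


def deduplicate(findings, preferred=("semgrep",)):
    """Keep one finding per (cwe, path, line).

    When two analyzers report the same key, a finding from an analyzer
    listed in `preferred` replaces one from a non-preferred analyzer.
    """
    kept = {}
    for finding in findings:
        key = (finding.cwe, finding.path, finding.line)
        current = kept.get(key)
        if current is None:
            kept[key] = finding
        elif finding.analyzer in preferred and current.analyzer not in preferred:
            kept[key] = finding
    return list(kept.values())


findings = [
    Finding("bandit", "CWE-89", "app/db.py", 42),
    Finding("semgrep", "CWE-89", "app/db.py", 42),  # same issue as above
    Finding("bandit", "CWE-78", "app/run.py", 7),
]
unique = deduplicate(findings)
for finding in unique:
    print(finding.analyzer, finding.cwe, finding.path, finding.line)
```

In this sketch the two `CWE-89` reports collapse into a single entry attributed to `semgrep`, while the unrelated `CWE-78` finding from `bandit` is kept, which mirrors the "only show one finding" behavior described in the requirement.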