Closed Semgrep-based analysis in GitLab SAST
  • Closed Epic created by Thomas Woodham

    Context

    GitLab SAST has historically been powered by over a dozen open-source static analysis security analyzers. These analyzers have proactively identified millions of vulnerabilities for GitLab users, but each one is language-specific and uses a different scanning approach.

    We are currently streamlining the set of SAST analyzers to provide:

    • A simpler operational experience, for example, by not requiring compilation or complicated build configuration steps.
    • Faster performance.
    • Better rule customization, since rules can be defined in configuration files instead of code.
    • A more consistent user experience across languages.

    Semgrep-based scanning

    Adding Semgrep-based scanning is a key part of this effort, though we are also pursuing other work in this area.

    The GitLab Static Analysis and Vulnerability Research teams have worked together to transition coverage from a number of existing open-source analyzers to Semgrep-based scanning. We plan to continue migrating existing scanner coverage to Semgrep-based scanning, as described in this epic.

    Semgrep-based scanning in GitLab SAST includes:

    Functional requirements

    • Semgrep analyzer(s) enabled by the existing vendored SAST template.
    • Semgrep analyzer(s) run at the same license tier as other SAST analyzers.
    • Match existing support for custom rulesets.
    • Ability to run the new Semgrep analyzer(s) alongside existing SAST analyzers.
    • Ability to deduplicate findings when multiple SAST analyzers report the same vulnerability.
      • Example: if the Bandit and Semgrep analyzers report the same CWE finding, show only one finding in the MR widget and create only one vulnerability if merged.
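
    The deduplication requirement above can be sketched as follows. This is a minimal illustration, not GitLab's actual implementation: the `Finding` type, the choice of `(cwe, file, line)` as the identity key, and the `preferred` analyzer list are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical, simplified finding record (not the GitLab report schema)."""
    analyzer: str
    cwe: str
    file: str
    line: int
    message: str


def deduplicate(findings, preferred=("semgrep",)):
    """Keep one finding per (cwe, file, line) key.

    Assumption: two findings are "the same vulnerability" when they share
    a CWE and a source location. When both a preferred and a non-preferred
    analyzer report the same key, the preferred analyzer's finding is kept.
    """
    kept = {}
    for finding in findings:
        key = (finding.cwe, finding.file, finding.line)
        current = kept.get(key)
        if current is None:
            # First report for this key.
            kept[key] = finding
        elif finding.analyzer in preferred and current.analyzer not in preferred:
            # Replace a non-preferred report with the preferred one.
            kept[key] = finding
    # Dicts preserve insertion order, so output order follows first appearance.
    return list(kept.values())


findings = [
    Finding("bandit", "CWE-78", "app.py", 10, "possible shell injection"),
    Finding("semgrep", "CWE-78", "app.py", 10, "subprocess call with shell=True"),
    Finding("bandit", "CWE-89", "db.py", 3, "possible SQL injection"),
]
unique = deduplicate(findings)
```

    Here the two `CWE-78` reports at `app.py:10` collapse into a single entry, so only one finding would be shown and only one vulnerability created. A real implementation would likely need a more tolerant location match, since analyzers may report slightly different lines for the same issue.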

    Comparison Criteria

    1. Rule type coverage (total number of rules to be checked, comparison of classification capabilities)
    2. Field mappings: severity, location, field descriptions
    3. GitLab feature support: Ultimate licensing, custom rulesets, directory/path exclusions, build/compilation requirements, SEARCH_MAX_DEPTH
    4. Benchmarking: scan wall time, memory usage, CPU usage
    5. Logging
    6. Unifying analyzers or keeping them separate (for example, one analyzer with both Python and JavaScript rules?)
    7. Offline requirements
    8. OSS licensing
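
    For the benchmarking criterion, one way to collect wall time, CPU time, and peak memory for an analyzer run is sketched below. It is an illustrative harness only, assuming a Unix-like host (the standard-library `resource` module is not available on Windows); the `benchmark` helper and the command being measured are hypothetical and stand in for whatever analyzer invocation is being compared.

```python
import resource
import subprocess
import sys
import time


def benchmark(cmd):
    """Run `cmd` as a child process and report basic resource usage.

    CPU time is the delta of user + system time accumulated by waited-for
    children. `max_rss` is the peak resident set size among all children
    waited for so far in this process (kilobytes on Linux, bytes on macOS),
    so run one command per harness process for a clean memory figure.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "exit_code": proc.returncode,
        "wall_seconds": wall,
        "cpu_seconds": cpu,
        "max_rss": after.ru_maxrss,
    }


# Placeholder workload; substitute the analyzer command to be measured.
stats = benchmark([sys.executable, "-c", "sum(range(100000))"])
```

    Running the same harness against each analyzer on the same repository gives comparable figures for the three metrics listed above. Repeating each run several times and comparing medians helps reduce noise from caching and host load.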

    Language priorities

    These languages are high priority to resolve customer issues with existing analyzers:

    These languages have specific recorded customer interest due to capabilities in the Semgrep-based analyzer (like rule customization):

    These languages are targeted for conversion to streamline ongoing maintenance effort:

    We've completed a number of previous conversions:

    Edited by Connor Gilbert
