Customizable detection logic for Advanced SAST
## Background

### What you can and can't customize today

Advanced SAST, like all other SAST analyzers, currently allows you to disable rules or edit their metadata (name, severity, or description). (See the [documented Advanced SAST example](https://docs.gitlab.com/ee/user/application_security/sast/customize_rulesets.html#disable-predefined-advanced-sast-rules).)

However, you can't customize the actual detection logic today. For example, while you can disable the rule that Advanced SAST uses to report SQL injection, you can't change how Advanced SAST finds that SQL injection in the first place.

### Goal

The goal of this capability is to allow users to customize detection logic to meet their needs.

### Comparison with "passthrough configuration" feature

<details><summary>Click to expand</summary>

Today, for the Semgrep-based analyzer only, we offer the ability to inject fully-custom rules. This is called a ["passthrough" configuration](https://docs.gitlab.com/ee/user/application_security/sast/customize_rulesets.html#build-a-custom-configuration) because it _passes through_ to the underlying scanner.

When introducing Advanced SAST, we intentionally did not allow users to provide custom rules in this way, for two main reasons:

1. **User goals versus complexity.** There is a serious mismatch between:
   - _The user goal:_
     - Many who ask for customization are not truly interested in taking on that level of complexity. Rather, they are trying to solve an underlying problem, such as false negatives (perceived or actual), and see customization as a possible solution.
     - Only a smaller set of customers are trying to detect truly organization-specific security invariants.
   - _The way the feature is used:_ Many existing users of passthrough configurations have used them to import large sets of community rules into Semgrep CE as a full replacement for the GitLab-managed ruleset.
   - _The work required to use the feature:_ It is actually rather difficult to write high-quality detection rules (particularly ones that detect the desired behavior robustly with an acceptable false-positive rate).
2. **Usability.** User-provided rules that mimicked full Semgrep rules would be difficult for customers to maintain, and difficult for us to support.
   - User-provided rules would not benefit from the templating and reuse that we rely on to achieve consistent results across different rules without rewriting every combination of source and sink. For example, our rules use "taint templates" (common lists of sources or sinks) that are shared across multiple rules. Customer-provided rules would not be able to use these templates, so customers would have to build the entire list of sources or sinks themselves.
   - When we make changes to the engine, we can reliably test our rules. We can't necessarily predict what effects engine changes will have on rules we've never seen.

</details>

## Approach

The approach here is to reshape the configuration "interface" to capture user intent more specifically. Instead of asking users to learn how to develop rules the same way we do, we would expose specific types of customizations.

By changing the interface from something akin to "you can learn to write your own SAST detection logic" to "tell us these specific bits about your org, and we'll plug them in where needed," we:

* simplify the UX.
* give ourselves greater opportunities for optimization and for more tailored, accurate detection.
* align better with the types of customizations offered in competing products. (See internal issue https://gitlab.com/gitlab-org/gitlab/-/issues/497659+.)

## Proposal

As an alternative to passthrough configurations, accept specific configuration options. Available options could include:

* To influence taint analysis:
  * Custom user input sources (with appropriate types, such as HTTP)
  * Custom sinks (again with appropriate types)
  * Custom sanitizers
* For other use cases:
  * Lists of banned functions
  * Custom non-taint rules

### Configuration experience

So, how does configuration work? Regardless of how we allow users to make their configurations, the basic requirement is that the Advanced SAST engine can accept and use these customizations.

User workflow options could include:

- Using the existing TOML file.
  - Users are already familiar with this configuration mechanism, even though it's not the ideal UX.
  - The TOML file can also be shared across multiple projects by using a [remote/shared configuration file](https://docs.gitlab.com/ee/user/application_security/sast/customize_rulesets.html#specify-a-remote-configuration-file).
- Making these options configurable in the UI.
  - We believe this would provide a better overall user experience, but at the expense of tying two projects' fates and timelines together.

The expectation is that we would decouple the projects:

- Initial release: offer configurability in TOML.
- Future release: adopt UI-based configuration.

### Findings display

* Any finding influenced by a customization should be clearly marked.
  * If a fully-custom rule produced the finding, the annotation would be at the finding level.
  * If the customization was made at the source or sink level, the annotation would be on the relevant entry in the Code Flow view.

<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->

> [!important]
> This page may contain information related to upcoming products, features and functionality.
> It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes.
> Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->

----

## Rescoped: Customizable detection logic for Advanced SAST

The customizable detection logic feature for Advanced SAST has been significantly simplified to focus on allowing users to **add supplemental non-taint rules only**. The original, more ambitious scope, which included custom sources, sinks, sanitizers, and banned functions, has been reduced so that users can augment the default Advanced SAST ruleset with their own structural (non-taint) detection rules. This approach preserves the integrity and performance of GitLab's managed ruleset while giving users the flexibility to detect organization-specific security patterns.

The implementation uses a new `keepdefaultrules` flag that lets users retain the default ruleset while adding their custom rules. Unlike the previous Semgrep passthrough approach, this design intentionally limits complexity by supporting only `file`-based passthroughs.
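To make the rescoped behavior concrete, here is a minimal Python sketch of how a scanner might assemble its ruleset under these constraints. Everything in it is a hypothetical illustration, not the actual Advanced SAST implementation or TOML schema: the `Rule` and `AdvancedSastConfig` types, the `build_ruleset` and `load_file` names, the dict shape of each passthrough entry, and the use of `mode == "taint"` to mark a taint rule (borrowed loosely from Semgrep's taint-mode syntax). The sketch also assumes that, without `keepdefaultrules`, custom rules replace the GitLab-managed ruleset, mirroring how passthrough configurations are used as a full replacement today.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sketch: these names and the config shape are illustrative only,
# not the real Advanced SAST implementation or TOML schema.

# The rescoped design accepts only "file" passthroughs.
SUPPORTED_PASSTHROUGH_TYPES = {"file"}


@dataclass
class Rule:
    rule_id: str
    mode: str = "search"  # assumed marker: "taint" means a taint rule
    custom: bool = False  # True when the rule came from a user passthrough


@dataclass
class AdvancedSastConfig:
    keepdefaultrules: bool = False
    # Each entry is assumed to look like {"type": "file", "value": "<path>"}.
    passthroughs: List[Dict[str, str]] = field(default_factory=list)


def build_ruleset(
    config: AdvancedSastConfig,
    default_rules: List[Rule],
    load_file: Callable[[str], List[Rule]],
) -> List[Rule]:
    """Return the rules to run: optional defaults plus supplemental custom rules."""
    custom_rules: List[Rule] = []
    for entry in config.passthroughs:
        # Reject anything other than file-based passthroughs.
        if entry["type"] not in SUPPORTED_PASSTHROUGH_TYPES:
            raise ValueError(f"unsupported passthrough type: {entry['type']!r}")
        for rule in load_file(entry["value"]):
            # Only structural (non-taint) custom rules are accepted.
            if rule.mode == "taint":
                raise ValueError(f"custom taint rules are not supported: {rule.rule_id}")
            custom_rules.append(Rule(rule.rule_id, rule.mode, custom=True))
    # Assumption: without keepdefaultrules, custom rules replace the defaults.
    base = list(default_rules) if config.keepdefaultrules else []
    return base + custom_rules


# Usage with an in-memory "file" loader standing in for reading rule files.
defaults = [Rule("default-sqli", mode="taint"), Rule("default-weak-hash")]
rule_files = {"org-rules.yml": [Rule("org-banned-api")]}
config = AdvancedSastConfig(
    keepdefaultrules=True,
    passthroughs=[{"type": "file", "value": "org-rules.yml"}],
)
rules = build_ruleset(config, defaults, rule_files.__getitem__)
print([r.rule_id for r in rules])  # ['default-sqli', 'default-weak-hash', 'org-banned-api']
```

Validating passthrough types and rejecting custom taint rules up front keeps GitLab's managed taint rules authoritative. The `custom` flag shows one possible way to drive the finding-level annotation described under "Findings display."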