# Faster Advanced SAST: Diff-based scanning in MRs
## Approach

Scan a target project faster by scanning only the changed parts of the codebase. For this type of faster scan:

- We will focus on the code modified in the MR (essentially the `git diff`).
- We will _not_ rely on sharing state (such as a full-scan baseline) between runs.

This is a first improvement, leading toward a more technically complex system planned for future iterations, such as https://gitlab.com/groups/gitlab-org/-/epics/15545+.

## Product experience

### Feature name

The name "diff-based scanning" is intended to:

- Remind users of a familiar concept (`git diff`).
- _Not_ cue other, less applicable concepts, such as differential or incremental backups.
- Avoid negative terminology like "partial".
  - Note: This does not mean we will hide the semantics of this scan type. We will document them and may also indicate them in the UI. It only means the name itself should not sound negative.
  - The result of a diff-based scan is, indeed, a "partial report": only a subset of the files is scanned, so it won't contain all of the project's findings.

### Merge request semantics

There are some unfortunate semantic differences between pipeline types. Only MR pipelines provide enough information in predefined CI/CD variables for a scan job to reliably determine its source and target branches, and therefore the commits that should be scanned.

Historically, AST scans have run in branch pipelines by default. In an upcoming release, we will add a CI/CD variable that lets users run them in MR pipelines. This problem, and how it manifests, is covered in detail in https://gitlab.com/gitlab-org/gitlab/-/issues/410880+.

Because of this, we may need to limit the feature to MR pipelines. The feature would then be available only to users who have explicitly opted in to running their scans in MR pipelines.

### Diff base

We definitely need to handle the case where an MR targets the repository's default branch, since this is the common case.
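A minimal sketch of how a scan job might apply the MR semantics and diff base described above: it decides whether a diff-based scan is possible and, if so, lists the changed files. It reads GitLab's predefined variables `CI_PIPELINE_SOURCE`, `CI_MERGE_REQUEST_TARGET_BRANCH_NAME`, `CI_DEFAULT_BRANCH`, `CI_MERGE_REQUEST_DIFF_BASE_SHA`, and `CI_COMMIT_SHA`. The function names, the `default_branch_only` switch, and the fall-back-to-full-scan policy are hypothetical and do not describe an existing analyzer.

```python
import os
import subprocess
from typing import Mapping, Optional


def diff_range(env: Mapping[str, str],
               default_branch_only: bool = True) -> Optional[tuple[str, str]]:
    """Return (base_sha, head_sha) for a diff-based scan, or None to fall back
    to a full scan. The policy here is a hypothetical sketch."""
    # Only MR pipelines expose the target branch and diff base reliably.
    if env.get("CI_PIPELINE_SOURCE") != "merge_request_event":
        return None
    # Hypothetical switch: initially support only MRs targeting the default branch.
    if default_branch_only:
        target = env.get("CI_MERGE_REQUEST_TARGET_BRANCH_NAME")
        if not target or target != env.get("CI_DEFAULT_BRANCH"):
            return None
    base = env.get("CI_MERGE_REQUEST_DIFF_BASE_SHA")
    # Assumes a regular MR pipeline, where CI_COMMIT_SHA is the source branch HEAD.
    head = env.get("CI_COMMIT_SHA")
    if not base or not head:
        return None
    return base, head


def changed_files(base: str, head: str) -> list[str]:
    """List files added, copied, modified, or renamed between base and head.
    Deleted files are excluded because there is nothing left to scan.
    The base commit must be present in the clone; shallow clones may need
    a larger GIT_DEPTH."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base}..{head}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


if __name__ == "__main__":
    rng = diff_range(os.environ)
    if rng is None:
        print("diff-based scan not applicable; running a full scan")
    else:
        for path in changed_files(*rng):
            print(path)
```

Keeping the default-branch restriction behind a switch leaves room for the refinement question about MRs that target other branches.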
If needed, we can state that MRs targeting the default branch are the only supported case. However, we shouldn't do that unless we have to; as part of refinement, we should verify how MRs targeting non-default branches are currently handled.

### Reporting (MR widget, pipeline report)

This scan type may produce false negatives in the scanned files, and probably won't have results for files outside the diff. Because of this:

1. We shouldn't make any assertions about an MR having resolved a preexisting vulnerability. This means we should not list "Resolved" findings.
2. We should only highlight new findings. We should compare the diff-based scan's results against the vulnerability report so that findings already in the report are not presented as new.
3. We should consider an in-UI notice that a partial scan was used, linking to documentation that explains when this can cause false negatives.
4. We should consider documenting how users can re-run the pipeline or job with a full scan if they want to verify that an MR fixed a vulnerability.

### Configuration

Secret Detection is the most similar existing scan type. It attempts to determine which commits differ from the MR's target branch, because it needs to scan each of them for secrets.

To accommodate users' unique CI/CD pipeline configurations, Secret Detection [supports a `SECRET_DETECTION_LOG_OPTIONS` variable](https://docs.gitlab.com/user/application_security/secret_detection/pipeline/#available-cicd-variables) that lets users compute their own `git log` options and pass them to the scan.

We could offer a similar variable. This is not a hard requirement, and we should consider whether it is necessary.

## Dependencies on other groups

Currently, if a previously reported security finding is not present in the latest security report to be ingested, the vulnerability report marks the finding as [no longer detected](https://docs.gitlab.com/user/application_security/vulnerabilities/#vulnerability-is-no-longer-detected). Because a diff-based scan's report omits files outside the diff, findings in those files would be marked as no longer detected.
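To make the ingestion question concrete, here is a minimal sketch, not GitLab's ingestion code, contrasting the current "no longer detected" rule with one conservative option for partial reports. It assumes findings are matched by a stable fingerprint string and that a diff-based scan flags its report as partial; `SecurityReport`, its `partial` field, and both functions are hypothetical. The same set comparison also expresses the reporting rules: highlight only findings not already in the vulnerability report, and never infer that a finding was resolved.

```python
from dataclasses import dataclass


@dataclass
class SecurityReport:
    # Hypothetical shape: fingerprints of the findings in one ingested report.
    fingerprints: set[str]
    # Hypothetical flag a diff-based scan would set to mark its report as partial.
    partial: bool = False


def no_longer_detected(existing: set[str], report: SecurityReport) -> set[str]:
    """Fingerprints that ingestion would mark as no longer detected."""
    if report.partial:
        # A diff-based report omits files outside the diff and may contain
        # false negatives, so a missing finding proves nothing.
        return set()
    # Current rule: anything absent from the latest report is no longer detected.
    return existing - report.fingerprints


def new_findings(existing: set[str], report: SecurityReport) -> set[str]:
    """Fingerprints to highlight in the MR: those not already in the vulnerability report."""
    return report.fingerprints - existing
```

For example, with existing findings `{"a", "b", "c"}` and a partial report containing `{"b", "d"}`, `no_longer_detected` returns an empty set and `new_findings` returns `{"d"}`; the same report without the `partial` flag would mark `"a"` and `"c"` as no longer detected. Ignoring absences in partial reports is the most conservative choice; scoping the rule to the files the scan covered is an alternative, but it would still be exposed to false negatives in those files.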
We need to start a conversation with Security Insights and Security Platform about what changes would be required in vulnerability ingestion to support this initiative.

<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->
> [!important]
> This page may contain information related to upcoming products, features and functionality.
> It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes.
> Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->