Feature Proposal: Snippet detection for Open Source Dependencies
[[_TOC_]]
### Problem to solve
In software development, engineers commonly research the underlying problem they are trying to solve before implementing a feature or fixing a bug. Engineers and organizations have started adopting AI development tools to drive efficiency, notably GitLab Duo, GitHub Copilot, ChatGPT, and others. These tools help developers answer questions about their codebase and also provide code snippets that solve common problems faced during software development. Traditional research methods, such as a Google search, are another source of code snippets.
The problem with adding code snippets to a codebase is the lack of insight into the provenance of the solution provided by the AI tool or research source (Google). Developers can very easily copy code and add it to their project, seemingly as first-party code. In some cases, this is perfectly acceptable. However, there may be instances where that code snippet is in fact code from an Open Source Dependency. This opens the organization to two types of risk: license and vulnerability.
For License Risk, the user may unknowingly pull in code that is protected by a non-permissive license, which could expose their organization to legal risk. In this instance, a GitLab customer may be forced to open-source the software that they were building. This is a particularly large threat for customers in the embedded device space (automotive, manufacturing, medical devices).
For Vulnerability Risk, a user may copy code from an open source dependency that has vulnerabilities associated with it. Because these dependencies are not declared in a manifest file, a Dependency Scan will not flag them in scan results. This ultimately exposes the organization to Open Source risk that is not easily mitigated.
### Feature name
To ensure common language while discussing this feature internally and externally we will call this "Snippet Detection."
### Intended users
[Amy (Application Security Engineer)](https://handbook.gitlab.com/handbook/product/personas/#amy-application-security-engineer)
[Delaney (Development Lead)](https://handbook.gitlab.com/handbook/product/personas/#delaney-development-team-lead)
[Sasha (Software Engineer)](https://handbook.gitlab.com/handbook/product/personas/#sasha-software-developer)
### Proposal
#### Configuration
Allow users to turn this on or off for their organization. Allow project level on/off configuration.
TBD: how to administer this: `.gitlab-ci.yml` or other avenue. Product Management is looking for user input as well as technical input.
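As a starting point for discussion, the organization-level and project-level switches could be resolved as sketched below. This is a minimal sketch only: the names `SnippetDetectionSetting` and `resolve_enabled` are hypothetical, and the precedence rule (an explicit project setting wins over the organization setting, and the feature is off when nothing is set) is an assumption that this proposal has not yet decided.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnippetDetectionSetting:
    """Hypothetical on/off switch at one level (organization or project)."""

    # None means "not set at this level; inherit from the level above".
    enabled: Optional[bool] = None


def resolve_enabled(
    org: SnippetDetectionSetting,
    project: SnippetDetectionSetting,
    default: bool = False,
) -> bool:
    """Return the effective setting for a project.

    Assumed precedence: project override, then organization, then default.
    """
    if project.enabled is not None:
        return project.enabled
    if org.enabled is not None:
        return org.enabled
    return default
```

With this rule, an organization can turn Snippet Detection on everywhere while an individual project opts out, or the reverse. If customers expect the organization setting to be enforced instead, the two checks would swap order.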
#### Language support
Goal: detect snippets for all Dependency Scanning supported languages. This is a lofty ambition, as each language has nuances that will likely require subject matter expertise, and language support will need to be added iteratively.
In the initial rollout (beta) we should provide support for a commonly used language or languages that meet the criteria of gitlab~3207279 customers who have requested this functionality. Initial thinking is we target Java and/or Python, but this should be discussed with customers and engineering / research teams.
In GA we will add another subset of languages.
#### Scan results
Show the named dependency in the Dependency List with an indicator to designate that it was a detected snippet and did not originate from our analysis of a manifest file.
Show any vulnerabilities associated with the detected snippet in the Vulnerability Report. We should show the file and line number of the detected snippet (screenshot below). We should also add an indicator to the Vulnerability Report so that a vulnerability can easily be categorized as originating from a snippet.
_Screenshot (image not recovered): file and line number shown for a detected snippet._
gitlab~10690742 will provide the data necessary to surface this information to the user. ~"group::security insights" will propagate this information to users.
Users should be able to filter to show Dependencies in the Dependency List and Vulnerabilities in the Vulnerability Report that are from snippets.
_Note: these results should be categorized as "Dependency Scanning results." We should not add another scanner type. Even though we may develop a distinct scanning mechanism for this, users want this abstracted away. They would expect this as part of their Dependency Scanning results. Broadly speaking this is a feature of the GitLab SCA tool._
#### Reporting
In reports accessible to users (UI export or GraphQL response) users should see:
* Snippet flag (True/False)
* File that contains the snippet
* Line number where the snippet starts (similar to a SAST result)
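To make the three fields above concrete, here is a minimal sketch of what one exported row might carry. The class name `DependencyRow`, the field names, and the JSON layout are all hypothetical; they do not describe the existing Dependency List export or GraphQL schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DependencyRow:
    """Hypothetical export row for one dependency."""

    name: str
    version: str
    snippet: bool                     # Snippet flag (True/False)
    file: Optional[str] = None        # File that contains the snippet
    start_line: Optional[int] = None  # Line where the snippet starts

    def __post_init__(self) -> None:
        # A snippet row is only useful if it says where the snippet is.
        if self.snippet and (self.file is None or self.start_line is None):
            raise ValueError("snippet rows need a file and a start line")


def to_export(rows: List[DependencyRow]) -> str:
    """Serialize rows to JSON, as a stand-in for a UI export."""
    return json.dumps([asdict(row) for row in rows], indent=2)
```

Rows that come from a manifest file would leave `file` and `start_line` empty, which keeps snippet and manifest results in one list, as the note on scanner types suggests.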
#### Policies
Users should be able to define policies that deal with snippets containing either licenses or vulnerabilities. They should be able to block Merge Requests that contain snippets to ensure there is complete control of what is being merged into a codebase.
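One way to reason about the policy requirement is sketched below. It is an illustration under stated assumptions, not a description of how GitLab security policies are defined today: `SnippetFinding`, `SnippetPolicy`, and their fields are hypothetical names, and the three rule types (block every snippet, block vulnerable snippets, block snippets under denied licenses) are one possible reading of the paragraph above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class SnippetFinding:
    """Hypothetical detected snippet in a Merge Request."""

    dependency: str
    file: str
    start_line: int
    license: Optional[str] = None
    vulnerabilities: Tuple[str, ...] = ()


@dataclass(frozen=True)
class SnippetPolicy:
    """Hypothetical policy rules for snippets."""

    block_all_snippets: bool = False
    block_vulnerable: bool = False
    denied_licenses: FrozenSet[str] = frozenset()


def violations(policy: SnippetPolicy, findings: List[SnippetFinding]) -> List[str]:
    """Return one human-readable reason per policy violation."""
    reasons: List[str] = []
    for f in findings:
        where = f"{f.file}:{f.start_line} ({f.dependency})"
        if policy.block_all_snippets:
            reasons.append(f"{where}: snippets are not allowed")
            continue
        if policy.block_vulnerable and f.vulnerabilities:
            reasons.append(f"{where}: snippet has known vulnerabilities")
        if f.license in policy.denied_licenses:
            reasons.append(f"{where}: license {f.license} is denied")
    return reasons


def merge_request_allowed(policy: SnippetPolicy, findings: List[SnippetFinding]) -> bool:
    """A Merge Request passes only when no rule is violated."""
    return not violations(policy, findings)
```

Returning the list of reasons, rather than only a pass/fail result, would let the Merge Request show why it was blocked and which file and line to fix.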
---
### User experience goal
This will largely depend on our technical implementation. Some initial thoughts:
#### Configuration
This feature will be a part of Dependency Scanning. Users will be required to configure their dependency scanning as we have outlined in the handbook and will receive Snippet detection information post-scan.
Users should be able to turn Snippet Detection on for specific projects or their entire organization.
**How:** this is to-be-determined, but there are likely multiple paths:
1. Allow user to turn this on within their `.gitlab-ci.yml`
2. gitlab~36973728 could add a user interface to administer these settings globally and at the project level, which would provide for a better user experience compared to updating the `.gitlab-ci.yml`.
We can assess with customers how they envision enabling / disabling this functionality and decide on an initial and a "final" approach.
#### Scanning
After configuration users will expect their Dependency Scanning to initiate as normal and also include snippet detection.
#### Scan results - Dependency list
Users should see scan results in the Dependency List and Vulnerability Report. The goal will be to provide results as we normally would for a dependency scan, but to add a flag or some other designator to identify that the library was identified via snippet detection, as opposed to being declared in a `pom.xml` or `lockfile`.
#### Reporting
This information should be available in exports from the GitLab user interface as well as GraphQL. As this feature evolves past beta and we receive feedback on the implementation we may need to develop additional reporting.
---
### Development phases
#### Beta
The goal of the beta will be to:
* Configure dependency scanning to detect snippets for a project or projects
* Allow users to detect snippets for a small subset of languages (one or two) via a Dependency Scan. Languages TBD.
* Allow users to see snippets detected in a GraphQL response
* Instrumentation: at a minimum we should be able to determine how many snippets were detected for each customer on a per-project basis.
* Stretch: allow users to see snippets in the UI
#### GA
The goal of the GA will be to:
* Address any bugs / feedback identified in the beta
* Allow users to detect snippets in a larger set of languages. Languages TBD.
* If beta stretch goal is not met, then show the snippets in the UI
* Allow users to define a policy around snippets
* Instrumentation: we should be able to capture all metrics outlined in the Feature Usage Metrics section below
#### Lovable
* Round out our language support to all languages we can technically support
* Allow for easy configuration in the GitLab user interface.
---
### Availability & Testing
We should perform rigorous performance testing to understand how long a snippet detection analysis takes. If snippet detection causes long-running scans, we should document this and also consider how the process runs in conjunction with Dependency Scanning. Dependency Scanning is a relatively fast process to run, so we do not want snippet detection to delay Dependency Scanning results.
### Available Tier
gitlab~3207279
### Feature Usage Metrics
* Snippets Found (Count)
* Snippets Identified with Vulns, grouped by CVSS Severity
* Snippets matched with licenses
* Snippets Replaced (Count): snippet detection identified an open-source dependency, the snippet was removed, and the dependency was then declared in a manifest
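The metrics above could be derived from scan findings roughly as follows. This is a sketch under assumptions: the `Finding` shape, the metric key names, and the rule used for "Snippets Replaced" (the dependency was detected as a snippet in the previous scan, is no longer detected as a snippet, and is now declared in a manifest) are all hypothetical and would need to be validated during instrumentation work.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Finding:
    """Hypothetical detected snippet from one scan."""

    dependency: str
    license: Optional[str] = None
    severities: Tuple[str, ...] = ()  # severity of each linked vulnerability


def usage_metrics(findings: List[Finding]) -> Dict[str, object]:
    """Compute the per-scan counts listed above."""
    by_severity: Counter = Counter()
    for f in findings:
        # Count each snippet once per distinct severity it carries.
        for severity in set(f.severities):
            by_severity[severity] += 1
    return {
        "snippets_found": len(findings),
        "snippets_with_vulns_by_severity": dict(by_severity),
        "snippets_with_license": sum(1 for f in findings if f.license),
    }


def snippets_replaced(
    previous: List[Finding],
    current: List[Finding],
    declared_in_manifest: Set[str],
) -> int:
    """Count dependencies that moved from a snippet to a declared dependency."""
    still_snippets = {f.dependency for f in current}
    previously_snippets = {f.dependency for f in previous}
    return sum(
        1
        for dep in previously_snippets
        if dep not in still_snippets and dep in declared_in_manifest
    )
```

"Snippets Replaced" is the only metric that needs two scans to compare, so it likely depends on being able to look up the previous scan's findings for the same project.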
### What does success look like, and how can we measure that?
Snippet is detected, triaged, and then a remediation is applied.
### What is the type of buyer?
Technical. Someone from the CISO or CTO group that is concerned about developers pulling in code snippets that are actually open-source.
### Is this a cross-stage feature?
Yes.
* gitlab~10690742 will need to develop some of this core functionality
* gitlab~13493138 will need to help with research of how to detect language-specific snippets and provide information on the nuances of the various languages we intend to support.
* gitlab~38096065 will need to add this information to the UI / reporting
* gitlab~36973728 will need to assist with configuration paths
* gitlab~10690753 will need to add snippet rules as a policy definition available to users
### What is the competitive advantage or differentiation for this feature?
This will be a competitive differentiator for us. There is [Open Source tooling](https://www.scanoss.com/) that provides these types of insights. This will help us displace point solution SCA players in deals.
### Links / references
<!--triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION-->
_This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc._
<!--triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION-->