Feature Proposal: Snippet detection for Open Source Dependencies
Problem to solve
In software development, engineers commonly research the problem they are trying to solve before implementing a feature or fixing a bug. Engineers and organizations have also begun adopting AI development tools such as GitLab Duo, GitHub Copilot, and ChatGPT to drive efficiency. These tools help developers answer questions about their codebase and provide code snippets that solve common problems faced during software development. Traditional research methods, such as a Google search, are another source of code snippets.
The problem with adding code snippets to a codebase is the lack of insight into the provenance of the solution provided by the AI tool or research source (for example, Google). Developers can very easily copy code and add it to their project, seemingly as first-party code. In some cases, this is perfectly acceptable. However, there may be instances where that code snippet is in fact code from an Open Source Dependency. This opens the organization to two types of risk: license and vulnerability.
For License Risk, the user may unknowingly pull in code that is protected by a non-permissive license, which could expose their organization to legal risk. In this instance a GitLab customer may be forced to open-source the software that they were building. This is a particularly large threat for customers who play in the embedded device space (auto, manufacturing, med-devices).
For Vulnerability Risk, a user may copy code from an open source dependency that has vulnerabilities associated with it. Because such a dependency is not declared in a pom.xml or lockfile, a Dependency Scan will not flag it in scan results. This ultimately exposes the organization to Open Source risk that is not easily mitigated.
Intended users
Amy (Application Security Engineer)
User experience goal
This will largely depend on our technical implementation. Some initial thoughts:
Configuration
This feature will be a part of Dependency Scanning. Users will be required to configure their dependency scanning as we have outlined in the handbook and will receive Snippet detection information post-scan.
Users should be able to turn Snippet Detection on for specific projects or their entire organization.
How: this is to-be-determined, but there are likely multiple paths:
- Allow users to turn this on within their .gitlab-ci.yml.
- group::security platform management could add a user interface to administer these settings globally and at the project level, which would provide a better user experience than updating the .gitlab-ci.yml.
We can assess with customers how they envision enabling / disabling this functionality and decide on an initial and a "final" approach.
Scanning
After configuration users will expect their Dependency Scanning to initiate as normal and also include snippet detection information in their scan results.
Scan results - Dependency list
The goal is to provide results as we normally would for a dependency scan, but to add a flag or some other designator showing that the library was identified via snippet detection rather than declared in a pom.xml or lockfile. This information would be present in the Dependency List.
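As a rough illustration of the designator described above, a Dependency List entry could carry its detection source alongside the usual name and version. This is a minimal sketch, not a proposed schema: `DetectionSource`, `DependencyEntry`, `snippet_only`, and the library names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class DetectionSource(Enum):
    """How a dependency was identified (hypothetical values)."""
    MANIFEST = "manifest"  # declared in a pom.xml or lockfile
    SNIPPET = "snippet"    # inferred from code copied into the project


@dataclass(frozen=True)
class DependencyEntry:
    """One row of the Dependency List, with the proposed designator."""
    name: str
    version: str
    source: DetectionSource

    @property
    def is_snippet(self) -> bool:
        return self.source is DetectionSource.SNIPPET


def snippet_only(entries):
    """Return only the entries that were identified via snippet detection."""
    return [entry for entry in entries if entry.is_snippet]


entries = [
    DependencyEntry("example-lib-a", "1.2.0", DetectionSource.MANIFEST),
    DependencyEntry("example-lib-b", "0.9.1", DetectionSource.SNIPPET),
]
print([entry.name for entry in snippet_only(entries)])
```

Modeling the source as an enum rather than a boolean leaves room for further detection methods later, while `is_snippet` still gives the simple True/False flag that a filter or export would need.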
Scan results - Vulnerability report
Users should receive insight into the vulnerabilities associated with a snippet that was added to their code. The user will see the corresponding vulnerability in the Vulnerability Report, where a snippet identifier should also be present. This will make triage easier, because the user will not have to navigate through their code trying to find a library that was actually copied into the project, seemingly as first-party code. Depending on the functionality we decide to develop, it would be ideal to provide file and line number information. This could be similar to how SAST provides vulnerability locations to its users (see screenshot below):
Reporting
This information should be available in exports from the GitLab user interface as well as GraphQL. As this feature evolves past beta and we receive feedback on the implementation we may need to develop additional reporting.
Policies
Users should be able to define MR approval policies that include parameters for DS scan results that include snippets. Users should be able to block MRs they deem unacceptable based on snippets, either in terms of vulnerability severity or a blocked license.
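To make the policy idea concrete, the check below sketches how a rule could decide whether a snippet finding should block an MR, based on vulnerability severity or a blocked license. Everything here is an assumption for illustration: `SnippetFinding`, `SnippetPolicy`, `requires_approval`, the severity names, and the license identifiers are hypothetical and do not reflect the existing policy schema.

```python
from dataclasses import dataclass, field

# Severity levels ordered from least to most severe (assumed naming).
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


@dataclass
class SnippetFinding:
    """A dependency identified via snippet detection (hypothetical shape)."""
    dependency: str
    license: str
    severities: list = field(default_factory=list)


@dataclass
class SnippetPolicy:
    """Blocks on a severity threshold or on a denied license."""
    min_blocking_severity: str = "high"
    denied_licenses: frozenset = frozenset()


def requires_approval(finding, policy):
    """Return True if the finding violates either condition of the policy."""
    threshold = SEVERITY_ORDER.index(policy.min_blocking_severity)
    too_severe = any(
        SEVERITY_ORDER.index(severity) >= threshold
        for severity in finding.severities
    )
    denied_license = finding.license in policy.denied_licenses
    return too_severe or denied_license


policy = SnippetPolicy("high", frozenset({"GPL-3.0-only"}))
print(requires_approval(SnippetFinding("example-lib", "MIT", ["critical"]), policy))
print(requires_approval(SnippetFinding("example-lib", "MIT", ["low"]), policy))
```

The two conditions are evaluated independently, so a snippet with no known vulnerabilities can still block an MR if its license is on the denied list.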
Proposal
Configuration
Allow users to turn this on or off for their organization. Allow project level on/off configuration.
TBD: how to administer this: .gitlab-ci.yml or other avenue. Product Management is looking for user input as well as technical input.
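Whichever avenue is chosen, the organization and project settings need a precedence rule. The sketch below assumes, purely for discussion, that an explicit project-level choice overrides the organization default; the function and parameter names are hypothetical.

```python
from typing import Optional


def snippet_detection_enabled(
    org_setting: bool, project_override: Optional[bool] = None
) -> bool:
    """Resolve the effective setting for one project.

    Assumption: a project-level value wins when it is set explicitly,
    and the organization-level value is inherited otherwise.
    """
    if project_override is not None:
        return project_override
    return org_setting


print(snippet_detection_enabled(True))          # project inherits the org default
print(snippet_detection_enabled(True, False))   # project opts out
print(snippet_detection_enabled(False, True))   # project opts in
```

If customers instead expect an organization-level setting to be enforced and not overridable, the precedence would simply be reversed; that choice is part of the open question above.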
Language support
Goal: detect snippets for all Dependency Scanning supported languages. This is a lofty ambition, as each language has nuances that will likely require subject matter expertise, and language support will need to be added iteratively.
In the initial rollout (beta) we should provide support for a commonly used language or languages that meet the criteria of GitLab Ultimate customers who have requested this functionality. Initial thinking is we target Java and/or Python, but this should be discussed with customers and engineering / research teams.
In GA we will add another subset of languages.
Scan results
Show named dependency in the Dependency list with an indicator to designate that this was a detected snippet and did not originate from our analysis of a manifest file.
Show any vulnerabilities associated with the detected snippet in the Vulnerability report. We should show the file / line number of the detected snippet. We should also add an indicator to the VR to easily categorize the Vulnerability as originating from a snippet.
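This proposal does not specify a detection technique. Purely to illustrate how a file and start line could be derived for a detected snippet, the sketch below uses a simple line-window fingerprinting approach: it hashes every run of consecutive non-blank lines in known open-source code and looks for the same hashes in a project file. The window size, the whitespace-stripping normalization, and all names (`fingerprints`, `build_index`, `detect`, `example-oss-lib`) are assumptions, and a real implementation would need language-aware normalization.

```python
import hashlib

WINDOW = 3  # consecutive non-blank lines per fingerprint (assumed value)


def _normalized_lines(text):
    """Yield (line_number, line with all whitespace removed) for non-blank lines."""
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = "".join(line.split())
        if stripped:
            yield number, stripped


def fingerprints(text, window=WINDOW):
    """Yield (start_line, digest) for each window of consecutive non-blank lines."""
    lines = list(_normalized_lines(text))
    for i in range(len(lines) - window + 1):
        chunk = "\n".join(content for _, content in lines[i:i + window])
        yield lines[i][0], hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def build_index(known_sources, window=WINDOW):
    """Map digest -> package name from a dict of {package: source_text}."""
    index = {}
    for package, text in known_sources.items():
        for _, digest in fingerprints(text, window):
            index.setdefault(digest, package)
    return index


def detect(path, text, index, window=WINDOW):
    """Return one match per package, using the first matching start line."""
    matches = {}
    for start_line, digest in fingerprints(text, window):
        package = index.get(digest)
        if package is not None and package not in matches:
            matches[package] = {
                "package": package,
                "file": path,
                "start_line": start_line,
            }
    return list(matches.values())


known = {
    "example-oss-lib": (
        "def clamp(x, lo, hi):\n"
        "    if x < lo:\n"
        "        return lo\n"
        "    if x > hi:\n"
        "        return hi\n"
        "    return x\n"
    )
}
project_file = (
    "import os\n"
    "\n"
    "def clamp(x, lo, hi):\n"
    "  if x < lo:\n"
    "    return lo\n"
    "  if x > hi:\n"
    "    return hi\n"
    "  return x\n"
)
print(detect("src/util.py", project_file, build_index(known)))
```

In this example the copied function is re-indented in the project file, yet it still matches because whitespace is removed before hashing, and the reported start line is the line where the copied code begins. Exact hashing of this kind would miss renamed identifiers or reordered statements, which is one of the language-specific nuances the research work would need to address.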
group::composition analysis will provide the data necessary to surface this information to the user. group::security insights will propagate this information to users.
Users should be able to filter to show Dependencies in the Dependency List and Vulnerabilities in the Vulnerability Report that are from snippets.
Reporting
In reports accessible to users (UI export or GraphQL response) users should see:
- Snippet flag (True/False)
- File that contains the snippet
- Line number where the snippet starts (similar to a SAST result)
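As an illustration of the three fields listed above, a UI export could emit one row per dependency, with the file and line left empty for dependencies that came from a manifest. The column names and the `export_rows` helper are hypothetical, and CSV is only one possible export format.

```python
import csv
import io

# Hypothetical column names for the export.
REPORT_FIELDS = ["dependency", "snippet", "file", "start_line"]


def export_rows(findings):
    """Render findings as CSV text; file and start_line stay empty for non-snippets."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REPORT_FIELDS, lineterminator="\n")
    writer.writeheader()
    for finding in findings:
        is_snippet = finding.get("file") is not None
        writer.writerow({
            "dependency": finding["dependency"],
            "snippet": is_snippet,  # the True/False snippet flag
            "file": finding.get("file"),
            "start_line": finding.get("start_line"),
        })
    return buffer.getvalue()


report = export_rows([
    {"dependency": "example-lib-a"},
    {"dependency": "example-lib-b", "file": "src/util.py", "start_line": 42},
])
print(report)
```

A GraphQL response could expose the same three values as fields on the dependency or vulnerability type; the sketch only shows the data each record needs to carry.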
Availability & Testing
We should perform rigorous performance testing to understand the duration of a snippet detection analysis. If snippet detection causes long-running scan times then we should document this, but also consider how the process runs in conjunction with Dependency Scanning. The thinking here is that Dependency Scanning is a relatively fast process to run, so we do not want to hamper DS scan results.
Available Tier
Feature Usage Metrics
- Snippets Found (Count)
- Snippets Identified with Vulns, grouped by CVSS Severity
- Snippets matched with licenses
- Snippets Replaced (Count): Snippet detection identified a dependency that was open-source, the snippet was removed and then the dependency was declared in a manifest
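The four metrics above could be computed from a list of detected snippets roughly as follows. This is a sketch under stated assumptions: the record shape and key names are hypothetical, and a snippet with several vulnerabilities is counted once per distinct severity rather than once per vulnerability.

```python
from collections import Counter


def usage_metrics(snippets):
    """Aggregate the proposed usage metrics from snippet records."""
    by_severity = Counter()
    for snippet in snippets:
        # set() counts each snippet at most once per severity level.
        by_severity.update(set(snippet.get("severities", [])))
    return {
        "snippets_found": len(snippets),
        "snippets_with_vulns_by_severity": dict(by_severity),
        "snippets_matched_with_licenses": sum(
            1 for snippet in snippets if snippet.get("license")
        ),
        "snippets_replaced": sum(
            1 for snippet in snippets if snippet.get("replaced", False)
        ),
    }


sample = [
    {"severities": ["high", "high", "low"], "license": "MIT", "replaced": True},
    {"severities": [], "license": None, "replaced": False},
    {"severities": ["high"], "license": "Apache-2.0"},
]
metrics = usage_metrics(sample)
print(metrics)
```

Whether "grouped by CVSS Severity" should count snippets or individual vulnerabilities is worth settling before instrumentation, since the two can differ widely for a large copied block.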
What does success look like, and how can we measure that?
Snippet is detected, triaged, and then a remediation is applied.
What is the type of buyer?
Technical. Someone from the CISO or CTO group that is concerned about developers pulling in code snippets that are actually open-source.
Is this a cross-stage feature?
Yes.
- group::composition analysis will need to develop some of this core functionality
- group::vulnerability research will need to help with research of how to detect language-specific snippets and provide information on the nuances of the various languages we intend to support.
- group::security insights will need to add this information to the UI / reporting
- group::security platform management will need to assist with configuration paths
- group::security policies will need to add snippet rules as a policy definition available to users
What is the competitive advantage or differentiation for this feature?
This will be a competitive differentiator for us. There is Open Source tooling that provides these types of insights. This will help us displace point solution SCA players in deals.
Links / references
This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
