Bring security scan results back into the Dependency Scanning CI job
### Problem to solve
Following the release of [the Dependency Scanning using SBOM Beta feature](https://gitlab.com/groups/gitlab-org/-/epics/15960), we have received feedback from customers that pushed us to reconsider the behavior of this feature.
One of the most impactful changes was the lack of security scan results within the context of a running pipeline. The new implementation, which runs the DS scan on the GitLab platform after the pipeline completes, breaks several custom workflows based on the DS security report artifact being present in the CI job output.
As part of https://gitlab.com/gitlab-org/gitlab/-/issues/521769+ the team has considered several options to avoid this disruption for customers. The decision is to re-implement the generation of Dependency Scanning security reports in the new DS CI job, and to upload these artifacts to the rails platform so they continue through the generic security report ingestion process.
### Further details
Workflows based on the presence of security scan results within the CI job were never recognized as officially supported, and thus were not considered as requirements for the new feature implementation. As pointed out in our [SBOM based scan challenges epic](https://gitlab.com/groups/gitlab-org/-/epics/11617), we decided to accept making breaking changes and to incentivize users to adjust their custom workflows to leverage GitLab built-in features. This was before the company decided to further reconsider how breaking changes are detrimental to customers and to [focus on minimizing their impact](https://docs.gitlab.com/development/deprecation_guidelines/#minimize-the-impact-of-breaking-changes).
These factors stacked up with others to eventually outweigh the benefits we get from the current direction.
### Proposal
> [!warning]
> Independently of the new design and implementation path we decide to take, the currently available implementation of the Beta feature must stay available, and we must avoid breaking it until we have confirmed it's no longer used (or usage is below a defined threshold).
In order to bring security scan results into the Dependency Scanning CI job, we must adjust the architecture of Dependency Scanning using SBOM and the involved components.
At a high level, the following tasks must be executed:
1. Design a new service to execute an on-demand Dependency Scanning analysis without tying its results to any state in the vulnerability management system. This must reuse the GitLab SBOM Vulnerability Scanner, so that we maintain a single implementation of our scanner logic.
2. (optional) Refactor the existing DS using SBOM feature implementation in the backend to use that service, yet store the provided results as expected in this workflow.
3. Design an API that can be called from a running CI job, to send a list of dependencies to be scanned for vulnerabilities and retrieve the results.
4. Design the necessary changes to the new Dependency Scanning CI job to leverage that API.
Most of these changes could be completed in many different ways. To get the most out of this change of direction, we should also consider how it could help support other workflows in the long term. For instance, such an on-demand analysis service could be reused to power Dependency Scanning in the IDE or Web IDE.
#### Workflow
Here is a sequence diagram describing the overall workflow we expect to implement. It is deliberately vague for now, as implementation details may impact the actors, their roles, and the data exchanged. Still, it already makes some assumptions:
- we use the rails backend for the API
- we use Sidekiq jobs to execute the on-demand SBOM scan
- the API workflow must be async, and we use a polling approach
- the results are stored temporarily in a location directly accessible to the CI job
```mermaid
sequenceDiagram
participant DS as CI pipeline
participant RAILS as GitLab API
participant BG as Sidekiq Job
participant DB as Storage
DS->>DS: Dependency Detection
DS->>RAILS: Send detected dependencies
RAILS->>BG: Schedule on-demand DS scan
RAILS->>DS: Returns 202 and polling url
DS->>DS: Wait X seconds
DS->>RAILS: Polls scan results
RAILS->>DB: Check scan results
DB->>RAILS: No results available
RAILS->>DS: Returns "in progress"
DS->>DS: Wait X seconds
BG->>BG: On-demand DS scan execution
BG->>DB: Store scan results
DS->>RAILS: Polls scan results
RAILS->>DB: Check scan results
DB->>RAILS: Returns scan results available
RAILS->>DS: Returns 303 with results url
DS->>DB: Fetch scan results
DB->>DS: Returns scan results
DS->>DS: Finish execution
DS->>DB: Uploads SBOM and DS report artifacts as usual
```
### High level questions on the design
Before diving into the specifics of how we want to implement the changes to the impacted components, we should define the holistic vision of this new approach. By asking ourselves these questions and weighing the pros and cons of various options and their side effects on the complete solution, we aim to find the most relevant path forward.
The listed pros and cons are certainly open to debate and will be adjusted as the discussion moves forward.
1. What input format to use for the API?
1. a single SBOM document
- {+ boring solution, straightforward +}
- {+ simplifies results processing by deferring context-dependent concerns (e.g. aggregation/merge/dedupe) +}
1. a collection of SBOM documents
- {+ gives full context to SBOM scanner, if it needs it (it does not so far) +}
- {- scalability (size limit for multiple SBOM documents in a single request, processing time, etc.) -}
1. a collection of components (structured data or PURL)
- {+ abstract the source of the components and simplify API (though currently the API is in the rails backend which already has SBOM parsing logic so :shrug:) +}
- {- increase opportunity to reuse the same API for various scanning contexts -}
- {- requires component extraction logic in the caller (duplicate logic) -}
1. What input format should we use for the generic on-demand DS service?
1. a single or a collection of SBOM documents (if the API provides it)
1. a collection of components (structured data or PURL).
1. What output format to use for the API? (this is also tied to item 8)
1. a collection of findings (e.g. similar to existing APIs)
- {+ straightforward, no additional formatting required +}
- {- ties us with another part of the product that can evolve separately -}
- {- no versioning available -}
1. a single or a collection of DS security reports (json)
- {+ versioning available +}
- {+ allows client to directly use the API output +}
1. an SBOM document (enrich the supplied file)
- {+ don't introduce a new file format and rely on existing Cyclonedx spec for versioning +}
- {+ invest for future needs to output vulnerabilities within SBOM document +}
- {+ allow to only store a single file in the temporary model +}
- {- possibly increases traffic for big SBOM docs (we don't need the full content for this feature) -}
1. Where to implement the API?
1. rails backend
- {+ boring solution, straightforward +}
1. ~~external service~~
- {- runway is not yet ready for all deployment types -}
1. What type of API? TODO: check compatibility, authentication, etc.
1. internal API: https://docs.gitlab.com/development/internal_api/
- {- this API requires a particular authentication method (https://docs.gitlab.com/development/internal_api/#authentication) that sounds incompatible with the CI job context (access to secret file) -}
1. public REST API: https://docs.gitlab.com/api/rest/
- {+ can use a CI job token to authenticate request. +}
- {+ dedicated endpoint with minimal footprint on the codebase +}
1. public GraphQL API: https://docs.gitlab.com/api/graphql/
- {- Direct upload for GraphQL is not available yet (https://gitlab.com/gitlab-org/gitlab/-/issues/280819) -}
1. Where to implement the on-demand service?
1. rails backend (Sidekiq job)
- {+ boring solution, straightforward +}
- {+ similar to the existing implementation and to how Continuous Vulnerability Scanning is done too +}
1. ~~external service~~
- {- runway is not yet ready for all deployment types -}
1. Where to implement the changes to call the on-demand DS service from the CI pipeline?
1. directly in the new DS analyzer (Go code)
- {+ fully integrated for improved flow management +}
- {+ makes the name Dependency Scanning more relevant for the analyzer +}
1. ~~in the DS job with a different tool~~ (the spike favored adding logic to the analyzer: https://gitlab.com/gitlab-org/gitlab/-/issues/525958#project-organization)
- {+ no impact on the existing DS analyzer project +}
- {- the tool must still be present in the DS analyzer image -}
1. ~~in a different CI job (decouple dependency detection and SBOM generation from security analysis)~~ (the spike favored using a single CI job: https://gitlab.com/gitlab-org/gitlab/-/issues/525958#ci-jobs-organization-1)
- {+ no impact on the existing DS analyzer project +}
- {+ improved UX for fine-grained usage of our SBOM based features +}
- {- increases complexity, scope and unknowns -}
- {- new project and new image to manage and secure -}
1. Where to implement the DS security report generation?
1. directly in the new DS analyzer (Go code)
- {+ leverages the existing "report" Go module we share with other AST groups +}
- {+ makes the API response more generic and allows to use a single SBOM document approach +}
- {+ fully integrated in the analyzer for improved flow management +}
- {+ closer to most of the existing analyzer implementations +}
1. ~~in the DS job with a different tool~~ (the spike favored adding logic to the analyzer: https://gitlab.com/gitlab-org/gitlab/-/issues/525958#project-organization)
- {+ no impact on the existing DS analyzer project +}
- {+ makes the API response more generic and allows to use a single SBOM document approach +}
- {- the tool must still be present in the DS analyzer image -}
- {- the tool must be called from the CI job script, impacting users with custom DS jobs -}
1. in the backend service, the API response being a single or a collection of JSON reports
- {+ simplifies the implementation in the CI job +}
- {- requires to re-implement report generation (there are already some POROs that may help) -}
- {- requires an API that can take all SBOMs at once to generate a fully merged DS report for a CI job, or implement merge logic in the analyzer -}
1. ~~in a different CI job (decouple dependency detection and SBOM generation from security analysis)~~ (the spike favored using a single CI job: https://gitlab.com/gitlab-org/gitlab/-/issues/525958#ci-jobs-organization-1)
- {- increases complexity, scope and unknowns -}
### Other random questions
1. What value should we use for the analyzer and scanner names in the DS report? See [Scanner](https://docs.gitlab.com/user/application_security/terminology/#scanner) vs [Analyzer](https://docs.gitlab.com/user/application_security/terminology/#analyzer) meaning in our glossary.
1. Using `GitLab SBOM Vulnerability Scanner` for the Scanner might make sense and stays aligned with what the current DS using SBOM implementation uses. Note that this value is critical for some downstream features like "marking vulnerabilities as no longer detected" (this has caused some problems due to not being able to distinguish vulnerabilities created by CVS from those coming from the pipeline workflow).
1. Using `GitLab Dependency Scanning` for the Analyzer? TODO: verify how the analyzer properties might influence other features and processes.