Disable or remove DS Scan on SBOM report after pipeline completion (Beta behavior)
<!-- This template is a great use for issues that are feature::additions or technical tasks for larger issues.--> ### Problem to solve The original implementation of the DS using SBOM feature - and what shipped for the Beta - uses background processing in the rails application to execute the vulnerability scan of the uploaded SBOM reports, after pipeline completion. However, after receiving user feedback we re-evaluated this approach and decided to re-instate the DS security report generation in the DS CI job: https://gitlab.com/gitlab-org/gitlab/-/issues/521769+ As we deliver the GA feature, we now no longer need the beta behavior, that currently duplicates the work. Considering early adoption of the feature its Beta stage, we must ensure that removing this capability will not break existing workflows. ### Proposal Since we had several discussions and diverging opinions on whether or not to keep this capability, let's review the options and compare them: 1. Remove the beta behavior entirely. The only way to do dependency scanning using SBOM in the CI pipeline is via the DS analyzer and the SBOM Scan API. 2. Disable the beta behavior only when it's unecessary. We keep this capability as a fallback when SBOM Scan API fails or when customers uses custom jobs. #### Why should we remove it? Removing the beta behavior avoids fragmentation of UX and implementation, which provides several advantages. 1. simplified UX for customers and Gitlab team members to understand the feature 1. single implementation path to maintain and debug 1. single implementation for monitoring, usage metrics, caching, etc. 1. opportunity to benefit from all improvements made on the generic process by other Sec groups (e.g. metrics in reports) 1. clean decoupling between vulnerability scanning (AST) and results processing (SRM) Additionally, in its current implementation, DS on SBOM scan is slowing down the processing of all security reports by doing the scan of SBOM reports while parsing all security reports. This poses serious problems of separation of concerns and error handling. As seen recently, that tight coupling is causing a lot of headache in understanding and debuging problems (https://gitlab.com/gitlab-org/gitlab/-/issues/577331). It's also mixing different domain logic into the same piece of code, where multiple teams can work at the same time, increasing risk of conflicts and regressions. I also believe our efforts would be better placed in supporting other scanning contexts than maintaining different implementations for the same context. #### Why should we keep it? ##### 1 - It offers a safety net when the SBOM scan API fails This is a valid argument for Limited Availability and offering some stability to customers until we gain confidence on the new implementation. However, this is not a sustainable approach to development. We have a lot of features that rely on API and background processing without a fallback. Ultimately, the feature design should provide enough reliability to sustain nominal usages, not to cover for outage situations. ##### 2 - It can be reused for 3rd party SBOM support 3rd party SBOM support could be provided via the DS analyzer too. Similarly to situations where a preceding build job provides a scannable artifact (lockfile or graph file), the DS analyzer could take a custom SBOM as an input document. At least for use cases that fit in the CI pipeline workflow. If we want to start supporting SBOM scan in different scanning contexts, the beta solution would not be suitable anyway, as it is tightly coupled with report ingestion logic from CI pipeline workflow. ##### 3 - Some customers might already be using it Customers who onboarded early on the DS analyzer and uses a custom CI job implementation that doesn't declare the DS security report artifact will indeed be impacted. In such situation, the SBOM scan would happen in the CI job but scan results would not be uploaded as report artifact and no vulnerability scan would happen on SBOM ingestion. Mitigations: 1. Identify such usage (can we?), track it in metrics and warn user 1. Offer an opt-in toggle to re-enable it if a customer depends on it (even temporarily until they adjust their CI configuration) 1. Ultimately, and while unfortunate, [Beta feature](https://docs.gitlab.com/policy/development_stages_support/#beta) can face breaking changes: "breaking changes can occur outside of major releases or with less notice than for generally available features." ##### 4 - It will be usefull to ingest vulnerabilities directly from CycloneDX document CycloneDX is a comprehensive spec, with everything from licenses to vulnerabilities included and we could support ingesting vulnerabilities directly from this document with: https://gitlab.com/gitlab-org/gitlab/-/issues/435436 Ingesting vulnerabilities directly provided in the CycloneDX document is a slightly different workflow than the current feature though. It's rather a direct content parsing (what's provided in `vulnerabilities` properties of the SBOM) than a dynamic DS scan of the components found in the SBOM. This could be implemented differently than by faking a DS report. It will also possibly bring components from other purl types and maybe lead to the creation of CS vulnerabilities too. However, it is true that supporting this use case will require some of the current logic to still be present, like what we have to fetch "Security CI job" which include cycloneDX reports. Indeed, it is only at parsing time that we can verify if they contain vulnerabilities or not. This also creates additional complexity and edge cases to handle around deduplication/merge of findings. This still leaves a lot of room for us to proceed with the clean up of the current logic around "let's consider CycloneDX component list as a DS scan report" without closing the door to https://gitlab.com/gitlab-org/gitlab/-/issues/435436. ### Implementation plan While the complete removal still makes sense, the adopted strategy for the short term is to simply disable the Beta behavior when a DS security report is present in a given CI job. This is mainly motiviated by the following factors: 1. the riks of negative side effects for some customers who might have already adopted this behavior without relying on the DS analyzer or with such a customization that it won't automatically generate the DS report when we enable the FF by default. Numbers collected on gitlab.com (see [this private note](https://gitlab.com/gitlab-org/gitlab/-/work_items/546429#note_2879845590)) are still unclear and concerning. 2. the timeline to deliver GA doesn't allow for a fully fledged removal process of a beta feature (communication is still advised even if a breaking change is allowed). ### Workflow overview These diagrams show the 4 major steps of the workflow and how their internal logic is modified by this change. #### Before ```mermaid %%{init: {'theme': 'base', 'themeVariables': { 'primaryColor':'#ffffff', 'primaryTextColor':'#000000', 'primaryBorderColor':'#333333', 'lineColor':'#333333', 'secondBkgColor':'#f0f0f0', 'tertiaryColor':'#ffffff', 'tertiaryTextColor':'#000000', 'tertiaryBorderColor':'#333333'}}}%% graph TD subgraph ReportParsing["1️⃣ Report Parsing"] direction LR A["Fetch all DS and SBOM report artifacts in the pipeline hierarchy"] B{CycloneDX ?} D["parsed findings from raw DS report"] E["store findings in a Security::Scan (one per CI job)"] C1["parse SBOM report"] C2["DS scan on SBOM components"] A --> B B --> |No|D B --> |Yes|C1 C1 --> C2 C2 --> E D --> E end subgraph VulnIngestion["2️⃣ Vulnerability Ingestion"] direction LR F["fetch findings from Security::Scan"] G{"scanner == GitLab SBOM<br/>Vulnerability Scanner?"} H["❌ Skip ingestion"] I["create Vulnerabilities<br/>on default branch"] F --> G G -->|Yes| H G -->|No| I end subgraph SbomIngestion["3️⃣ SBOM Ingestion"] direction LR J["Parse and store<br/>SBOM components"] K["Emit SbomIngestedEvent"] J --> K end subgraph DsScanAfter["4️⃣ SBOM Scan After Ingestion"] direction LR L["parse SBOM report"] M["DS scan on SBOM components"] N["create Vulnerabilities<br/>on default branch"] L --> M --> N end ReportParsing --> VulnIngestion VulnIngestion --> SbomIngestion SbomIngestion --> DsScanAfter style ReportParsing fill:#e7f5ff,stroke:#333333,color:#000000 style VulnIngestion fill:#e7f5ff,stroke:#333333,color:#000000 style SbomIngestion fill:#e7f5ff,stroke:#333333,color:#000000 style DsScanAfter fill:#e7f5ff,stroke:#333333,color:#000000 ``` #### After ```mermaid %%{init: {'theme': 'base', 'themeVariables': { 'primaryColor':'#ffffff', 'primaryTextColor':'#000000', 'primaryBorderColor':'#333333', 'lineColor':'#333333', 'secondBkgColor':'#f0f0f0', 'tertiaryColor':'#ffffff', 'tertiaryTextColor':'#000000', 'tertiaryBorderColor':'#333333'}}}%% graph TD subgraph ReportParsing["1️⃣ Report Parsing"] direction LR A["Fetch all DS and SBOM report artifacts in the pipeline hierarchy"] B{CycloneDX ?} D["parsed findings from raw DS report"] E["store findings in a Security::Scan (one per CI job)"] C1["parse SBOM report(s)"] C2["DS scan on SBOM components"] FF1{DS report<br/>exists for thic CI job?} SKIP1["❌ Skip SBOM report(s)"] A --> B B --> |No|D B --> |Yes|FF1 FF1 --> |Yes|SKIP1 FF1 --> |No|C1 C1 --> C2 C2 --> E D --> E end subgraph VulnIngestion["2️⃣ Vulnerability Ingestion"] direction LR F["fetch findings from Security::Scan"] G["create Vulnerabilities<br/>on default branch (including all DS findings)"] F --> G end subgraph SbomIngestion["3️⃣ SBOM Ingestion"] direction LR J["Parse and store<br/>SBOM components"] K["Emit SbomIngestedEvent"] J --> K end subgraph DsScanAfter["4️⃣ SBOM Scan After Ingestion"] direction LR L["parse SBOM report"] M["DS scan on SBOM components"] N["create Vulnerabilities<br/>on default branch"] FF3{SBOM type == dependency_sanning} SKIP2["❌ Skip SBOM scan"] L --> FF3 FF3 --> |Yes|SKIP2 FF3 --> |No|M M --> N end ReportParsing --> VulnIngestion VulnIngestion --> SbomIngestion SbomIngestion --> DsScanAfter style SKIP1 fill:#ffd43b,stroke:#333333,color:#000000 style SKIP2 fill:#ffd43b,stroke:#333333,color:#000000 style G fill:#51cf66,stroke:#333333,color:#000000 style N fill:#51cf66,stroke:#333333,color:#000000 style ReportParsing fill:#e7f5ff,stroke:#333333,color:#000000 style VulnIngestion fill:#e7f5ff,stroke:#333333,color:#000000 style SbomIngestion fill:#f0f9ff,stroke:#333333,color:#000000 style DsScanAfter fill:#f0f9ff,stroke:#333333,color:#000000 ``` #### Tasks - [ ] Modify the ingestion logic to detect when a CI job has a DS report - [ ] Skip DS scan on SBOM report when a DS report is present - [ ] Fix DS finding ingestion when generated from an SBOM report - [ ] Always include DS findings from GitLab SBOM Vulnerability Scanner - [ ] Skip DS scan after SBOM ingestion on the default branch (already ingested with generic process) #### Feature Flag Strategy The change is gated behind the feature flag `disable_ds_on_sbom_report` and support all use cases: * single or multiple jobs in whole pipeline hierarchy with DS report only * single or multiple jobs in whole pipeline hierarchy with SBOM report(s) only * single or multiple jobs in whole pipeline hierarchy with both DS and SBOM report(s) * a mix of some jobs with DS job only and some jobs with SBOM report only and/or both report types. This approach provides: * **Gradual rollout**: Control when the new behavior is enabled per project/group on gitlab.com for validation * **Backward compatibility**: Projects using the new DS analyzer without a successful DS report generation will fallback to the Beta behavior using the dynamic DS scan on SBOM.
issue