Rollout of dependency_scanning_sbom_scan_result_cache feature flag
Summary
This issue tracks the rollout of SBOM scan result caching on production,
which is currently behind the `dependency_scanning_sbom_scan_result_cache` feature flag.
The feature enables the Dependency Scanning analyzer to check if an SBOM has been previously processed before uploading and triggering a new resource-intensive scan, significantly reducing computational resources and scan times.
Owners
- Most appropriate Slack channel to reach out to: #g_secure_composition-analysis
- Best individual to reach out to: @ifrenkel
Expectations
What are we expecting to happen?
Functional Behavior:
- Analyzer generates a digest of SBOM components before upload
- Analyzer calls the caching endpoint `POST /api/v4/jobs/:job_id/sbom_scans/:sbom_digest`
- Instance returns `201` with `sbom_scan_id` when a cached result exists and advisories are fresh
- Instance returns `404` when no cache exists or new advisories have been published
- Analyzer skips upload and proceeds to result fetching when receiving `201`
- Multiple scans can safely reference the same `SbomScanResult`
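The analyzer-side decision described above can be sketched as follows. This is an illustrative Python stand-in for the Go analyzer, not the real implementation; the function name and the action labels are hypothetical, while the status codes and the `sbom_scan_id` field follow the contract in this issue.

```python
import json

def handle_cache_response(status_code: int, body: str):
    """Map the caching endpoint's response onto the analyzer's next action.

    201 -> a cached result exists and advisories are fresh: skip the
           upload and fetch results via the returned sbom_scan_id.
    404 -> no cache exists (or new advisories were published): upload
           the SBOM and trigger a full scan, as before.
    """
    if status_code == 201:
        return ("fetch_cached_result", json.loads(body)["sbom_scan_id"])
    # 404, and any unexpected status, fall back to the legacy path so a
    # cache problem can never block a scan.
    return ("upload_and_scan", None)
```

Failing open on unexpected statuses matches the rollback note below: disabling the cache must always degrade to the original upload-and-scan behavior.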
Performance Improvements:
- Cache hit rate > 50% for projects with repeated pipelines
- Reduction in SBOM scan processing queue size
- Reduction in average scan completion time
- Fewer redundant uploads and scans for identical SBOMs
Data Integrity:
- Same number of `SbomScan` records created as before (no table growth)
- Orphaned `SbomScanResult` records properly cleaned up
- Advisory freshness validated via `PackageMetadata::Checkpoint`
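The freshness rule amounts to a single comparison: a cached result is servable only if no advisory data has been ingested since the scan ran. A minimal sketch, assuming ISO-8601 UTC timestamps (which sort lexicographically); the function name is illustrative and this is not the real `PackageMetadata::Checkpoint` code.

```python
def cache_is_fresh(result_scanned_at: str, last_advisory_sync: str) -> bool:
    """Serve a cached result only if the most recent advisory sync
    happened at or before the time the cached scan ran; otherwise the
    instance answers 404 and a fresh scan picks up the new advisories."""
    return last_advisory_sync <= result_scanned_at
```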
What can go wrong and how would we detect it?
| Risk | Detection Method | Mitigation |
|---|---|---|
| Ineffective caching | Monitor cache hit/miss rates in SBOM Scan API Dashboard | Verify digest consistency across purl types |
| DB growth | Monitor `sbom_vulnerability_scans` and `sbom_vulnerability_scan_results` table sizes | Ensure same number of records as old code path |
| Stale results served | Monitor advisory update events vs cache misses; UAT testing | Explicit freshness check via `PackageMetadata::Checkpoint` |
| SBOM digest inconsistency | Monitor cache miss rates by purl type | Digest is versioned (`sha256v1`); worst case is a cache miss |
| Wrong scan result served | Monitor errors fetching scan results on analyzer side | Dual read support during rollout |
| Concurrent requests | Some duplicate scans expected | Acceptable edge case; still reduces overall processing |
| Increased error rates | Monitor API endpoint errors, timeouts, background job failures | Gradual rollout with monitoring between steps |
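The "versioned digest" mitigation can be illustrated with a sketch like the one below. This is an assumption about the shape of the scheme, not the actual analyzer code: the exact canonicalization of components is not specified in this issue, so sorting a deduplicated list of purls is used here purely for illustration.

```python
import hashlib

def sbom_digest(purls: list[str]) -> str:
    """Versioned, order-independent digest over SBOM component purls.

    Sorting a deduplicated list first makes the digest stable across
    component ordering, so logically identical SBOMs hash the same.
    The explicit 'sha256v1:' prefix means a future change to the scheme
    can only produce a cache miss, never a wrong cache hit.
    """
    payload = "\n".join(sorted(set(purls))).encode("utf-8")
    return "sha256v1:" + hashlib.sha256(payload).hexdigest()
```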
Key Metrics to Monitor:
- Cache hit/miss rate per project and by purl type
- API endpoint response times (`POST /api/v4/jobs/:job_id/sbom_scans/:sbom_digest`)
- Table growth rates for `sbom_vulnerability_scans` and `sbom_vulnerability_scan_results`
- Error rates on the new endpoint and result fetching
- Background job failures and timeout rates
- Orphaned scan result cleanup events
Dashboard: SBOM Scan API Usage Dashboard (internal)
Rollout Context
This feature follows a phased rollout plan due to the migration from `SbomScan.result_file` to the `SbomScan.result` association:

- Stage 1 (18.8): Dual read support deployed (completed)
- Stage 2a (18.10): Selective enablement (this issue - current stage)
- Stage 2b (18.10): Global enablement
- Stage 3 (18.11): Clean up legacy code path (tracked in #582203)

Because `SbomScan` records expire after 2 days, we support both storage locations during rollout to avoid a data migration.
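The dual-read window can be summarized as "prefer the new location, fall back to the legacy one". A minimal Python sketch; `SbomScan` here is a stand-in for the Rails model, with only the two result locations modeled.

```python
class SbomScan:
    """Stand-in for the Rails model during the dual-read window."""
    def __init__(self, result=None, result_file=None):
        self.result = result            # new: shared SbomScanResult association
        self.result_file = result_file  # legacy: per-scan uploaded result file

def read_scan_result(scan: SbomScan):
    """Prefer the new association; fall back to the legacy file so scans
    created before the flag flipped stay readable until they expire."""
    if scan.result is not None:
        return scan.result
    return scan.result_file
```

Because records expire after 2 days, the fallback branch becomes dead code two days after global enablement, which is what makes the Stage 3 cleanup safe.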
Rollout Steps
Note: Please make sure to run the chatops commands in the Slack channel that gets impacted by the command.
Rollout on non-production environments
- Verify the MR with the feature flag is merged to `master` and has been deployed to non-production environments with `/chatops run auto_deploy status <merge-commit-of-your-feature>`
- Deploy the feature flag at a percentage (recommended percentage: 50%) with `/chatops run feature set dependency_scanning_sbom_scan_result_cache 50 --actors --dev --pre --staging --staging-ref`
- Monitor that the error rates did not increase (repeat with a different percentage as necessary).
- Enable the feature globally on non-production environments with `/chatops run feature set dependency_scanning_sbom_scan_result_cache true --dev --pre --staging --staging-ref`
- Verify that the feature works as expected. The best environment to validate the feature in is staging-canary, as this is the first environment deployed to. Make sure you are configured to use canary.
- If the feature flag causes end-to-end tests to fail, disable the feature flag on staging to avoid blocking deployments.
  - See the #e2e-run-staging Slack channel and look for the following messages:
    - test kicked off: `Feature flag dependency_scanning_sbom_scan_result_cache has been set to true on **gstg**`
    - test result: `This pipeline was triggered due to toggling of dependency_scanning_sbom_scan_result_cache feature flag`
If you encounter end-to-end test failures and are unable to diagnose them, you may reach out to the #s_developer_experience Slack channel for assistance. Note that end-to-end test failures on staging-ref don't block deployments.
Before production rollout
- Announce in #whats-happening-at-gitlab about the SBOM scan caching rollout.
Specific rollout on production (Stage 2a: gitlab-org group)
For visibility, all /chatops commands that target production must be executed in the #production Slack channel
and cross-posted (with the command results) to #g_secure_composition-analysis.
- Ensure that the feature MRs have been deployed to both production and canary with `/chatops run auto_deploy status <merge-commit-of-your-feature>`
- Enable for the gitlab-org group: `/chatops run feature set --group=gitlab-org dependency_scanning_sbom_scan_result_cache true`
- Monitor metrics for 24-48 hours:
- Check cache hit/miss rates in dashboard
- Verify no increase in error rates
- Check table growth rates
- Monitor API endpoint response times
- Verify no timeout increases
- Verify that the feature works for gitlab-org projects:
- Test cache miss scenario (first scan with new SBOM)
- Test cache hit scenario (repeated pipeline with same SBOM)
- Test cache miss with new advisories
- Verify pipeline security tab shows correct results
- Verify vulnerability report shows correct findings
Preparation before global rollout
- Set milestone 18.10 on this rollout issue to signal when the feature flag should be enabled once the feature is stable.
- Check whether the feature flag change needs to be accompanied by a change management issue. Cross-link the issue here if it does.
- Ensure that you or a representative in development can be available for at least 2 hours after feature flag updates in production. If a different developer will be covering, or an exception is needed, please inform the on-call SRE by using the @sre-oncall Slack alias.
- Ensure that documentation exists for the feature, and the version history text has been updated.
- Ensure that any breaking changes have been announced following the release post process to ensure GitLab customers are aware.
- Notify the #support_gitlab-com and #g_secure_composition-analysis Slack channels (more guidance on when this is necessary is in the dev docs).
Global rollout on production (Stage 2b)
For visibility, all /chatops commands that target production must be executed in the #production Slack channel
and cross-posted (with the command results) to #g_secure_composition-analysis.
- Incrementally roll out the feature on production. Between every step, wait at least 15 minutes and monitor the appropriate graphs on https://dashboards.gitlab.net and the SBOM Scan API Dashboard:
  - `/chatops run feature set dependency_scanning_sbom_scan_result_cache 25 --actors` - monitor for 30 minutes: check cache effectiveness, error rates, table growth
  - `/chatops run feature set dependency_scanning_sbom_scan_result_cache 50 --actors` - monitor for 30 minutes: check cache effectiveness, error rates, table growth
  - `/chatops run feature set dependency_scanning_sbom_scan_result_cache 75 --actors` - monitor for 30 minutes: check cache effectiveness, error rates, table growth
  - `/chatops run feature set dependency_scanning_sbom_scan_result_cache 100 --actors`
- After the feature has been 100% enabled, monitor for at least 2 days (the minimum `SbomScan` TTL) before proceeding to Stage 3 cleanup.
- Verify success criteria:
- Cache hit rate > 50% for projects with repeated pipelines
- No increase in error rates or timeouts
- Reduction in SBOM scan processing queue size
- Reduction in average scan completion time
- No unexpected table growth
- No stale vulnerability results served
Stage 3: Cleanup (18.11)
After 100% rollout and minimum 2-day observation period:
- Create a follow-up issue for removing the dual code path (or use #582203 if it already exists)
- Update code to read only from the `SbomScan.result` association
- Remove the legacy `SbomScan.result_file` code path
- Schedule removal of the `result_file` column in a future milestone
Release the feature
After the feature has been deemed stable, the cleanup should be done as soon as possible to permanently enable the feature and reduce complexity in the codebase.
You can either create a follow-up issue for Feature Flag Cleanup or use the checklist below in this same issue.
- Create a merge request to remove the `dependency_scanning_sbom_scan_result_cache` feature flag. Ask for review/approval/merge as usual. The MR should include the following changes:
  - Remove all references to the feature flag from the codebase.
  - Remove the YAML definitions for the feature from the repository.
- Ensure that the cleanup MR has been included in the release package. If the merge request was deployed before the monthly release was tagged, the feature can be officially announced in a release blog post: `/chatops run release check <merge-request-url> <milestone>`
- Close the feature issue to indicate the feature will be released in the current milestone.
- Once the cleanup MR has been deployed to production, clean up the feature flag from all environments by running this chatops command in the #production channel: `/chatops run feature delete dependency_scanning_sbom_scan_result_cache --dev --pre --staging --staging-ref --production`
- Close this rollout issue.
Rollback Steps
- This feature can be disabled on production by running the following chatops command in #production: `/chatops run feature set dependency_scanning_sbom_scan_result_cache false`
- Disable the feature flag on non-production environments: `/chatops run feature set dependency_scanning_sbom_scan_result_cache false --dev --pre --staging --staging-ref`
- Delete the feature flag from all environments: `/chatops run feature delete dependency_scanning_sbom_scan_result_cache --dev --pre --staging --staging-ref --production`
Note: When the feature flag is disabled, the system falls back to the original code path (always upload and scan). Dual read support ensures no data loss during rollback.
Related Issues & MRs
Parent Issue: #562694
Parent Epic: &20326
Implementation MRs:
- !213584 (merged) - Database migrations and models
- !213586 (merged) - Service and API layer
- Pending: Observability events for caching
Analyzer MR: gitlab-org/security-products/analyzers/dependency-scanning!400 (merged)
Related Issues:
- #582203 - Remove dual code path for dependency scan results reads
- #582073 (closed) - Stop SBOM scan when advisory data has never been synced
- #577424 - SBOM Scan API performance tuning on self-managed
- #561759 (closed) - Implement soft rate limiting for SBOM Scan Processing
Rollout Timeline
| Stage | Milestone | Action | Duration |
|---|---|---|---|
| Stage 1 | 18.8 | Deploy dual read support | Completed |
| Stage 2a | 18.10 | Enable for gitlab-org | 24-48 hours |
| Stage 2b | 18.10 | Enable globally (incremental) | 2+ hours + 2 days observation |
| Stage 3 | 18.11 | Remove legacy code path | After 2+ days |