Rollout of dependency_scanning_sbom_scan_result_cache feature flag
Summary
This issue tracks the rollout of SBOM scan result caching on production,
which is currently behind the `dependency_scanning_sbom_scan_result_cache` feature flag.
The feature enables the Dependency Scanning analyzer to check if an SBOM has been previously processed before uploading and triggering a new resource-intensive scan, significantly reducing computational resources and scan times.
Owners
- Most appropriate Slack channel to reach out to: #g_secure_composition-analysis
- Best individual to reach out to: @ifrenkel
Expectations
What are we expecting to happen?
Functional Behavior:
- Analyzer generates a digest of SBOM components before upload
- Analyzer calls the caching endpoint `POST /api/v4/jobs/:job_id/sbom_scans/:sbom_digest`
- Instance returns `201` with `sbom_scan_id` when a cached result exists and advisories are fresh
- Instance returns `404` when no cache exists or new advisories have been published
- Analyzer skips upload and proceeds to result fetching when receiving `201`
- Multiple scans can safely reference the same `SbomScanResult`
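The analyzer-side decision described above can be sketched as follows. This is an illustrative Python stand-in for the Go analyzer, not the real implementation; the function name and the action labels are hypothetical, while the status codes and the `sbom_scan_id` field follow the contract in this issue.

```python
import json

def handle_cache_response(status_code: int, body: str):
    """Map the caching endpoint's response onto the analyzer's next action.

    201 -> a cached result exists and advisories are fresh: skip the
           upload and fetch results via the returned sbom_scan_id.
    404 -> no cache exists (or new advisories were published): upload
           the SBOM and trigger a full scan, as before.
    """
    if status_code == 201:
        return ("fetch_cached_result", json.loads(body)["sbom_scan_id"])
    # 404, and any unexpected status, fall back to the legacy path so a
    # cache problem can never block a scan.
    return ("upload_and_scan", None)
```

Failing open on unexpected statuses matches the rollback note below: disabling the cache must always degrade to the original upload-and-scan behavior.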
Performance Improvements:
- Cache hit rate > 50% for projects with repeated pipelines
- Reduction in SBOM scan processing queue size
- Reduction in average scan completion time
- Fewer redundant uploads and scans for identical SBOMs
Data Integrity:
- Same number of `SbomScan` records created as before (no table growth)
- Orphaned `SbomScanResult` records properly cleaned up
- Advisory freshness validated via `PackageMetadata::Checkpoint`
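The freshness rule amounts to a single comparison: a cached result is servable only if no advisory data has been ingested since the scan ran. A minimal sketch, assuming ISO-8601 UTC timestamps (which sort lexicographically); the function name is illustrative and this is not the real `PackageMetadata::Checkpoint` code.

```python
def cache_is_fresh(result_scanned_at: str, last_advisory_sync: str) -> bool:
    """Serve a cached result only if the most recent advisory sync
    happened at or before the time the cached scan ran; otherwise the
    instance answers 404 and a fresh scan picks up the new advisories."""
    return last_advisory_sync <= result_scanned_at
```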
What can go wrong and how would we detect it?
| Risk | Detection Method | Mitigation |
|---|---|---|
| Ineffective caching | Monitor cache hit/miss rates in SBOM Scan API Dashboard | Verify digest consistency across purl types |
| DB growth | Monitor `sbom_vulnerability_scans` and `sbom_vulnerability_scan_results` table sizes | Ensure same number of records as old code path |
| Stale results served | Monitor advisory update events vs cache misses; UAT testing | Explicit freshness check via `PackageMetadata::Checkpoint` |
| SBOM digest inconsistency | Monitor cache miss rates by purl type | Digest is versioned (`sha256v1`); worst case is a cache miss |
| Wrong scan result served | Monitor errors fetching scan results on analyzer side | Dual read support during rollout |
| Concurrent requests | Some duplicate scans expected | Acceptable edge case; still reduces overall processing |
| Increased error rates | Monitor API endpoint errors, timeouts, background job failures | Gradual rollout with monitoring between steps |
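The "versioned digest" mitigation can be illustrated with a sketch like the one below. This is an assumption about the shape of the scheme, not the actual analyzer code: the exact canonicalization of components is not specified in this issue, so sorting a deduplicated list of purls is used here purely for illustration.

```python
import hashlib

def sbom_digest(purls: list[str]) -> str:
    """Versioned, order-independent digest over SBOM component purls.

    Sorting a deduplicated list first makes the digest stable across
    component ordering, so logically identical SBOMs hash the same.
    The explicit 'sha256v1:' prefix means a future change to the scheme
    can only produce a cache miss, never a wrong cache hit.
    """
    payload = "\n".join(sorted(set(purls))).encode("utf-8")
    return "sha256v1:" + hashlib.sha256(payload).hexdigest()
```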
Key Metrics to Monitor:
- Cache hit/miss rate per project and by purl type
- API endpoint response times (`POST /api/v4/jobs/:job_id/sbom_scans/:sbom_digest`)
- Table growth rates for `sbom_vulnerability_scans` and `sbom_vulnerability_scan_results`
- Error rates on the new endpoint and result fetching
- Background job failures and timeout rates
- Orphaned scan result cleanup events
Dashboard: SBOM Scan API Usage Dashboard (internal)
Rollout Context
This feature follows a phased rollout plan due to the migration from `SbomScan.result_file` to the `SbomScan.result` association:

- Stage 1 (18.8): Dual read support deployed (completed)
- Stage 2a (18.10): Selective enablement (this issue - current stage)
- Stage 2b (18.10): Global enablement
- Stage 3 (18.11): Clean up legacy code path (tracked in #582203)

Because `SbomScan` records expire after 2 days, we support both storage locations during rollout to avoid a data migration.
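The dual-read window can be summarized as "prefer the new location, fall back to the legacy one". A minimal Python sketch; `SbomScan` here is a stand-in for the Rails model, with only the two result locations modeled.

```python
class SbomScan:
    """Stand-in for the Rails model during the dual-read window."""
    def __init__(self, result=None, result_file=None):
        self.result = result            # new: shared SbomScanResult association
        self.result_file = result_file  # legacy: per-scan uploaded result file

def read_scan_result(scan: SbomScan):
    """Prefer the new association; fall back to the legacy file so scans
    created before the flag flipped stay readable until they expire."""
    if scan.result is not None:
        return scan.result
    return scan.result_file
```

Because records expire after 2 days, the fallback branch becomes dead code two days after global enablement, which is what makes the Stage 3 cleanup safe.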
Rollout Steps
Note: Please make sure to run the chatops commands in the Slack channel that gets impacted by the command.
Rollout on non-production environments
- Verify the MR with the feature flag is merged to `master` and has been deployed to non-production environments with `/chatops run auto_deploy status <merge-commit-of-your-feature>`
- Deploy the feature flag at a percentage (recommended percentage: 50%) with `/chatops run feature set dependency_scanning_sbom_scan_result_cache 50 --actors --dev --pre --staging --staging-ref`
- Monitor that the error rates did not increase (repeat with a different percentage as necessary).
- Enable the feature globally on non-production environments with `/chatops run feature set dependency_scanning_sbom_scan_result_cache true --dev --pre --staging --staging-ref`
- Verify that the feature works as expected. The best environment to validate the feature in is staging-canary, as this is the first environment deployed to. Make sure you are configured to use canary.
- If the feature flag causes end-to-end tests to fail, disable the feature flag on staging to avoid blocking deployments.
  - See the #e2e-run-staging Slack channel and look for the following messages:
    - test kicked off: `Feature flag dependency_scanning_sbom_scan_result_cache has been set to true on **gstg**`
    - test result: `This pipeline was triggered due to toggling of dependency_scanning_sbom_scan_result_cache feature flag`
If you encounter end-to-end test failures and are unable to diagnose them, you may reach out to the #s_developer_experience Slack channel for assistance. Note that end-to-end test failures on staging-ref don't block deployments.
Before production rollout
- Announce in #whats-happening-at-gitlab about the SBOM scan caching rollout.
Specific rollout on production (Stage 2a: gitlab-org group)
For visibility, all /chatops commands that target production must be executed in the #production Slack channel
and cross-posted (with the command results) to #g_secure_composition-analysis.
- Ensure that the feature MRs have been deployed to both production and canary with `/chatops run auto_deploy status <merge-commit-of-your-feature>`
- Enable for the gitlab-org group: `/chatops run feature set --group=gitlab-org dependency_scanning_sbom_scan_result_cache true`
- Monitor metrics for 24-48 hours:
- Check cache hit/miss rates in dashboard
- Verify no increase in error rates
- Check table growth rates
- Monitor API endpoint response times
- Verify no timeout increases
- Verify that the feature works for gitlab-org projects:
- Test cache miss scenario (first scan with new SBOM)
- Test cache hit scenario (repeated pipeline with same SBOM)
- Test cache miss with new advisories
- Verify pipeline security tab shows correct results
- Verify vulnerability report shows correct findings
Preparation before global rollout
- Set milestone 18.10 on this rollout issue to signal when the feature flag should be enabled once the feature is stable.
- Check whether the feature flag change needs to be accompanied by a change management issue. Cross-link the issue here if it does.
- Ensure that you or a representative in development can be available for at least 2 hours after feature flag updates in production. If a different developer will be covering, or an exception is needed, please inform the on-call SRE by using the @sre-oncall Slack alias.
- Ensure that documentation exists for the feature, and the version history text has been updated.
- Ensure that any breaking changes have been announced following the release post process to ensure GitLab customers are aware.
- Notify the #support_gitlab-com and #g_secure_composition-analysis Slack channels (more guidance on when this is necessary is in the dev docs).
Global rollout on production (Stage 2b)
For visibility, all /chatops commands that target production must be executed in the #production Slack channel
and cross-posted (with the command results) to #g_secure_composition-analysis.
- Incrementally roll out the feature on production. Between every step, wait at least 15 minutes and monitor the appropriate graphs on https://dashboards.gitlab.net and the SBOM Scan API Dashboard:
  - `/chatops run feature set dependency_scanning_sbom_scan_result_cache 25 --actors` - monitor for 30 minutes: check cache effectiveness, error rates, table growth
  - `/chatops run feature set dependency_scanning_sbom_scan_result_cache 50 --actors` - monitor for 30 minutes: check cache effectiveness, error rates, table growth
  - `/chatops run feature set dependency_scanning_sbom_scan_result_cache 75 --actors` - monitor for 30 minutes: check cache effectiveness, error rates, table growth
  - `/chatops run feature set dependency_scanning_sbom_scan_result_cache 100 --actors`
- After the feature has been 100% enabled, monitor for at least 2 days (the minimum `SbomScan` TTL) before proceeding to Stage 3 cleanup.
- Verify success criteria:
- Cache hit rate > 50% for projects with repeated pipelines
- No increase in error rates or timeouts
- Reduction in SBOM scan processing queue size
- Reduction in average scan completion time
- No unexpected table growth
- No stale vulnerability results served
Stage 3: Cleanup (18.11)
After 100% rollout and minimum 2-day observation period:
- Create a follow-up issue for removing the dual code path (or use #582203 if it already exists)
- Update code to read only from the `SbomScan.result` association
- Remove the legacy `SbomScan.result_file` code path
- Schedule removal of the `result_file` column in a future milestone
Release the feature
After the feature has been deemed stable, the cleanup should be done as soon as possible to permanently enable the feature and reduce complexity in the codebase.
You can either create a follow-up issue for Feature Flag Cleanup or use the checklist below in this same issue.
- Create a merge request to remove the `dependency_scanning_sbom_scan_result_cache` feature flag. Ask for review/approval/merge as usual. The MR should include the following changes:
  - Remove all references to the feature flag from the codebase.
  - Remove the YAML definitions for the feature from the repository.
- Ensure that the cleanup MR has been included in the release package. If the merge request was deployed before the monthly release was tagged, the feature can be officially announced in a release blog post: `/chatops run release check <merge-request-url> <milestone>`
- Close the feature issue to indicate the feature will be released in the current milestone.
- Once the cleanup MR has been deployed to production, clean up the feature flag from all environments by running this chatops command in the #production channel: `/chatops run feature delete dependency_scanning_sbom_scan_result_cache --dev --pre --staging --staging-ref --production`
- Close this rollout issue.
Rollback Steps
- This feature can be disabled on production by running the following chatops command in #production: `/chatops run feature set dependency_scanning_sbom_scan_result_cache false`
- Disable the feature flag on non-production environments: `/chatops run feature set dependency_scanning_sbom_scan_result_cache false --dev --pre --staging --staging-ref`
- Delete the feature flag from all environments: `/chatops run feature delete dependency_scanning_sbom_scan_result_cache --dev --pre --staging --staging-ref --production`
Note: When the feature flag is disabled, the system falls back to the original code path (always upload and scan). Dual read support ensures no data loss during rollback.
Related Issues & MRs
Parent Issue: #562694
Parent Epic: &20326
Implementation MRs:
- !213584 (merged) - Database migrations and models
- !213586 (merged) - Service and API layer
- Pending: Observability events for caching
Analyzer MR: gitlab-org/security-products/analyzers/dependency-scanning!400 (merged)
Related Issues:
- #582203 - Remove dual code path for dependency scan results reads
- #582073 (closed) - Stop SBOM scan when advisory data has never been synced
- #577424 - SBOM Scan API performance tuning on self-managed
- #561759 (closed) - Implement soft rate limiting for SBOM Scan Processing
Rollout Timeline
| Stage | Milestone | Action | Duration |
|---|---|---|---|
| Stage 1 | 18.8 | Deploy dual read support | Completed |
| Stage 2a | 18.10 | Enable for gitlab-org | 24-48 hours |
| Stage 2b | 18.10 | Enable globally (incremental) | 2+ hours + 2 days observation |
| Stage 3 | 18.11 | Remove legacy code path | After 2+ days |