SBOM scan result files are not being cleaned up from object storage
Summary
The SbomScanUploader uses Time.current in its path generation, causing result files to become inaccessible and undeletable after the date changes. This prevents the cleanup worker from deleting expired files, leading to significant storage waste.
Current Behavior
- When an SBOM scan result file is uploaded on date X, it's stored at path: sbom_scans/.../YYYY_MM_DD/.../scan_id/sbom_scan_result.json (where YYYY_MM_DD = date X)
- When the file is accessed later (e.g., for deletion), the uploader regenerates the path using Time.current, resulting in a different date
- The cleanup worker attempts to delete the file at the wrong path, "succeeds" without error, but leaves the actual file orphaned in storage
- Database records are deleted, but files remain in object storage indefinitely
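The sequence above can be sketched in plain Ruby. This is a simplified simulation, not the real uploader: `result_path` is a hypothetical stand-in for the path logic (it does not reproduce `Gitlab::HashedPath`), and `FakeObjectStore` is a hypothetical in-memory store whose `delete` is assumed to report success for a missing key, matching the "succeeds without error" behavior described here.

```ruby
# Simplified stand-in for the uploader's path logic; NOT the real Gitlab::HashedPath.
# `now` is passed in explicitly so the date change can be simulated.
def result_path(scan_id, now)
  File.join('sbom_scans', now.getutc.strftime('%Y_%m_%d'), scan_id.to_s, 'sbom_scan_result.json')
end

# Hypothetical in-memory object store. Deleting a key that does not exist
# is assumed to be reported as success, as described in the issue.
class FakeObjectStore
  def initialize
    @objects = {}
  end

  def put(key, body)
    @objects[key] = body
  end

  def delete(key)
    @objects.delete(key)
    true
  end

  def exist?(key)
    @objects.key?(key)
  end
end

store = FakeObjectStore.new
day_x = Time.utc(2025, 9, 9, 12, 0, 0)
day_x_plus_1 = day_x + (24 * 60 * 60)

# Upload on day X: the path embeds day X.
upload_path = result_path(42, day_x)
store.put(upload_path, '{}')

# Cleanup on day X+1: the path is regenerated with a different date.
cleanup_path = result_path(42, day_x_plus_1)
deleted = store.delete(cleanup_path)

puts "uploaded to:   #{upload_path}"
puts "deleted from:  #{cleanup_path} (reported success: #{deleted})"
puts "still stored:  #{store.exist?(upload_path)}"
```

The delete call reports success because it targets a key that never existed, while the object written on day X is untouched.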
Expected Behavior
The uploader should use the scan's created_at timestamp consistently for path generation, ensuring files can be accessed and deleted at the correct path regardless of when the operation occurs.
Impact
Production data (GitLab.com):
- Total storage: 33.87 GB
- Orphaned files: 18.42 GB (54.4% wasted storage)
- 1.3 million orphaned result files dating back to September 9, 2025
- All SBOM files are cleaned up correctly (0 orphaned)
- Only result files are affected
Root Cause
In ee/app/uploaders/security/vulnerability_scanning/sbom_scan_uploader.rb:
def self.hashed_path(project_id, build_id, model_identifier)
  Gitlab::HashedPath.new(
    Time.current.utc.strftime('%Y_%m_%d'), # ❌ Uses current time, not file creation time
    build_id,
    model_identifier,
    root_hash: project_id
  ).to_s
end

private

def dynamic_segment
  raise ObjectNotReadyError, "SbomScan model not ready" unless model.id

  File.join(STORE_PATH_PREFIX, self.class.hashed_path(model.project_id, model.build_id, model.id))
end
Proposed Solution
Use the model's created_at timestamp for path generation:
def self.hashed_path(project_id, build_id, model_identifier, created_at: nil)
  # Normalize to UTC in both branches so the date segment does not depend
  # on the time zone attached to created_at.
  timestamp = (created_at || Time.current).utc

  Gitlab::HashedPath.new(
    timestamp.strftime('%Y_%m_%d'),
    build_id,
    model_identifier,
    root_hash: project_id
  ).to_s
end

private

def dynamic_segment
  raise ObjectNotReadyError, "SbomScan model not ready" unless model.id

  # created_at is fixed once the record is saved, so the path stays stable across days.
  File.join(
    STORE_PATH_PREFIX,
    self.class.hashed_path(model.project_id, model.build_id, model.id, created_at: model.created_at)
  )
end
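One detail that may be worth checking with this approach is the time zone of created_at: if it is not UTC, formatting it directly could yield a different date than the UTC-based segment used so far. The sketch below uses plain Ruby `Time` and a hypothetical `date_segment` helper that mirrors only the date part of the path; it is an illustration under those assumptions, not the uploader code.

```ruby
# Hypothetical helper mirroring only the date segment of the proposed hashed_path.
# getutc returns a new Time in UTC and leaves the receiver unchanged
# (plain Ruby's Time#utc would convert the receiver in place).
def date_segment(created_at: nil, now: Time.now)
  (created_at || now).getutc.strftime('%Y_%m_%d')
end

# A scan created late in the evening at UTC-5, which is already the next day in UTC.
created_at = Time.new(2025, 9, 9, 23, 30, 0, '-05:00')

# The segment depends only on created_at, so it is identical at upload and at cleanup.
at_upload  = date_segment(created_at: created_at, now: Time.utc(2025, 9, 10, 4, 31))
at_cleanup = date_segment(created_at: created_at, now: Time.utc(2025, 9, 12, 4, 31))

# Formatting without converting to UTC yields the local date instead.
without_utc = created_at.strftime('%Y_%m_%d')

puts "at upload:   #{at_upload}"
puts "at cleanup:  #{at_cleanup}"
puts "without UTC: #{without_utc}"
```

Because the segment is derived from a value stored on the record, the path no longer depends on when the uploader is invoked.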
Steps to Reproduce
- Create an SBOM scan with result file on day X
- Wait until day X+1
- Attempt to access the result file via the uploader (e.g., scan.result_file.file.exists?)
- Observe that the file is reported as not existing (wrong path is checked)
- Attempt cleanup via scan.delete_files_from_storage
- Observe that the method returns true but the file remains in storage
Additional Context
- This issue only affects result files, not SBOM files (which use direct upload with a hash-based path)
- The cleanup worker DestroyExpiredSbomScansWorker runs daily but cannot delete result files due to this bug
- Files are meant to be ephemeral (2-day retention) but are accumulating indefinitely
- The bug was introduced when the uploader was created and has been present since September 2025
Related Files
- ee/app/uploaders/security/vulnerability_scanning/sbom_scan_uploader.rb
- ee/app/models/security/vulnerability_scanning/sbom_scan.rb
- ee/app/workers/security/vulnerability_scanning/destroy_expired_sbom_scans_worker.rb
- ee/app/services/security/vulnerability_scanning/destroy_sbom_scans_service.rb