SBOM scan result files are not being cleaned up from object storage

Summary

The SbomScanUploader builds its storage path from Time.current, so once the UTC date rolls over, the regenerated path no longer matches the path the result file was uploaded to. The file can then no longer be reached through the uploader, the cleanup worker cannot delete it, and expired result files accumulate in object storage.

Current Behavior

  1. When an SBOM scan result file is uploaded on date X, it's stored at path: sbom_scans/.../YYYY_MM_DD/.../scan_id/sbom_scan_result.json (where YYYY_MM_DD = date X)
  2. When the file is accessed later (e.g., for deletion), the uploader regenerates the path using Time.current, resulting in a different date
  3. The cleanup worker attempts to delete the file at the wrong path, "succeeds" without error, but leaves the actual file orphaned in storage
  4. Database records are deleted, but files remain in object storage indefinitely
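The path drift described above can be sketched in plain Ruby. This is only an illustration: result_path is a hypothetical simplification that skips the hashing done by Gitlab::HashedPath, and the build ID, scan ID, and timestamps are made up.

```ruby
# Hypothetical simplification of the uploader's path builder. The real code
# goes through Gitlab::HashedPath, which is not reproduced here.
def result_path(now, build_id, scan_id)
  File.join(
    'sbom_scans',
    now.getutc.strftime('%Y_%m_%d'), # date segment taken from "now"
    build_id.to_s,
    scan_id.to_s,
    'sbom_scan_result.json'
  )
end

uploaded_at = Time.utc(2025, 9, 9, 23, 59, 0)
accessed_at = uploaded_at + 120 # two minutes later, just past midnight UTC

upload_path = result_path(uploaded_at, 42, 7)
lookup_path = result_path(accessed_at, 42, 7)

puts upload_path # sbom_scans/2025_09_09/42/7/sbom_scan_result.json
puts lookup_path # sbom_scans/2025_09_10/42/7/sbom_scan_result.json
```

Because the date segment depends on when the path is computed, any access after midnight UTC looks in a different directory from the one the file was written to.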

Expected Behavior

The uploader should use the scan's created_at timestamp consistently for path generation, ensuring files can be accessed and deleted at the correct path regardless of when the operation occurs.

Impact

Production data (GitLab.com):

  • Total storage: 33.87 GB
  • Orphaned files: 18.42 GB (54.4% wasted storage)
  • 1.3 million orphaned result files dating back to September 9, 2025
  • All SBOM files are cleaned up correctly (0 orphaned)
  • Only result files are affected

Root Cause

In ee/app/uploaders/security/vulnerability_scanning/sbom_scan_uploader.rb:

def self.hashed_path(project_id, build_id, model_identifier)
  Gitlab::HashedPath.new(
    Time.current.utc.strftime('%Y_%m_%d'),  # ❌ Uses current time, not file creation time
    build_id,
    model_identifier,
    root_hash: project_id
  ).to_s
end

private

def dynamic_segment
  raise ObjectNotReadyError, "SbomScan model not ready" unless model.id

  File.join(STORE_PATH_PREFIX, self.class.hashed_path(model.project_id, model.build_id, model.id))
end

Proposed Solution

Use the model's created_at timestamp for path generation:

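def self.hashed_path(project_id, build_id, model_identifier, created_at: nil)
  # Convert to UTC before formatting so the date segment matches the UTC date
  # used at upload time, whatever time zone created_at is expressed in.
  timestamp = (created_at || Time.current).utc
  Gitlab::HashedPath.new(
    timestamp.strftime('%Y_%m_%d'),
    build_id,
    model_identifier,
    root_hash: project_id
  ).to_s
end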

private

def dynamic_segment
  raise ObjectNotReadyError, "SbomScan model not ready" unless model.id

  File.join(
    STORE_PATH_PREFIX,
    self.class.hashed_path(model.project_id, model.build_id, model.id, created_at: model.created_at)
  )
end
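A minimal sketch of why keying the date segment on created_at keeps the path stable. date_segment is a hypothetical stand-in for the date part of hashed_path, and the timestamps are invented. The sketch converts to UTC before formatting, on the assumption that the upload-time segment was a UTC date (as Time.current.utc in the original code suggests).

```ruby
# Hypothetical stand-in for the date segment of hashed_path.
# getutc returns a new Time, so the caller's object is left untouched.
def date_segment(created_at: nil, now: Time.now)
  (created_at || now).getutc.strftime('%Y_%m_%d')
end

# A record created late in the evening in a UTC-05:00 zone,
# which is already the next day in UTC (2025-09-10 04:30 UTC).
created_at = Time.new(2025, 9, 9, 23, 30, 0, '-05:00')

at_upload  = date_segment(created_at: created_at, now: Time.utc(2025, 9, 10, 4, 31))
at_cleanup = date_segment(created_at: created_at, now: Time.utc(2025, 9, 12, 2, 0))
local_date = created_at.strftime('%Y_%m_%d') # what formatting without UTC conversion yields

puts at_upload  # 2025_09_10
puts at_cleanup # 2025_09_10
puts local_date # 2025_09_09
```

The segment no longer depends on when the path is computed. The last line shows why the UTC conversion matters: formatting created_at in a non-UTC zone can produce a different date from the one used at upload.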

Steps to Reproduce

  1. Create an SBOM scan with result file on day X
  2. Wait until day X+1
  3. Attempt to access the result file via the uploader (e.g., scan.result_file.file.exists?)
  4. Observe that the file is reported as not existing (wrong path is checked)
  5. Attempt cleanup via scan.delete_files_from_storage
  6. Observe that the method returns true but the file remains in storage
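The steps above can be approximated outside GitLab with an in-memory stand-in for object storage. Everything here is hypothetical: FakeStore and result_path are invented for illustration, and the store is assumed to report success when asked to delete a key that does not exist, which is how the silent failure described in this issue would arise.

```ruby
# Hypothetical in-memory object store. Deleting a missing key is reported as
# success; this is an assumption about the storage backend's behavior.
class FakeStore
  def initialize
    @objects = {}
  end

  def put(key, body)
    @objects[key] = body
  end

  def exists?(key)
    @objects.key?(key)
  end

  def delete(key)
    @objects.delete(key)
    true
  end

  def size
    @objects.size
  end
end

# Hypothetical path builder keyed on the current date, like the buggy uploader.
def result_path(now, scan_id)
  "sbom_scans/#{now.getutc.strftime('%Y_%m_%d')}/#{scan_id}/sbom_scan_result.json"
end

store = FakeStore.new
day_x = Time.utc(2025, 9, 9, 10, 0, 0)
day_x_plus_1 = day_x + 86_400 # one day later

store.put(result_path(day_x, 7), '{}')                  # step 1: upload on day X
found     = store.exists?(result_path(day_x_plus_1, 7)) # steps 3-4: lookup on day X+1
deleted   = store.delete(result_path(day_x_plus_1, 7))  # step 5: cleanup on day X+1
remaining = store.size                                  # step 6: object still stored

puts "found=#{found} deleted=#{deleted} remaining=#{remaining}"
# found=false deleted=true remaining=1
```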

Additional Context

  • This issue only affects result files, not SBOM files (which use direct upload with a hash-based path)
  • The cleanup worker DestroyExpiredSbomScansWorker runs daily but cannot delete result files due to this bug
  • Files are meant to be ephemeral (2-day retention) but are accumulating indefinitely
  • The bug was introduced when the uploader was created and has been present since September 2025

Related Files

  • ee/app/uploaders/security/vulnerability_scanning/sbom_scan_uploader.rb
  • ee/app/models/security/vulnerability_scanning/sbom_scan.rb
  • ee/app/workers/security/vulnerability_scanning/destroy_expired_sbom_scans_worker.rb
  • ee/app/services/security/vulnerability_scanning/destroy_sbom_scans_service.rb