Implement soft rate limiting for SBOM Scan Processing

Why are we doing this work

To prevent abuse of the SBOM Scan API and maintain service quality for all users, we need to implement a soft rate limiting mechanism at the service layer. This will ensure that heavy usage by individual projects doesn't impact the shared processing capacity and cause delays or timeouts for other users.

The current implementation processes all SBOM scans with the same priority and queue. This could lead to resource contention when projects exceed reasonable usage thresholds, or prevent heavy users from continuing to use the feature beyond that threshold.

Relevant links

Non-functional requirements

  • Performance: Maintain fast processing for normal usage while gracefully degrading for heavy users
  • Feature flag: n/a
  • Documentation: n/a
  • Testing: Comprehensive specs for rate limiting logic and queue routing

TODO: check the potential impact on metrics, and the possibility of flagging throttled scans so they can be distinguished and not mistaken for an undesired performance degradation.

Proposed behavior

Normal usage (under threshold):

  • Scans processed on high-priority queue (sbom_scans)
  • Fast processing with :high urgency
  • Standard API responses

Heavy usage (over threshold):

  • New scans routed to throttled queue (sbom_scans_throttled); see the routing sketch after this list
  • Lower urgency (:low) with higher concurrency limits
  • API returns rate limit headers
  • Client displays warning about increased processing time
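
As a rough illustration of the routing described above, here is a minimal Ruby sketch, assuming the limit is checked through Gitlab::ApplicationRateLimiter with the :dependency_scanning_sbom_scan_api_throttling key from the verification steps; the service class and its integration point are placeholders, not existing code:

```ruby
# Sketch only: the service class and its integration point are illustrative;
# the worker names come from the implementation plan below.
class ProcessSbomScanService
  def execute(sbom_scan)
    if throttled?(sbom_scan.project)
      # Over threshold: route to the throttled, low-urgency queue.
      ProcessSbomScanThrottledWorker.perform_async(sbom_scan.id)
    else
      # Under threshold: keep the fast, high-urgency path.
      ProcessSbomScanWorker.perform_async(sbom_scan.id)
    end
  end

  private

  def throttled?(project)
    # Key name taken from the verification steps; the threshold and interval
    # would live in Gitlab::ApplicationRateLimiter.rate_limits (see MR 1).
    ::Gitlab::ApplicationRateLimiter.throttled?(
      :dependency_scanning_sbom_scan_api_throttling,
      scope: [project]
    )
  end
end
```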

Implementation plan (WIP)

MR 1: Fixed soft rate limit

  • Implement the soft rate limiting logic with hardcoded threshold/interval values, using Gitlab::ApplicationRateLimiter (sketched after this list)
  • Create a ProcessSbomScanThrottledWorker performing the same operation as ProcessSbomScanWorker, but with :low urgency and a higher concurrency limit (35)
  • Implement logic to route scans to throttled queue when threshold is exceeded
  • Add Retry-After rate limit header to the upload API response to inform the client when throttling occurs
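
A rough sketch of the MR 1 pieces, assuming the limit is registered in Gitlab::ApplicationRateLimiter.rate_limits with the 50 scans / 10 minutes example values from MR 3, that the concurrency_limit worker attribute is available, and that data_consistency and feature_category simply mirror the existing ProcessSbomScanWorker:

```ruby
# lib/gitlab/application_rate_limiter.rb (sketch): hardcoded limit entry.
# The 50 / 10.minutes values reuse the MR 3 example and are not final.
def rate_limits
  {
    # ... existing limits ...
    dependency_scanning_sbom_scan_api_throttling: { threshold: 50, interval: 10.minutes }
  }
end

# app/workers/process_sbom_scan_throttled_worker.rb (sketch)
class ProcessSbomScanThrottledWorker
  include ApplicationWorker

  data_consistency :sticky                 # assumption: same as the regular worker
  feature_category :dependency_management  # assumption
  urgency :low
  concurrency_limit -> { 35 }              # higher concurrency limit from the plan
  idempotent!

  def perform(sbom_scan_id)
    # Same processing as ProcessSbomScanWorker; only the queueing attributes differ.
    ProcessSbomScanWorker.new.perform(sbom_scan_id)
  end
end
```

Delegating to ProcessSbomScanWorker#perform keeps the processing logic in one place; whether that or a shared concern is the right shape can be settled in the MR. For the last bullet, assuming the upload endpoint is a Grape API, it can set the header with Grape's `header 'Retry-After', <interval in seconds>` when the rate limiter reports throttling.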

MR 2: Analyzer Client

  • Update the client to read the Retry-After rate limit header from the upload response (illustrated after this list)
  • Add logging/warning messages for throttled scans, making the impact and the Retry-After value explicit
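
The analyzer client's language isn't specified in this issue, so purely as an illustration (Ruby's Net::HTTP standing in for whatever HTTP layer the client actually uses, with placeholder endpoint and payload), reading the header and warning could look like:

```ruby
require 'net/http'
require 'uri'

# Illustration only: the endpoint URL and payload are placeholders.
uri = URI('https://gitlab.example.com/api/v4/sbom_scans') # placeholder upload endpoint
sbom_payload = File.read('sbom.cdx.json')                 # placeholder SBOM document

response = Net::HTTP.post(uri, sbom_payload, 'Content-Type' => 'application/json')
retry_after = response['Retry-After']

if retry_after
  # Make the impact explicit: the scan was accepted but will be processed on
  # the throttled queue, so results may take longer to appear.
  warn "SBOM scan throttled; results may take up to #{retry_after} extra seconds."
end
```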

MR 3 (optional): Configurable thresholds

  • Implement configurable thresholds for SBOM scan limits (e.g., 50 scans per 10 minutes)
  • Implement a configurable concurrency limit for the throttled Sidekiq worker

See previous plan

MR 1: Rate Limiting Infrastructure

  • Implement soft rate limiting logic (maybe add to Gitlab::ApplicationRateLimiter)
  • (TBD) Implement configurable thresholds for Sbom Scan limits (e.g., 50 scans per 10 minutes)

MR 2: Queue Routing Logic

  • Create ProcessSbomScanThrottledWorker with lower urgency and higher concurrency
  • Implement logic to route scans to the throttled queue when the threshold is exceeded

MR 3: API Response Enhancement

  • Add rate limit headers to API responses (X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After)
  • Update API documentation with rate limiting behavior

MR 4: Analyzer Client

  • Update client to read rate limit headers and increase timeout
  • Add logging/warning messages for throttled scans

Configuration

We can possibly add admin settings to configure:

  • the rate limit threshold
  • the throttled queue concurrency limit
  • the rate limit window duration

This will give self-managed instance admins more flexibility to adjust these settings based on their respective infrastructure. This could also be done as a follow-up improvement.
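
If we do add settings, a hedged sketch of how the hardcoded values could be replaced, assuming new application settings whose names below are entirely hypothetical:

```ruby
# lib/gitlab/application_rate_limiter.rb (sketch): read the values from
# application settings instead of hardcoding them. Both setting names are
# hypothetical and would need new ApplicationSetting columns.
def rate_limits
  {
    # ... existing limits ...
    dependency_scanning_sbom_scan_api_throttling: {
      threshold: -> { application_settings.dependency_scanning_sbom_scan_limit },
      # Assumption: interval also resolves lambdas; otherwise keep it static.
      interval: -> { application_settings.dependency_scanning_sbom_scan_limit_interval }
    }
  }
end

# app/workers/process_sbom_scan_throttled_worker.rb (sketch): configurable concurrency.
# Gitlab::CurrentSettings exists; the setting name is hypothetical.
concurrency_limit -> { Gitlab::CurrentSettings.sbom_scan_throttled_worker_concurrency }
```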

Verification steps

  1. Configure rate limiting thresholds in admin settings if applicable, or manually adjust the :dependency_scanning_sbom_scan_api_throttling threshold and interval values in the ApplicationRateLimiter (see the console sketch after these steps).
  2. Create multiple SBOM scans for a single project within the time window
  3. Verify first N scans are processed normally on high-priority queue
  4. Verify subsequent scans are routed to throttled queue with appropriate headers
  5. Verify rate limit resets after the configured time window
  6. Test that other projects are unaffected by one project's heavy usage
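
For step 1, a possible way to exercise the limiter from a Rails console without producing 50 real scans, assuming the hardcoded key from MR 1:

```ruby
# Rails console sketch: hit the limiter repeatedly for one project and watch it
# flip to throttled once the threshold (e.g. 50 within the interval) is crossed.
project = Project.find_by_full_path('group/project') # placeholder project path

60.times do |i|
  throttled = ::Gitlab::ApplicationRateLimiter.throttled?(
    :dependency_scanning_sbom_scan_api_throttling,
    scope: [project]
  )

  puts "call #{i + 1}: throttled=#{throttled}"
end
```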

Benefits

  • Maintains availability: No hard limits, so heavy users can still use the feature
  • Preserves service quality: Prevents resource contention affecting other users
  • Transparent communication: Clients are informed about throttling via headers
  • Configurable: (optional) Admins can adjust thresholds based on infrastructure capacity
  • Gradual degradation: Performance degrades gracefully rather than failing
