PoC for on-demand Dependency Scanning using SBOM in the GitLab Rails application
Topic to Evaluate
- Design a new service to execute an on-demand Dependency Scanning analysis without tying its results to any state in the vulnerability management system. This must reuse the GitLab SBOM Vulnerability Scanner, so that we maintain a single implementation of our scanner logic.
- Design an API that can be called from a running CI job to send a list of dependencies to be scanned for vulnerabilities and retrieve the results.
Outcomes
Summary
The On-Demand Dependency Scanning service provides an API for scanning SBOMs for vulnerabilities outside of the regular GitLab vulnerability management workflow. It reuses the existing GitLab SBOM Vulnerability Scanner implementation while offering flexibility for CI workflows. The solution uses an asynchronous architecture for scalability, with temporary storage of scan artifacts. Processing and scanning SBOM documents synchronously would not be reasonable, at least not with the existing infrastructure and the risk of impacting other parts of the product that share the same resources.
GitLab already offers a framework to handle uploads and file storage, with a set of recommendations such as using direct upload for scalability reasons: https://docs.gitlab.com/development/uploads/working_with_uploads/#recommendations
Fortunately, this fits our needs well, and we can process the uploaded SBOM in a Sidekiq worker.
Solution architecture
- Asynchronous Processing: Leverages GitLab's existing upload framework and Sidekiq workers
- Data Storage: Ephemeral `SbomScan` ActiveRecord model (sketched below) with:
  - Two attachments (SBOM document file and scan results file)
  - State machine tracking scan progression (created → running → finished/failed)
- API Design: REST endpoints for submission and result retrieval
- Authentication: CI_JOB_TOKEN-based for security and rate limiting
Note that in addition to our own `SbomScan` model, the upload framework uses a dedicated ActiveRecord model and creates a record for each stored file. This table is currently being partitioned and split to adapt to Cells (see #398199).
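As a rough illustration of this shape, here is a minimal sketch of the ephemeral model, assuming the state_machines-activerecord DSL and CarrierWave uploaders already used in the Rails application; the class, column, and uploader names are placeholders, not final:

```ruby
# Hypothetical sketch: table, column and uploader names are placeholders.
class SbomScan < ApplicationRecord
  # Two attachments: the uploaded SBOM document and the generated scan results.
  # SbomScanUploader is a placeholder for a CarrierWave uploader configured for
  # object storage / direct upload, per the uploads framework recommendations.
  mount_uploader :sbom_file, SbomScanUploader
  mount_uploader :scan_results_file, SbomScanUploader

  # Track scan progression: created → running → finished/failed
  state_machine :status, initial: :created do
    event :start do
      transition created: :running
    end

    event :finish do
      transition running: :finished
    end

    event :fail_scan do
      transition [:created, :running] => :failed
    end
  end
end
```

The scheduled cleanup mentioned in the workflow below could then be a deferred or cron-based worker that destroys records older than the retention period, which also removes the attached files.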
End-to-End Workflow
- From a running CI job, the client calls the API with a POST request (using CI_JOB_TOKEN to authenticate) and uploads a single SBOM document (CycloneDX) as an attachment. If there are multiple SBOMs, multiple calls must be made.
- The Rails API receives the SBOM file, stores it as a file attachment on the `SbomScan` model, and schedules a Sidekiq job to execute the DS scan. It also schedules a deletion of the record in X days (retention TBD). It then replies to the client with the id of the created `SbomScan` record and the URL to download the scan results (once they become available).
- When the worker executes the Sidekiq job, the existing logic is used to parse the SBOM document and create an SBOM report (an in-memory Ruby class). It then scans the components with the GitLab SBOM Vulnerability Scanner and generates the necessary findings (`Security::Finding` Ruby class). Finally, this list of findings is stored as a JSON array in the second file attachment of the `SbomScan` model, and its status is set to `finished` (a minimal worker sketch follows this list).
- In the meantime, the client polls the API to check whether the scan results are available by sending a GET request to the provided download URL (poll delay TBD). If the state machine's status is `finished` or `failed`, the API returns the stored scan results or errors. Otherwise, the client keeps polling (backoff and timeout TBD).
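A minimal sketch of that worker, under the assumption that the existing parsing and scanning logic can be called from service-like entry points; `SbomParser` and `SbomVulnerabilityScanner` are placeholders, not the actual class names:

```ruby
# Hypothetical worker sketch: parser, scanner and serialization details are
# placeholders for the existing GitLab SBOM Vulnerability Scanner code paths.
class SbomScanWorker
  include ApplicationWorker # GitLab's Sidekiq worker base module

  idempotent!

  def perform(sbom_scan_id)
    sbom_scan = SbomScan.find_by_id(sbom_scan_id)
    return unless sbom_scan

    sbom_scan.start!

    # 1. Parse the uploaded CycloneDX document into an in-memory SBOM report.
    report = SbomParser.parse(sbom_scan.sbom_file.read)

    # 2. Scan the components and build the findings (Security::Finding objects).
    findings = SbomVulnerabilityScanner.scan(report.components)

    # 3. Store the findings as a JSON array in the second attachment and
    #    mark the scan as finished.
    sbom_scan.update!(scan_results_file: json_tempfile(findings))
    sbom_scan.finish!
  rescue StandardError
    sbom_scan&.fail_scan!
    raise
  end

  private

  # Wrap the serialized findings in a file object so the uploader can store it.
  def json_tempfile(findings)
    Tempfile.new(['sbom-scan-results', '.json']).tap do |file|
      file.write(findings.map(&:to_h).to_json)
      file.rewind
    end
  end
end
```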
Other considerations
How to prevent abuse on this API?
Ideally we only want this to be used by our built-in feature (at least for now), so it must only be called by a CI job. We can authenticate with the CI_JOB_TOKEN, like we do for CI job artifacts. This allows us to assign a scan to a given CI job, and from there we can define scoped application limits, e.g. "max X SBOM scans per CI job" (see the sketch below). We can also consider limits per pipeline, project, group, account, etc.
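To make this concrete, here is a hedged sketch of what the check could look like in a Grape endpoint, assuming the job-token authentication helpers already used for CI job artifact endpoints and Gitlab::ApplicationRateLimiter; the route, limit key, and threshold are placeholders:

```ruby
# Hypothetical API sketch: route, limit key and helper usage are assumptions,
# mirroring how other CI_JOB_TOKEN-authenticated endpoints are built.
module API
  class SbomScans < ::API::Base
    before do
      authenticate! # resolves the user, e.g. from CI_JOB_TOKEN
    end

    resource :sbom_scans do
      desc 'Submit an SBOM document for an on-demand Dependency Scanning analysis'
      post do
        job = current_authenticated_job
        unauthorized! unless job

        # Scoped application limit, e.g. "max X SBOM scans per CI job".
        # :sbom_scans_per_job would be a new ApplicationRateLimiter key.
        if Gitlab::ApplicationRateLimiter.throttled?(:sbom_scans_per_job, scope: [job])
          render_api_error!('Too many SBOM scans for this CI job', 429)
        end

        # ... create the SbomScan record, attach the uploaded file, and
        # enqueue SbomScanWorker (see the worker sketch above).
      end
    end
  end
end
```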
Consider Geo replication for uploaded data
Considering the generated documents are only necessary for the purpose of the on-demand analysis, which is stateless, we probably don't need to support Geo.
Consider table partitioning for the DB model that stores these SBOM scans
Unless we intend to keep these records, it might not be necessary to partition them. They will be cell-local data and likely removed after a few days (TBD).
Consider re-using the existing CI Job artifacts
It may look like we're doing the same upload work twice. Indeed, at the end of the Dependency Scanning CI job execution, the generated SBOM document will be stored as a report artifact for that job. Similarly, the generated findings will eventually end up in a security report that will also be uploaded as a CI job artifact.
Since artifact upload is idempotent when the file is the same, we could upload the SBOM artifact earlier during the job execution to run the DS scan, and the regular upload made by the runner at the end of the CI job execution would simply be skipped (at least the processing done after that presence check).
However, this might be premature optimization, and it ties this process into the already complex CI job artifact management codebase. It also doesn't work nicely for the Security Report artifact because:
- we haven't yet decided whether this API will format results as a DS security report
- even if we do, if multiple SBOMs must be scanned in a single CI job, we have to merge the results into a single Security Report (we can't have multiple security report artifacts of the same type in the same CI job).
Thus, it looks more straightforward to go with a separate implementation at the moment and accept the overhead in processing and temporary storage.
Next Steps
- Define specific retention periods for the `SbomScan` records and the temporary attachments
- Determine polling backoff strategy and timeout parameters
- Define API response payload format (DS report vs raw findings)
- Define application limits
- Refine other implementation details