Extract SBoM making jobs from Dependency Scanning, License Scanning

Summary

Right now the Dependency Scanning (DS) jobs and the License Scanning (LS) job are responsible for both listing the project dependencies and scanning them to report vulnerabilities and licenses, respectively. This has a negative impact on:

complexity, b/c analyzer projects have too many responsibilities
maintenance, b/c analyzer projects need to be updated to catch up with each of the package managers they support
flexibility, b/c analyzer projects can only support a limited number of package manager versions
performance, b/c DS and LS repeat the detection step, and this step might consume a lot of resources
consistency, b/c the detection logic implemented in DS and the one implemented in LS are not aligned
contributions, b/c the complexity makes the analyzer projects harder to contribute to

To address this, a possible solution is to extract jobs responsible for listing the dependencies, and feed LS and DS the output of these jobs. The "dependency listing" jobs generate SBoM documents, and pass them to LS and DS jobs as CI artifacts. We can see them as "SBoM making" jobs.

SBoM making jobs could generate CycloneDX documents that users could directly download and use.

It seems sufficient to have 1 LS job and 1 DS job, because the SBoM docs passed to the scanning jobs contain all the information needed to perform the scan. (LS would also take in the paths where the packages have been installed, to look for LICENSE files. The installed packages can also be passed as artifacts.)

DS and LS jobs could merge the multiple SBoM inputs into a single SBoM output, and add vulnerability and license information to it, respectively.
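The merge step could look like the sketch below. It is only an illustration: it assumes the SBoM inputs are CycloneDX JSON documents already loaded as dictionaries, that components can be deduplicated by their `purl`, and it only handles the `components` field. The function name `merge_sboms` and the sample data are hypothetical.

```python
def merge_sboms(sboms):
    """Merge CycloneDX-style documents, keeping one component per purl."""
    merged = {}
    for sbom in sboms:
        for component in sbom.get("components", []):
            # Assumption: purl identifies a component; fall back to
            # name@version when a component has no purl.
            key = component.get("purl") or (
                f'{component.get("name")}@{component.get("version")}'
            )
            # Keep the first occurrence of each component.
            merged.setdefault(key, component)
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        # Sort by key so the output is stable across runs.
        "components": [merged[key] for key in sorted(merged)],
    }


# Hypothetical outputs of two SBoM making jobs.
sbom_a = {
    "bomFormat": "CycloneDX",
    "components": [
        {"type": "library", "name": "requests", "version": "2.27.1",
         "purl": "pkg:pypi/requests@2.27.1"},
        {"type": "library", "name": "urllib3", "version": "1.26.8",
         "purl": "pkg:pypi/urllib3@1.26.8"},
    ],
}
sbom_b = {
    "bomFormat": "CycloneDX",
    "components": [
        {"type": "library", "name": "requests", "version": "2.27.1",
         "purl": "pkg:pypi/requests@2.27.1"},
        {"type": "library", "name": "idna", "version": "3.3",
         "purl": "pkg:pypi/idna@3.3"},
    ],
}

single_sbom = merge_sboms([sbom_a, sbom_b])
print([c["name"] for c in single_sbom["components"]])
```

A scanning job would then attach its findings to `single_sbom` before publishing it as an artifact.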

SBoM making jobs must be executed prior to the test stage. One option is to use the build stage defined in Auto DevOps. See Build.gitlab-ci.yml. As a consequence, CI linting fails if the CI template for DS or LS is included but the project pipeline doesn't have a build stage. This is a breaking change.

Each SBoM making job corresponds to a small project that targets 1 specific language or 1 specific package manager.

An SBoM making project can be published as multiple Docker images to cover multiple versions of the tools. For instance, the SBoM maker for Python could be built on top of python:3.6-slim, python:3.9-slim, python:3.10-slim, and published with the same image tags. This becomes possible because SBoM maker projects are simpler than analyzer projects, and only target one language (or even one package manager).

Challenges

This is an important breaking change.
- The build stage becomes required.
- Some existing CI jobs are removed, and users might have configured them in their CI config files.
SBoM making jobs might overlap. For instance, if there's a pip specific job and a setuptools specific job, then we don't want to trigger the latter if this is a pip project.
SBoM making projects will probably share code. For instance, they should all support DS_EXCLUDED_PATHS.
We end up with many projects, and many Docker images.
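One way to think about the overlap between SBoM making jobs is a precedence list per ecosystem, where only the first matching job runs. The sketch below is hypothetical: the job names (`sbom-pip`, `sbom-setuptools`) and the marker files used to detect each kind of project are assumptions, not an agreed design.

```python
# Hypothetical SBoM making jobs for Python, ordered by precedence.
# Each entry maps a job name to the files that signal its package manager.
PYTHON_JOBS = [
    ("sbom-pip", {"requirements.txt"}),
    ("sbom-setuptools", {"setup.py"}),
]


def select_job(files, jobs=PYTHON_JOBS):
    """Return the first job whose marker files appear in `files`, or None."""
    present = set(files)
    for name, markers in jobs:
        if markers & present:
            return name
    return None


# A pip project that also ships a setup.py only triggers the pip job.
print(select_job(["requirements.txt", "setup.py", "README.md"]))
# A pure setuptools project triggers the setuptools job.
print(select_job(["setup.py"]))
```

In a real pipeline this precedence would more likely be expressed through CI rules than through code, but the selection logic is the same.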

Improvements

SBoM generators are very flexible.
- Users can set up their own SBoM generator jobs.
- GitLab can provide SBoM generator images to cover many languages, build tools, versions, etc.
DS and LS projects are only about scanning.
- They are smaller projects.
- They are released less often.
- They are easier to grasp, making contributions easier.
- They are easier to test.
Scanning jobs do less.
- They are easier to debug.
SBoMs can be cached, making it possible to rescan the project w/o generating a new SBoM.
Job params can be made more specific.
- High timeout for jobs that install dependencies, low timeout for DS and LS jobs.
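To illustrate the caching idea above: a cached SBoM can be reused as long as the files that determine the dependencies have not changed. The sketch below derives a cache key from those files. Which files count as dependency files, and the function name `sbom_cache_key`, are assumptions made for the example.

```python
import hashlib


def sbom_cache_key(files):
    """Hash file paths and contents in a stable order.

    `files` maps a path to the file content as bytes.
    """
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator between path and content
        digest.update(files[path])
        digest.update(b"\0")  # separator between entries
    return digest.hexdigest()


# Hypothetical dependency files of a Python project.
before = {"requirements.txt": b"requests==2.27.1\n"}
after = {"requirements.txt": b"requests==2.28.0\n"}

# Same content gives the same key, so the cached SBoM can be reused.
print(sbom_cache_key(before) == sbom_cache_key(dict(before)))
# Changed content gives a new key, so a new SBoM must be generated.
print(sbom_cache_key(before) == sbom_cache_key(after))
```

When the key is unchanged, the scanning jobs can run against the cached SBoM, for instance to pick up newly published vulnerabilities.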

Risks

The SBoM passed to the scanning job might lack some information needed to automatically dismiss vulnerabilities as false positives, or to better assess their severity.
The tools responsible for creating the SBoM might repeat the same code and replicate the same logic.
Having separate tools to generate SBoMs for different languages and package managers might result in discrepancies.

Involved components

Optional: Intended side effects

Optional: Missing test coverage

Edited Mar 15, 2022 by Fabien Catteau