Dynamic Dependency Scanning jobs
Problem to solve
Today, Dependency Scanning (including CycloneDX SBOM generation) is implemented as a fixed set of CI jobs that rely on predefined Docker images. This approach has important limitations that block popular feature enhancements.
- It is not possible to detect and scan multiple Java projects or Python projects of a monorepo. This is because a job can only scan one of these projects. Epic: Allow all Java and Python files to be scanned (&12315 - closed)
- Scanning jobs can't dynamically switch to the Docker image that's most compatible with the repository, based on the build dependencies. For instance, a job can't switch to python:3.11 (or any image based on it) after detecting that the project relies on Python 3.11. TODO: link to relevant issue(s)
- Users can't override execution rules without breaking the default behaviors. Issue: Improve extensibility of SAST, Dependency Scann... (#218444)
- Job is triggered if and only if compatible files are detected.
- Job switches to the FIPS image based on predefined CI variables.
- In particular, users can't easily change job rules so that scanning jobs are only triggered when dependency files (detected or manually set) change.
(This applies to CycloneDX SBOM generation as well.)
Reminder: By design it's not possible to alter a CI pipeline and add new jobs to it after it's been created.
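To make the monorepo limitation concrete, here is a minimal sketch of the detection step that the current fixed jobs lack: it groups every dependency file found in a repository by language, so that each one could later get its own scan. The function name and the file-name mapping are assumptions made for illustration; the mapping is a small subset and not the analyzers' actual list of supported files.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Illustrative subset only (assumption); not the analyzers' real supported-file list.
DEPENDENCY_FILE_NAMES = {
    "pom.xml": "java",
    "build.gradle": "java",
    "requirements.txt": "python",
    "poetry.lock": "python",
}


def detect_dependency_files(repo_paths):
    """Group dependency files found anywhere in the repository by language."""
    detected = defaultdict(list)
    for path in repo_paths:
        # Match on the file name only, so nested projects are found too.
        language = DEPENDENCY_FILE_NAMES.get(PurePosixPath(path).name)
        if language is not None:
            detected[language].append(path)
    return {language: sorted(files) for language, files in detected.items()}
```

With two Maven projects and one Python project in the same repository, the result lists all three files, whereas a single fixed job would scan only one project per language.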
Challenges
- The solution must be compatible with Scan Execution Policies.
- It must be backward compatible and offer a reasonable migration path.
- It should be consistent across all the product categories of Secure, and possibly beyond.
- Users should be able to switch to manual configuration and force the following:
- files to be scanned
- Docker image used for the scan
- command that runs the scan
- execution rules
Proposals
The proposals fall into the following categories:
- Rely on existing tools: dynamic child pipelines.
- Pros: No change to the CI/CD YAML syntax or to the backend.
- Cons: It doesn't seem customizable.
- Add keywords to the CI/CD YAML syntax.
- Pros: Customizable, user friendly, possibly backward compatible, ships w/ GitLab itself.
- Cons: Significant backend change.
- Run scans outside of a CI pipeline.
- Pros: Very flexible.
- Cons: Large change.
Proposal A
Extend the parallel:matrix keyword of the CI/CD YAML syntax to create a matrix of Dependency Scanning jobs based on what the backend has detected.
Pros
- customizable
Cons
- It's assumed that the backend can detect everything w/o running a CI job.
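As a rough sketch of what "a matrix based on what the backend has detected" could mean, the function below turns hypothetical backend detections into a list of variable maps, which is the shape parallel:matrix entries take. The input format and the variable names DS_DEPENDENCY_FILE and DS_IMAGE are assumptions for illustration, not existing GitLab variables.

```python
def build_matrix(detections):
    """Build parallel:matrix-like entries, one per detected dependency file.

    `detections` is a hypothetical backend result: a list of dicts with
    a 'file' key (path of the dependency file) and an 'image' key (the
    Docker image judged most compatible with that project).
    """
    matrix = []
    for detection in detections:
        matrix.append({
            "DS_DEPENDENCY_FILE": detection["file"],  # hypothetical variable name
            "DS_IMAGE": detection["image"],           # hypothetical variable name
        })
    return matrix
```

Each entry would correspond to one generated job, so a monorepo with three Python projects would yield three scanning jobs, each with its own image.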
Proposal B
Add a new dependency-scanning keyword to the CI/CD YAML syntax. This represents the Dependency Scanning jobs, and is expanded to multiple jobs based on what the backend has detected.
Pros
- customizable
Cons
- Compared to Proposal A, it's a bigger syntax change.
- It's assumed that the backend can detect everything w/o running a CI job.
Proposal C
Replace CI templates with CI config generators. Generators would be included just like templates, but their contents would be generated by the backend.
Pros
- Compared to A & B, it doesn't add keywords specific to Dependency Scanning to the YAML syntax.
- Overall this is very generic.
Cons
- Compared to A & B, it's possibly a much larger change.
- Jobs can be customized as long as users can predict how jobs are named automatically. It relies on conventions and is less explicit than A & B.
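The naming concern can be illustrated with a small sketch of a generator that derives job names deterministically from the analyzer and the dependency file path. The naming convention, the function names, and the DS_DEPENDENCY_FILE variable are all hypothetical; the point is only that users can override a generated job if, and only if, they can reproduce its name.

```python
import re


def job_name(analyzer, dependency_file):
    """Derive a predictable job name (hypothetical convention)."""
    slug = re.sub(r"[^a-z0-9]+", "-", dependency_file.lower()).strip("-")
    return f"{analyzer}-dependency_scanning:{slug}"


def generate_config(detections):
    """Map each (analyzer, dependency_file) pair to a generated job definition."""
    config = {}
    for analyzer, dependency_file in detections:
        config[job_name(analyzer, dependency_file)] = {
            # Hypothetical variable telling the analyzer which file to scan.
            "variables": {"DS_DEPENDENCY_FILE": dependency_file},
        }
    return config
```

A user who wants to customize the job for services/api/requirements.txt would have to know that its generated name is gemnasium-python-dependency_scanning:services-api-requirements-txt under this convention.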
Proposal D
Extend Scan Execution Policies' processor. SEP would delegate to a CI config generator specific to Dependency Scanning.
The DS CI config generator would be similar to the one proposed in Proposal C, but we wouldn't have to extend the CI/CD YAML syntax to allow users to include it; this could be implemented later on.
Pros
- We reuse code, and it's a much smaller change than Proposal F (running scans outside of a pipeline).
Cons
- Users must enable Scan Execution Policies. Right now this involves creating a project to keep the policies, so it's not a lightweight process.
- We radically change the scope of the SecurityOrchestrationPolicies::Processor. It would support features owned by group::security policies and by group::composition analysis. This might have a negative impact on velocity.
Proposal E
Introduce a detection job that generates a CI config, and trigger a dynamic child pipeline.
This depends on #421564.
Pros
- Fits in the CI.
Cons
- not customizable
- not backward compatible
- not the best visualization
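A minimal sketch of what the detection job could do: render a child pipeline configuration with one scanning job per detected project, which a trigger job would then run as a dynamic child pipeline. The job names, the DS_DEPENDENCY_FILE variable, and the script line are placeholders chosen for illustration, and the input format is an assumption.

```python
def render_child_pipeline(scans):
    """Render child-pipeline YAML text with one job per detected project.

    `scans` is a hypothetical detection result: a list of dicts with
    'file' (dependency file path) and 'image' (Docker image to use).
    """
    lines = []
    for index, scan in enumerate(scans):
        image = scan["image"]
        dependency_file = scan["file"]
        lines += [
            f"dependency_scanning_{index}:",
            f'  image: "{image}"',
            "  variables:",
            f'    DS_DEPENDENCY_FILE: "{dependency_file}"',  # hypothetical variable
            "  script:",
            "    - /analyzer run",  # placeholder scan command
        ]
    return "\n".join(lines) + "\n"
```

The detection job would write this text to a file and publish it as an artifact for the trigger job to include. Because the jobs only exist in the generated file, users have no stable place to override them, which is the customization drawback listed above.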
Proposal F
Run Dependency Scanning outside of a pipeline, possibly using the CI infrastructure.
Pros
- It removes many technical limitations.
Cons
- There's a lot to design and implement. We essentially start from scratch.
- UI needs to be defined.
- not compatible with Scan Execution Policies
- not customizable
- not backward compatible
Proposal G
The proposal is twofold:
- Introduce new predefined CI/CD variables.
  - *_DEPENDENCY_FILES: JAVA_DEPENDENCY_FILES, PYTHON_DEPENDENCY_FILES, etc.
  - *_VERSION: JAVA_VERSION, PYTHON_VERSION, etc.
- Implement Support variable expansion in parallel matrix j... (#381603) or Parallel CI jobs from file globs (#356273).
It's then possible to have the following jobs using the parallel:matrix syntax:
- a gemnasium-python-dependency_scanning job per item of PYTHON_DEPENDENCY_FILES, and using an image named after PYTHON_VERSION.
- a gemnasium-maven-dependency_scanning job per item of JAVA_DEPENDENCY_FILES, and using an image named after JAVA_VERSION.
- a gemnasium-dependency_scanning job for all other dependency files; we would use variable expansion like $GO_DEPENDENCY_FILES,$RUBY_DEPENDENCY_FILES,...
Pros
- no changes to the CI/CD YAML syntax
Cons
- It's highly coupled to the three existing scanning jobs.
- Images have to be named after JAVA_VERSION and PYTHON_VERSION.
- It doesn't scale in complexity. For instance, let's imagine that we want to select the image based on the language version AND the package manager version. Then we would have to maintain a matrix of images, instead of implementing a simple switch in the backend.
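The "simple switch in the backend" mentioned in the last point could look like the sketch below, which picks an image from both the language version and the package manager version. The registry path, the tag scheme, and the function names are entirely hypothetical; they only show that the combination logic lives in one function rather than in image naming conventions that CI variables must match.

```python
def major_minor(version):
    """Reduce a version such as '3.11.4' to its 'major.minor' part."""
    return ".".join(version.split(".")[:2])


def select_python_image(python_version, poetry_version):
    """Pick a scanner image from two detected versions (hypothetical scheme)."""
    # registry.example.com and the tag format are placeholders, not real images.
    return (
        "registry.example.com/gemnasium-python:"
        f"py{major_minor(python_version)}-poetry{major_minor(poetry_version)}"
    )
```

With Proposal G, an equivalent result would require every such combination to exist as an image whose name can be derived directly from the predefined variables.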
Proposal G2
Similar to Proposal G, but introduce predefined CI variables that contain the image name to be used by each analyzer:
- GEMNASIUM_IMAGE
- GEMNASIUM_MAVEN_IMAGE
- GEMNASIUM_PYTHON_IMAGE
Pros
- Compared to G, this is more flexible.
Cons
- Compared to G, the new predefined CI variables are coupled to the analyzers.