Dynamic Dependency Scanning jobs

Problem to solve

Today, Dependency Scanning (including CycloneDX SBOM generation) is implemented as a fixed set of CI jobs that rely on predefined Docker images. This approach has important limitations that block popular feature enhancements.

  • It is not possible to detect and scan multiple Java projects or Python projects of a monorepo. This is because a job can only scan one of these projects. Epic: Allow all Java and Python files to be scanned (&12315 - closed)
  • Scanning jobs can't dynamically switch to the Docker image that's most compatible with the repository, based on the build dependencies. For instance, it can't switch to python:3.11 (or any image based on it) after detecting that the project relies on Python 3.11. TODO: link to relevant issue(s)
  • Users can't override execution rules without breaking the default behaviors. Issue: Improve extensibility of SAST, Dependency Scann... (#218444)
    1. Job is triggered if and only if compatible files are detected.
    2. Job switches to the FIPS image based on predefined CI variables.
  • In particular, users can't easily change job rules so that scanning jobs are only triggered when dependency files (detected or manually set) change.

(This applies to CycloneDX SBOM generation as well.)

Reminder: By design it's not possible to alter a CI pipeline and add new jobs to it after it's been created.

Challenges

  • The solution must be compatible with Scan Execution Policies.
  • It must be backward compatible and offer a reasonable migration path.
  • It should be consistent across all the product categories of Secure, and possibly beyond.
  • Users should be able to switch to manual configuration and force the following:
    • files to be scanned
    • Docker image used for the scan
    • command that runs the scan
    • execution rules

Proposals

The proposals fall into the following categories:

  1. Rely on existing tools: dynamic child pipelines.
    • Pros: No change to the CI/CD YAML syntax or to the backend.
    • Cons: It doesn't seem customizable.
  2. Add keywords to the CI/CD YAML syntax.
    • Pros: Customizable, user friendly, possibly backward compatible, ships with GitLab itself.
    • Cons: Significant backend change.
  3. Run scans outside of a CI pipeline.
    • Pros: Very flexible.
    • Cons: Large change.

Proposal A

Extend the parallel:matrix keyword of the CI/CD YAML syntax to create a matrix of Dependency Scanning jobs based on what the backend has detected.

Pros

  • customizable

Cons

  • It's assumed that the backend can detect everything without running a CI job.

Proposal B

Add a new dependency-scanning keyword to the CI/CD YAML syntax. This represents the Dependency Scanning jobs, and is expanded to multiple jobs based on what the backend has detected.

Pros

  • customizable

Cons

  • Compared to Proposal A, it's a bigger syntax change.
  • It's assumed that the backend can detect everything without running a CI job.

Proposal C

Replace CI templates with CI config generators. Generators would be included just like templates, but their contents would be generated by the backend.

Pros

  • Compared to A & B, it doesn't add keywords specific to Dependency Scanning to the YAML syntax.
  • Overall this is very generic.

Cons

  • Compared to A & B, it's possibly a much larger change.
  • Jobs can be customized only if users can predict the automatically generated job names. It relies on conventions and is less explicit than A & B.

Proposal D

Extend Scan Execution Policies' processor. SEP would delegate to a CI config generator specific to Dependency Scanning.

The DS CI config generator would be similar to the one proposed in proposal C, but we wouldn't have to extend the CI/CD YAML syntax to allow users to include it; this could be implemented later on.

Pros

  • We reuse code, and it's a much smaller change than Proposal F (running scans outside of a pipeline).

Cons

Proposal E

Introduce a detection job that generates a CI config, and trigger a dynamic child pipeline.

This depends on #421564 (closed).
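
As a rough illustration of what the detection job could do, the sketch below maps detected dependency files to one job each and emits a child pipeline config. It is only a sketch under stated assumptions: the file-to-analyzer mapping, the generated job names, and the DS_TARGET_FILE variable are hypothetical and not part of the proposal. The config is serialized as JSON, which YAML parsers generally accept.

```python
import json
from pathlib import PurePosixPath

# Hypothetical mapping from dependency file name to the existing scanning job.
ANALYZER_JOBS = {
    "requirements.txt": "gemnasium-python-dependency_scanning",
    "pom.xml": "gemnasium-maven-dependency_scanning",
    "Gemfile.lock": "gemnasium-dependency_scanning",
    "go.sum": "gemnasium-dependency_scanning",
}


def generate_child_config(paths):
    """Return a child pipeline config with one job per detected dependency file."""
    config = {}
    for path in sorted(paths):
        job = ANALYZER_JOBS.get(PurePosixPath(path).name)
        if job is None:
            continue  # not a dependency file known to this sketch
        # Hypothetical naming convention: "<job> <path>".
        config[f"{job} {path}"] = {
            # DS_TARGET_FILE is an assumed variable name, for illustration only.
            "variables": {"DS_TARGET_FILE": path},
            "script": ["/analyzer run"],
        }
    return config


if __name__ == "__main__":
    detected = ["api/requirements.txt", "README.md", "billing/pom.xml"]
    # The detection job would save this output as an artifact.
    print(json.dumps(generate_child_config(detected), indent=2))
```

The parent pipeline would then trigger a child pipeline that includes the generated artifact. Because job names are generated, users can't easily override individual jobs, which matches the customization concern listed below.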

Pros

  • Fits in the CI.

Cons

  • not customizable
  • not backward compatible
  • not the best visualization

Proposal F

Run Dependency Scanning outside of a pipeline, possibly using the CI infrastructure.

Pros

  • It removes many technical limitations.

Cons

  • There's a lot to design and implement. We essentially start from scratch.
  • UI needs to be defined.
  • not compatible with Scan Execution Policies
  • not customizable
  • not backward compatible

Proposal G

The proposal is twofold:

It's then possible to have the following jobs using the parallel:matrix syntax:

  • a gemnasium-python-dependency_scanning job per item of PYTHON_DEPENDENCY_FILES, and using an image named after PYTHON_VERSION.
  • a gemnasium-maven-dependency_scanning job per item of JAVA_DEPENDENCY_FILES, and using an image named after JAVA_VERSION.
  • a gemnasium-dependency_scanning job for all other dependency files; we would use variable expansion like $GO_DEPENDENCY_FILES,$RUBY_DEPENDENCY_FILES,...
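
The job layout described above can be simulated with a short sketch. It assumes that the *_DEPENDENCY_FILES variables hold comma-separated paths and that images follow a hypothetical "<analyzer>:<version>" naming pattern; neither detail is specified by the proposal, and the function names are illustrative.

```python
# Languages that get one job per dependency file, with their analyzer name.
PER_FILE_ANALYZERS = {"PYTHON": "gemnasium-python", "JAVA": "gemnasium-maven"}
SUFFIX = "_DEPENDENCY_FILES"


def split_list(value):
    """Split a comma-separated variable value into non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def expand_jobs(variables):
    """Expand the predefined variables into the list of jobs to run."""
    jobs = []
    other_files = []
    for name in sorted(variables):
        if not name.endswith(SUFFIX):
            continue
        language = name[: -len(SUFFIX)]
        files = split_list(variables[name])
        analyzer = PER_FILE_ANALYZERS.get(language)
        if analyzer is None:
            # All other languages share a single gemnasium job.
            other_files.extend(files)
            continue
        version = variables[f"{language}_VERSION"]
        for path in files:
            jobs.append({
                "job": f"{analyzer}-dependency_scanning",
                "image": f"{analyzer}:{version}",  # hypothetical image naming
                "files": [path],
            })
    if other_files:
        jobs.append({
            "job": "gemnasium-dependency_scanning",
            "image": "gemnasium",  # hypothetical image name
            "files": other_files,
        })
    return jobs


if __name__ == "__main__":
    example = {
        "PYTHON_DEPENDENCY_FILES": "api/requirements.txt,worker/requirements.txt",
        "PYTHON_VERSION": "3.11",
        "JAVA_DEPENDENCY_FILES": "billing/pom.xml",
        "JAVA_VERSION": "17",
        "GO_DEPENDENCY_FILES": "go.sum",
        "RUBY_DEPENDENCY_FILES": "Gemfile.lock",
    }
    for job in expand_jobs(example):
        print(job)
```

With the example values, this yields two Python jobs, one Maven job, and a single catch-all job covering go.sum and Gemfile.lock.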

Pros

  • no changes to the CI/CD YAML syntax

Cons

  • It's highly coupled to the three existing scanning jobs.
  • Images have to be named after JAVA_VERSION and PYTHON_VERSION.
  • It doesn't scale in complexity. For instance, let's imagine that we want to select the image based on the language version AND the package manager version. Then we would have to maintain a matrix of images, instead of implementing a simple switch in the backend.

Proposal G2

Similar to Proposal G, but introduce predefined CI variables that contain the image name to be used by each analyzer:

  • GEMNASIUM_IMAGE
  • GEMNASIUM_MAVEN_IMAGE
  • GEMNASIUM_PYTHON_IMAGE
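
To illustrate how the backend might compute one of these variables, here is a minimal sketch of a selection switch for GEMNASIUM_PYTHON_IMAGE. The function name, the image naming scheme, and the optional package manager input are assumptions made for illustration; the proposal does not define them.

```python
def gemnasium_python_image(python_version, package_manager=None):
    """Pick an image name from the detected Python version.

    The "gemnasium-python:python-<major.minor>[-<package manager>]" scheme
    is hypothetical and only shows where the switch would live.
    """
    major_minor = ".".join(python_version.split(".")[:2])
    tag = f"python-{major_minor}"
    if package_manager:
        # Extra criteria extend the switch, without changing job definitions.
        tag += f"-{package_manager}"
    return f"gemnasium-python:{tag}"


def predefined_variables(python_version, package_manager=None):
    """Return the predefined CI variable that the scanning job would consume."""
    return {
        "GEMNASIUM_PYTHON_IMAGE": gemnasium_python_image(python_version, package_manager)
    }


if __name__ == "__main__":
    print(predefined_variables("3.11.4"))
    print(predefined_variables("3.11.4", package_manager="poetry"))
```

Because the job only reads the variable, adding a new selection criterion changes this function and not the job definition.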

Pros

  • Compared to G, this is more flexible.

Cons

  • Compared to G, the new predefined CI variables are coupled to the analyzers.
Edited by Fabien Catteau