Commit b3a98e91 authored by Olivier Gonzalez (parent a6fb160c)

Update Dependency Scanning analyzer ADD and add ADRs for vulnerability scanning, dependency resolution, and manifest scanning
---
title: "Dependency Scanning Analyzer"
status: ongoing
creation-date: "2024-08-14"
authors: [ "@hacks4oats", "@gonzoyumo" ]
coaches: [ ]
dris: [ "@johncrowley", "@thiagocsf", "@nilieskou" ]
owning-stage: "~devops::application security testing"
participating-stages: []
# Hides this page in the left sidebar. Recommended so we don't pollute it.
toc_hide: true
---

## Summary

The dependency scanning feature has historically been powered by a set of analyzers - `gemnasium`, `gemnasium-maven`, and `gemnasium-python`. Associated with CI templates, these analyzers are responsible for detecting supported projects, building the dependency graph or list when needed, parsing the detected dependencies, and finally producing a security report with detected vulnerabilities alongside a CycloneDX SBOM that contains the dependencies. This approach has worked well, but over time it's become evident that building a project's dependency graph exports comes with a lot of complexity. This complexity negatively impacts the maintenance and creation of features, and the user experience of setting up and maintaining the dependency scanning analyzer.

To address these challenges, we are redesigning the dependency scanning analyzer to follow a multi-tiered approach that balances accuracy with ease of use. This document outlines the overall vision and architecture of the new analyzer, while specific implementation decisions are documented in the [Architectural Decision Records (ADRs)](#decisions) section.

## Motivation


The high maintenance cost associated with building the dependency graph/list exports has pushed us to rethink how we structure the dependency scanning feature. Instead of building the project dependency graphs or lists on behalf of customers and within the analyzer, we can delegate this responsibility to a job that runs before the analyzer does. A build stage is a very common part of the development cycle, and generating the dependency artifacts during this stage is a lot simpler than mapping existing build system configuration values to the ones used by the gemnasium set of analyzers. We initially considered deferring this entirely to users (see [ADR 001: Graph Export Only](./decisions/001_graph_export_only.md)), but customer feedback and other challenges eventually forced us to revisit this design.

## Goals

- Provide a simplified, maintainable analyzer that reduces the attack surface and maintenance burden
- Support multiple dependency detection strategies to accommodate different project configurations
- Enable out-of-the-box dependency scanning for projects with committed lockfiles or graphfiles
- Support automatic dependency resolution for projects that require build steps
- Provide a fallback mechanism for projects without pre-generated dependency artifacts
- Reduce security maintenance costs by eliminating bundled runtimes and package managers from the analyzer image
- Remove historical limitations such as single-project analysis for Java and Python monorepos

## Non-Goals

- Supporting 3rd party SBOM generators. We can still support this in a future iteration.


## Design and implementation details




### Design Principles

- **Separation of Concerns**: Dependency detection (what components exist) is separated from vulnerability analysis (which components have vulnerabilities)
- **Minimal Image Footprint**: The analyzer image contains only the scanning logic, not build tools or runtimes
- **Flexibility**: Different projects can use different dependency detection strategies based on their needs

### Dependency detection

The new dependency scanning analyzer follows a multi-tiered approach to dependency detection, providing flexibility while maintaining accuracy.

For more details on the dependency detection approach, including the service-based resolution pattern and manifest parsing implementation, see [ADR 003: Dependency Resolution and Manifest Scanning](./decisions/003_dependency_resolution_and_manifest_scanning.md).
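The tier selection described below can be thought of as a fallback chain from highest to lowest accuracy. The following is a minimal sketch of that idea only; the `Strategy` type, the file sets, and the selection function are illustrative assumptions, not the analyzer's actual detection logic.

```python
from enum import Enum
from pathlib import Path

class Strategy(Enum):
    LOCKFILE = 1           # Tier 1: consume committed lockfiles/graphfiles
    RESOLVE = 2            # Tier 2: trigger automatic dependency resolution
    MANIFEST_FALLBACK = 3  # Tier 3: parse manifests directly

# Illustrative file sets; the real analyzer supports many more patterns.
LOCKFILES = {"go.sum", "Gemfile.lock", "package-lock.json", "gradle.lockfile"}
MANIFESTS = {"go.mod", "Gemfile", "package.json", "build.gradle", "requirements.txt"}

def select_strategy(project_dir: str, resolution_enabled: bool = True) -> Strategy:
    """Pick the highest-accuracy tier available for a project directory."""
    names = {p.name for p in Path(project_dir).rglob("*")}
    if names & LOCKFILES:
        return Strategy.LOCKFILE
    if resolution_enabled and names & MANIFESTS:
        return Strategy.RESOLVE
    return Strategy.MANIFEST_FALLBACK
```

The point of the sketch is the ordering: a committed lockfile always wins, resolution is attempted only when it is enabled and a recognizable build file exists, and manifest parsing is the last resort.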

#### Tier 1: Lockfile/Graphfile Present (Highest Accuracy)

When projects have committed or pre-generated lockfiles or graphfiles, the analyzer consumes them directly. This provides the most accurate dependency information with minimal processing overhead.

#### Tier 2: Automatic Dependency Resolution

For projects that require build steps to generate dependency artifacts, the analyzer supports automatic dependency resolution through preceding CI jobs that run in the `.pre` stage. These jobs:

- Use ecosystem-native tools (Maven, Gradle, Python's `uv`) in vanilla public images
- Run the Dependency Scanning analyzer as a CI service to provide the necessary detection logic and generate the instructions for dependency resolution
- Execute these instructions to produce lockfiles or graphfiles and export them as artifacts for the DS analyzer CI job to consume

This approach avoids bundling multiple runtimes and package managers into the analyzer image, reducing maintenance burden and security surface area.
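To illustrate the "instructions" hand-off conceptually, the detection step could map each recognized build file to a native command and the artifact it is expected to produce. The rule format, the helper function, and the `maven.graph` artifact name below are hypothetical; only `go mod graph` and `mvn dependency:tree` are real tool invocations.

```python
from dataclasses import dataclass

@dataclass
class ResolutionInstruction:
    build_file: str  # file whose presence triggers this rule
    command: str     # native tool invocation run by the .pre-stage job
    artifact: str    # expected output consumed by the DS analyzer job

# Illustrative rules only; the actual instruction format is defined by the analyzer.
RULES = [
    ResolutionInstruction("go.mod", "go mod graph > go.graph", "go.graph"),
    ResolutionInstruction("pom.xml", "mvn dependency:tree -DoutputFile=maven.graph", "maven.graph"),
]

def instructions_for(detected_files: set[str]) -> list[ResolutionInstruction]:
    """Return the resolution steps a .pre-stage job should execute."""
    return [r for r in RULES if r.build_file in detected_files]
```

Because the commands run in vanilla ecosystem images, the analyzer image itself never needs the corresponding runtimes installed.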

#### Tier 3: Manifest Parsing Fallback (Lowest Accuracy)

When neither lockfiles nor graphfiles are available, the analyzer can parse dependency manifests directly to extract minimal dependency information. This provides basic coverage for projects without pre-generated artifacts, though with lower accuracy and completeness than lockfiles, since a manifest cannot capture transitive dependencies or the exact versions in use.
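To make the accuracy limitation concrete: a requirements-style manifest lists only direct dependencies, often as version ranges. The following is a minimal sketch assuming a pip `requirements.txt`-like format; it is not the analyzer's parser, and the output shape is illustrative.

```python
import re

# Matches lines like "requests>=2.31" or "flask==3.0.0". Transitive
# dependencies are simply not visible in a manifest, so they cannot appear.
REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(==|>=|<=|~=|>|<)?\s*([\w.*]+)?")

def parse_requirements(text: str) -> list[dict]:
    components = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blanks and comments
        m = REQ.match(line)
        if m:
            name, op, version = m.groups()
            components.append({
                "name": name,
                # Only an exact pin yields a usable version; a range leaves
                # the resolved version unknown (lower accuracy than a lockfile).
                "version": version if op == "==" else None,
            })
    return components
```

A lockfile-based Tier 1 scan would report the exact resolved versions of both direct and transitive dependencies; here, anything that is not pinned degrades to an unknown version.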

### Vulnerability Scanning

The analyzer integrates vulnerability scanning directly into the CI pipeline, providing immediate security feedback to developers. After generating CycloneDX SBOMs from detected dependencies, the analyzer:

1. **Uploads SBOMs to the GitLab SBOM Scan API**: The generated SBOM files are sent to GitLab's backend vulnerability scanning service
2. **Polls for scan results**: The analyzer waits for the backend to complete vulnerability analysis using the unified GitLab SBOM Vulnerability Scanner
3. **Aggregates findings**: Results from multiple SBOMs are combined into a single security report
4. **Generates security report**: A standardized GitLab dependency scanning report is produced with detected vulnerabilities

This approach maintains separation of concerns by delegating the actual vulnerability detection logic to the unified Dependency Scanning engine using the [GitLab SBOM Vulnerability Scanner](../dependency_scanning_engine/decisions/001_gitlab_sbom_vulnerability_scanner.md), while the analyzer handles orchestration and result aggregation.

For more details on the vulnerability scanning implementation, including error handling strategies, retry logic, and the concurrent processing model, see [ADR 002: Vulnerability Scanning using SBOM Scan API](./decisions/002_vulnerability_scanning.md).
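The upload, poll, and aggregate steps above can be sketched as a small control loop. The transport is stubbed out deliberately: the SBOM Scan API surface is defined by the GitLab backend, so the status values, payload shapes, and callables here are assumptions for illustration only.

```python
import time
from typing import Callable

def scan_sboms(sboms: list[dict],
               upload: Callable[[dict], str],
               poll: Callable[[str], dict],
               interval: float = 0.0,
               max_polls: int = 30) -> list[dict]:
    """Upload each SBOM, poll its scan to completion, aggregate findings.

    `upload` returns a scan id; `poll` returns {"status": ..., "findings": [...]}.
    Both are stand-ins for the backend-defined SBOM Scan API calls.
    """
    findings: list[dict] = []
    for sbom in sboms:
        scan_id = upload(sbom)
        for _ in range(max_polls):
            result = poll(scan_id)
            if result["status"] == "completed":
                findings.extend(result["findings"])
                break
            if result["status"] == "failed":
                break  # a real analyzer would surface or retry this error
            time.sleep(interval)
    # Aggregation: one combined report across all SBOMs, deduplicated by id.
    seen, report = set(), []
    for f in findings:
        if f["id"] not in seen:
            seen.add(f["id"])
            report.append(f)
    return report
```

The design choice shown is that the analyzer only orchestrates: all vulnerability matching happens behind the polled API, and the analyzer's job is bounded waiting plus deduplicated aggregation into a single report.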

## Decisions

- [ADR 001: Graph Export Only](./decisions/001_graph_export_only.md) - Documents the initial vision of supporting only lockfiles and graphfiles
- [ADR 002: Vulnerability Scanning using SBOM Scan API](./decisions/002_vulnerability_scanning.md) - Documents the decision to reintroduce vulnerability scanning capabilities within the DS analyzer
- [ADR 003: Dependency Resolution and Manifest Scanning](./decisions/003_dependency_resolution_and_manifest_scanning.md) - Documents the approach with automatic dependency resolution and manifest parsing fallback

## Appendix

- [dependency graph export](https://docs.gitlab.com/ee/user/application_security/terminology/#dependency-graph-export)
- [package manager](https://docs.gitlab.com/ee/user/application_security/terminology/#package-managers)
- [lock file](https://docs.gitlab.com/ee/user/application_security/terminology/#lock-file)

## References

- [Bring security scan results back into the Dependency Scanning CI job Epic](https://gitlab.com/groups/gitlab-org/-/work_items/17150)
- [Dependency Resolution Epic](https://gitlab.com/groups/gitlab-org/-/work_items/20461)
- [Manifest scanning Epic](https://gitlab.com/groups/gitlab-org/-/work_items/20457)
- [Dependency Scanning Engine](../dependency_scanning_engine/_index.md)
- [Dependency Scanning Engine ADR003: SBOM-based CI Pipeline Scanning](../dependency_scanning_engine/decisions/003_sbom_based_scans_for_ci_pipelines.md)
