Dependency Scanning scans multiple Python projects in monorepos


Release notes

TODO

Problem to solve

Currently, Dependency Scanning (gemnasium-python) finds and scans only a single Python project and then exits early. In a monorepo with several Python projects, the (arguably expected) behavior would be to find and scan all of them.

Intended users

User experience goal

Proposal

1. Change the pip caching directory from ./dist to a centrally located one

When running pip download across multiple projects, if project 1 downloads dependency x, project 2 should skip downloading it again. The pip caching directory should therefore be changed from ./dist to a centrally located one (e.g. /tmp/pip_cache). Suggestions for the location are welcome.
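As a rough illustration of the idea, the sketch below builds the same `pip download` invocation for two projects so that both point at one shared directory. The helper name `pip_download_cmd` and the constant `SHARED_DOWNLOAD_DIR` are hypothetical, and the exact flags gemnasium-python passes to pip are an assumption here; only `--dest` and `--requirement` are taken from pip's documented options.

```python
import sys
from pathlib import Path

# Hypothetical shared location; the issue only suggests /tmp/pip_cache as one option.
SHARED_DOWNLOAD_DIR = Path("/tmp/pip_cache")


def pip_download_cmd(requirements_file, dest=SHARED_DOWNLOAD_DIR):
    """Build (but do not run) a pip download command that targets a shared directory.

    Assumption: the analyzer downloads from a requirements file. The real
    analyzer may pass different or additional flags.
    """
    return [
        sys.executable, "-m", "pip", "download",
        "--dest", str(dest),                       # same directory for every project
        "--requirement", str(requirements_file),   # project-specific input
    ]


# Both projects resolve to the same download directory.
cmd_project1 = pip_download_cmd(Path("project1/requirements.txt"))
cmd_project2 = pip_download_cmd(Path("project2/requirements.txt"))
```

The commands could then be executed with `subprocess.run(cmd, check=True)`; whether the shared directory should live under /tmp or somewhere else is still open, as noted above.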

2. Introduce virtualenv to isolate project dependencies

Solves: #34763 (closed)

Currently all dependencies are installed globally. With the changes to pip's dependency resolution (since 2020), pip would try to find a state that is compatible with both project 1's and project 2's dependencies. This state may not exist, and the search for it could take forever. We therefore have to isolate each project in its own virtualenv and run pipdeptree in each virtualenv separately.

An added benefit is that we can remove the exclusion string currently hardcoded in pipdeptree, since we would be relying on pipdeptree inside the venv.
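A minimal sketch of the per-project isolation, assuming the standard-library `venv` module is used and that pipdeptree is installed into each environment before it is invoked. All function names (`venv_dir_for`, `venv_python`, `create_project_env`, `pipdeptree_cmd`) and the directory layout are hypothetical and not taken from the analyzer.

```python
import hashlib
import os
import venv
from pathlib import Path


def venv_dir_for(project_dir, venv_root):
    """Map a project directory to its own environment directory (hypothetical layout)."""
    digest = hashlib.sha256(str(Path(project_dir).resolve()).encode()).hexdigest()[:12]
    return Path(venv_root) / digest


def venv_python(env_dir):
    """Path of the interpreter inside a virtual environment."""
    if os.name == "nt":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def create_project_env(project_dir, venv_root):
    """Create a fresh, isolated environment for one project and return its path."""
    env_dir = venv_dir_for(project_dir, venv_root)
    venv.EnvBuilder(with_pip=True, clear=True).create(env_dir)
    return env_dir


def pipdeptree_cmd(env_dir):
    """Command that runs pipdeptree with the environment's own interpreter.

    Assumption: pipdeptree has already been installed into this environment.
    """
    return [str(venv_python(env_dir)), "-m", "pipdeptree", "--json"]
```

Because each project gets its own environment, pip only has to resolve one project's dependencies at a time, and pipdeptree reports only what that environment contains.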

3. Change finding strategy to detect multiple projects

DS currently exits early on the first Python project it finds. Once the shared cache and virtualenv isolation are in place, it is safe to change the existing behavior in https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/6af7457f90890787d1a3e1b361188f908b6ee23e/finder/cli.go#L154 so that all projects are detected.

There has been some discussion in #241659 (comment 459602912) regarding backwards compatibility. Most probably we should introduce a DS_MULTIPLE_PROJECTS_ENABLED or DS_PYTHON_MULTIPLE_PROJECTS_ENABLED flag to signal this change in behavior.
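The flag-gated detection could look roughly like the sketch below. It is written in Python for illustration only (the actual finder lives in Go), the list of marker files is an assumption rather than the analyzer's real list, and the variable name is one of the two candidates proposed above, not a final decision.

```python
from __future__ import annotations

import os
from pathlib import Path

# Assumed marker files; the files the real finder looks for may differ.
MARKER_FILES = ("requirements.txt", "Pipfile", "setup.py")


def multiple_projects_enabled(env=None) -> bool:
    """Read the proposed (not final) opt-in flag; defaults to the old behavior."""
    env = os.environ if env is None else env
    value = env.get("DS_PYTHON_MULTIPLE_PROJECTS_ENABLED", "false")
    return value.strip().lower() in ("1", "true")


def find_python_projects(root, multiple: bool) -> list[Path]:
    """Return directories that contain a marker file.

    With multiple=False the walk stops at the first match, which mirrors the
    current early-exit behavior.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if any(name in filenames for name in MARKER_FILES):
            found.append(Path(dirpath))
            if not multiple:
                break
    return found
```

Keeping the flag off by default would preserve the existing single-project behavior for users who do not opt in.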

(Aside: we can rework gitlab-org/security-products/analyzers/gemnasium!191 (closed) for this. I've set it to draft.)

Further details

Quotes from @alexandervaneck:

Through comments on gitlab-org/security-products/analyzers/gemnasium!191 (closed) it's become clear that it could be a worthwhile contribution to make multiple python projects work in DS. The project I'm working on is a large monorepo with 15+ projects inside which share many of the same dependencies (sometimes different versions of the same dependencies). Eager to make DS work for this project I hacked together this feature which I'm now looking to contribute back to the upstream.

Permissions and Security

No change

Documentation

The new behavior must be documented in https://docs.gitlab.com/ee/user/application_security/dependency_scanning/#python and https://docs.gitlab.com/ee/user/application_security/dependency_scanning/#supported-languages-and-package-managers.

Also, the new CI variable must be documented in https://docs.gitlab.com/ee/user/application_security/dependency_scanning/#configuring-dependency-scanning.

Availability & Testing

A new image spec for the gemnasium-python Docker image must be added to cover these scenarios:

  • git repo contains sibling Python projects
    • scan of multiple projects is ENABLED
    • scan of multiple projects is DISABLED
  • git repo contains nested Python projects
    • scan of multiple projects is ENABLED

Job integration tests (using test projects via downstream pipelines) don't seem necessary since the definition of the gemnasium-python-dependency_scanning job doesn't change.

Available Tier

Ultimate/Gold

Implementation plan:

What does success look like, and how can we measure that?

Being able to scan Python monorepos.

What is the type of buyer?

Is this a cross-stage feature?

No

What is the competitive advantage or differentiation for this feature?

Links / references


Edited by 🤖 GitLab Bot 🤖