Dependency Scanning scans multiple Python projects in monorepos
Release notes
TODO
Problem to solve
Currently, Dependency Scanning (gemnasium-python) finds and scans only a single Python project and then exits early. In a monorepo with several Python projects, the (arguably expected) behavior would be to find and scan all of them.
Intended users
User experience goal
Proposal
1. Change the pip caching directory from ./dist to a centrally located one
When running pip download across multiple projects, if project 1 downloads dependency x, project 2 should skip downloading it again. The pip caching directory should therefore be changed to a centrally located one (e.g. /tmp/pip_cache). Suggestions for the location are welcome.
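A minimal sketch of the idea, assuming the "caching directory" corresponds to the directory passed to `pip download --dest`. The helper name `pip_download_command` and the constant `SHARED_PIP_DIR` are hypothetical, and `/tmp/pip_cache` is only the example location suggested above, not a decided value.

```python
from pathlib import Path

# Assumed central location; the proposal only offers /tmp/pip_cache as an example.
SHARED_PIP_DIR = Path("/tmp/pip_cache")


def pip_download_command(requirements_file, shared_dir=SHARED_PIP_DIR):
    """Build a `pip download` command that writes to one shared directory.

    Every project passes the same destination, so archives fetched for one
    project are already present when the next project asks for them.
    """
    return [
        "pip", "download",
        "--requirement", str(requirements_file),
        "--dest", str(shared_dir),
    ]
```

Calling the helper for `project1/requirements.txt` and `project2/requirements.txt` yields two commands that differ only in the requirements file, which is the property the proposal relies on.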
2. Introduce virtualenv to isolate project dependencies
Solves: #34763 (closed)
Currently all dependencies are installed globally. With the changes to pip's dependency resolution (since 2020), pip would try to find a state that is compatible with the dependencies of both project 1 and project 2. That state may not exist, and searching for it could take a very long time. Therefore we have to isolate each project in its own virtualenv and run pipdeptree in each virtualenv separately.
An added benefit is that we can remove the exclusion string currently hardcoded in pipdeptree, as we would be relying on pipdeptree inside the venv.
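The per-project isolation could look roughly like the sketch below. It only plans the commands rather than running them, and it is not the analyzer's actual builder code. The function names, the `venvs_root` parameter, and the assumption that each project has a `requirements.txt` are all hypothetical. It uses the standard library `venv` module in place of the virtualenv tool.

```python
import os
import sys
from pathlib import Path


def venv_python(venv_dir):
    """Return the path of the interpreter inside a virtual environment."""
    subdir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(venv_dir) / subdir / exe


def plan_project_scan(project_dir, venvs_root):
    """List the commands needed to resolve one project in its own venv.

    The venv is named after the project directory here; a real helper would
    need a collision-free name for nested projects that share a name.
    """
    project_dir = Path(project_dir)
    venv_dir = Path(venvs_root) / project_dir.name
    python = str(venv_python(venv_dir))
    return [
        # 1. Create an isolated environment for this project only.
        [sys.executable, "-m", "venv", str(venv_dir)],
        # 2. Install the project's dependencies inside that environment.
        [python, "-m", "pip", "install", "--requirement",
         str(project_dir / "requirements.txt")],
        # 3. Install pipdeptree next to them, so it inspects only this venv.
        [python, "-m", "pip", "install", "pipdeptree"],
        # 4. Export the resolved dependency tree.
        [python, "-m", "pipdeptree", "--json"],
    ]
```

Because every command after the first runs with the venv's own interpreter, the resolver never has to reconcile the dependencies of two different projects.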
3. Change finding strategy to detect multiple projects
DS is locked to exit early on the first Python project it finds. Once caching and virtualenv isolation are introduced, it is safe to change the existing behavior in https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/6af7457f90890787d1a3e1b361188f908b6ee23e/finder/cli.go#L154 to reflect this.
There has been some discussion in #241659 (comment 459602912) regarding backwards compatibility, and most probably we should introduce a DS_MULTIPLE_PROJECTS_ENABLED or DS_PYTHON_MULTIPLE_PROJECTS_ENABLED flag to signify this change in behavior.
(Aside: we can rework gitlab-org/security-products/analyzers/gemnasium!191 (closed) for this. I've set it to draft.)
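To illustrate the difference between the current "stop at the first project" behavior and the proposed "find all projects" behavior, here is a small Python sketch. It is not a port of the analyzer's Go finder. The file list in `PROJECT_FILES`, the function names, and the way the flag value is parsed are assumptions made for illustration; only the variable name `DS_MULTIPLE_PROJECTS_ENABLED` comes from the discussion above.

```python
import os
from pathlib import Path

# Assumed subset of files that mark a directory as a Python project.
PROJECT_FILES = ("requirements.txt", "Pipfile", "setup.py")


def multiple_projects_enabled(environ=os.environ):
    """Read the proposed flag; anything other than "true" keeps the old behavior."""
    value = environ.get("DS_MULTIPLE_PROJECTS_ENABLED", "false")
    return value.strip().lower() == "true"


def find_python_projects(root, search_all):
    """Walk `root` and return the directories that contain a project file.

    With search_all=False the walk stops at the first match, which mirrors
    the current early exit. With search_all=True every match is collected,
    including sibling and nested projects.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        if any(name in filenames for name in PROJECT_FILES):
            found.append(Path(dirpath))
            if not search_all:
                break
    return found
```

A caller would combine the two as `find_python_projects(repo_root, multiple_projects_enabled())`, so existing pipelines that do not set the variable keep scanning a single project.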
Further details
Quotes from @alexandervaneck:
Through comments on gitlab-org/security-products/analyzers/gemnasium!191 (closed) it's become clear that it could be a worthwhile contribution to make multiple python projects work in DS. The project I'm working on is a large monorepo with 15+ projects inside which share many of the same dependencies (sometimes different versions of the same dependencies). Eager to make DS work for this project I hacked together this feature which I'm now looking to contribute back to the upstream.
Permissions and Security
No change
Documentation
The new behavior must be documented in https://docs.gitlab.com/ee/user/application_security/dependency_scanning/#python and https://docs.gitlab.com/ee/user/application_security/dependency_scanning/#supported-languages-and-package-managers.
Also, the new CI variable must be documented in https://docs.gitlab.com/ee/user/application_security/dependency_scanning/#configuring-dependency-scanning.
Availability & Testing
A new image spec for the gemnasium-python Docker image must be added to cover these scenarios:
- git repo contains sibling Python projects
  - scan of multiple projects is ENABLED
  - scan of multiple projects is DISABLED
- git repo contains nested Python projects
  - scan of multiple projects is ENABLED
Job integration tests (using test projects via downstream pipelines) don't seem necessary since the definition of the gemnasium-python-dependency_scanning job doesn't change.
Available Tier
Ultimate/Gold
Implementation plan:
- change python builders (gitlab-org/security-products/analyzers/gemnasium!200 (closed))
  - create a virtualenv helper
  - remove pipdeptree builder and install/invoke via virtualenv
  - invoke virtualenv helper from pip builder and setuptools builder
  - decide on caching directory: #332558
- add DS_MULTIPLE_PROJECTS_ENABLED environment variable (gitlab-org/security-products/analyzers/gemnasium!191 (closed))
  - add configuration parameter to the flags
  - if true, use SearchAll for SearchMode in PresetGemnasiumPython
- testing
  - update integration tests
    - test with variable {true, false} on multi-project with sibling projects
    - test with variable true on multi-project with nested projects
- update Dependency Scanning Documentation
  - update python "only one project" behavior to document how the new environment variable affects scanning
  - update last column ("processes multiple files") of the Supported languages and package managers table
  - document the DS_MULTIPLE_PROJECTS_ENABLED environment variable in the configuring section
- create release documentation
  - update Release section of this issue
  - release post
What does success look like, and how can we measure that?
Being able to scan Python monorepos.
What is the type of buyer?
Is this a cross-stage feature?
No
What is the competitive advantage or differentiation for this feature?
Links / references