Python projects have extra dependencies

Summary

In some cases, the dependency list of a Python project contains extra dependencies that do not belong to the project but to gemnasium-python itself (Dependency Scanning for Python). This happens whenever the scan installs the project dependencies and lists them using pipdeptree. The report then shows the project as depending on pipdeptree, even though it does not.

This impacts Python projects using pip or setuptools. Pipenv projects are not impacted by this.

Further details

This problem surfaced when connecting gemnasium-python to gemnasium-db, as described in #14630 (closed). The integration required adding some dependencies to the gemnasium-python project, and these were artificially added to the dependencies of the scanned projects. See discussion.

Steps to reproduce

  • create a Python project with a requirements.txt or a Pipfile but no lock file
  • enable Dependency Scanning for this project
  • inspect the dependency list

Example Project

See https://gitlab.com/gitlab-org/security-products/tests/python-pip/blob/8ea559e2b16acc49a49d8fe3bcb48cd6226c729a/qa/expect/gl-dependency-scanning-report.json#L526

This was surfaced when working on gemnasium-python!29, see failing pipeline.

Proposal

Use the --user install option for pip and setuptools when building projects, and the -u option when running the analysis via pipdeptree.

This has been successfully tested with both pip and setuptools using the following projects:

# python setup.py install --user
# pip install pipdeptree
# pipdeptree -u
test-project==0.0.1
  - Django [required: ==1.11.3, installed: 1.11.3]
    - pytz [required: Any, installed: 2021.3]
  - docutils [required: ==0.13.1, installed: 0.13.1]
  - requests [required: ==2.5.3, installed: 2.5.3]

Note that the project itself is listed. This is already the case today, so it is not a regression.

# pip install --user -r requirements.txt
# pip install pipdeptree
# pipdeptree -u
beautifulsoup4==4.6.0
django-contrib-comments==1.8.0
  - Django [required: >=1.8, installed: 1.11.4]
    - pytz [required: Any, installed: 2018.3]
django-mptt==0.9.0
  - django-js-asset [required: Any, installed: 1.0.0]
django-tagging==0.4.6
django-xmlrpc==0.1.8
mots-vides==2015.5.11
pyparsing==2.2.0
regex==2018.2.8

NOTE: Project requirements that match pre-installed packages won't be listed. For instance, pip currently isn't listed if requirements.txt contains pip==21.2.4: that version is already pre-installed, so nothing gets installed in the user space.

See proof
root@489051b35666:/# pip install pipdeptree
Collecting pipdeptree
  Downloading pipdeptree-2.2.1-py3-none-any.whl (21 kB)
Requirement already satisfied: pip>=6.0.0 in /usr/local/lib/python3.6/site-packages (from pipdeptree) (21.2.4)
Installing collected packages: pipdeptree
Successfully installed pipdeptree-2.2.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
root@489051b35666:/# pip install pip --user
Requirement already satisfied: pip in /usr/local/lib/python3.6/site-packages (21.2.4)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
root@489051b35666:/# pip install pip==21.2.4 --user^C
root@489051b35666:/# pipdeptree -u

root@489051b35666:/# pip install pip==21.2.4 --user
Requirement already satisfied: pip==21.2.4 in /usr/local/lib/python3.6/site-packages (21.2.4)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
root@489051b35666:/# pipdeptree -u

root@489051b35666:/# pip install pip==21.3.1 --user
Collecting pip==21.3.1
  Downloading pip-21.3.1-py3-none-any.whl (1.7 MB)
     |████████████████████████████████| 1.7 MB 2.5 MB/s 
Installing collected packages: pip
  WARNING: The scripts pip, pip3 and pip3.6 are installed in '/root/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed pip-21.3.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
root@489051b35666:/# pipdeptree -u
pip==21.3.1
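
The log above follows from where pip places packages: a requirement already satisfied by the system site-packages is not reinstalled, so it never reaches the user site. The following is a minimal sketch for inspecting a site directory, assuming Python 3.8+ for importlib.metadata. The helper distributions_in is hypothetical; it is not part of gemnasium-python or pipdeptree and only approximates what pipdeptree -u reports.

```python
import site
from importlib import metadata


def distributions_in(directory):
    """Map distribution name to version for metadata found directly in `directory`."""
    found = {}
    for dist in metadata.distributions(path=[directory]):
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # Only packages installed with `pip install --user` live in the user site.
    # A requirement already satisfied system-wide does not appear here.
    user_site = site.getusersitepackages()
    for name, version in sorted(distributions_in(user_site).items()):
        print(f"{name}=={version}")
```

Running this in the container from the log should mirror the last step: the user site stays empty until a version that differs from the pre-installed one is requested.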

Implementation plan

  • update the pip and setuptools package managers to install dependencies in the user space only, via the --user option
  • update the pipdeptree invocation to show only the tree of user-installed packages, via the -u option
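
The two steps above can be sketched as command builders. This is only an illustration: the function names and the package_manager values are hypothetical and do not reflect how gemnasium-python is actually structured. The command lines themselves are the ones tested in the proposal.

```python
def install_command(package_manager, requirements="requirements.txt"):
    """Return the argv that installs project dependencies into the user site."""
    if package_manager == "pip":
        return ["pip", "install", "--user", "-r", requirements]
    if package_manager == "setuptools":
        return ["python", "setup.py", "install", "--user"]
    raise ValueError(f"unsupported package manager: {package_manager}")


def list_command():
    """Return the argv that lists only user-installed packages."""
    return ["pipdeptree", "-u"]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for manager in ("pip", "setuptools"):
        print(" ".join(install_command(manager)))
    print(" ".join(list_command()))
```

Each returned list can be passed to subprocess.run(..., check=True) from the project directory.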

Testing

We can rely on the existing image tests and job tests. There's no need for new test projects.

  • Integration tests for pip pass.
  • Integration tests for setuptools pass.

What is the current bug behavior?

pipdeptree is in the dependency list

What is the expected correct behavior?

pipdeptree is NOT in the dependency list
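
The expected behavior can be checked against a generated report. The sketch below assumes the report has a top-level dependency_files list whose entries carry a dependencies list with a package.name field. That layout is an assumption about gl-dependency-scanning-report.json, and the helper names are hypothetical.

```python
def reported_packages(report):
    """Collect every package name listed in the report (assumed layout)."""
    names = set()
    for dependency_file in report.get("dependency_files", []):
        for dependency in dependency_file.get("dependencies", []):
            names.add(dependency["package"]["name"])
    return names


def leaked_scanner_packages(report, scanner_packages=("pipdeptree",)):
    """Return scanner-only packages that wrongly appear in the report."""
    return sorted(reported_packages(report) & set(scanner_packages))


if __name__ == "__main__":
    # Hand-written sample in the assumed layout, not a real report.
    sample = {
        "dependency_files": [
            {
                "path": "requirements.txt",
                "dependencies": [
                    {"package": {"name": "Django"}, "version": "1.11.3"},
                    {"package": {"name": "pipdeptree"}, "version": "2.2.1"},
                ],
            }
        ]
    }
    print(leaked_scanner_packages(sample))
```

An empty result corresponds to the expected behavior. A non-empty result corresponds to the bug.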

Possible fixes

  • Install dependencies with pip install --user, and list them with pipdeptree --user-only (-u).
    • Pro: Simple. No need for virtualenv.
    • Pro: No need to upgrade to the latest version of pipdeptree (though this is something we should do).
  • Leverage virtualenv to isolate the dependencies of the scanned projects from the dependencies of gemnasium-python.
  • Leverage pipdeptree --exclude to exclude pipdeptree, pip, and other package managers. See usage. Warning: this works with pipdeptree v2 but isn't supported by v1. See comment.
    • Con: We might skip a package that the scanned project explicitly depends on, resulting in a false negative. See comment.
    • Con: We would need a CI variable to mitigate that, and we would then have to test, document, and maintain that variable.
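
The false negative mentioned for the exclusion approach can be illustrated with a name-based filter. This is a simplified stand-in for what an exclude list does, not pipdeptree's implementation. The helper name and the sample data are hypothetical.

```python
def exclude_packages(installed, excluded):
    """Drop excluded names from a mapping of package name to version."""
    blocked = {name.lower() for name in excluded}
    return {
        name: version
        for name, version in installed.items()
        if name.lower() not in blocked
    }


if __name__ == "__main__":
    # Suppose the scanned project explicitly requires pip==21.2.4.
    installed = {"pip": "21.2.4", "pipdeptree": "2.2.1", "Django": "1.11.3"}
    listed = exclude_packages(installed, ["pip", "pipdeptree"])
    # pip is dropped along with pipdeptree, although the project depends on it.
    print(listed)
```

A filter that works on names alone cannot tell a package the project requires from one the scanner brought in, which is why a CI variable would be needed to override it.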
Edited by Fabien Catteau