
Add gcc build environment and PIP_DEPENDENCY_PATH

Lucas Charles requested to merge bypass-installation into master

What does this MR do?

This MR updates our gemnasium-python image in two primary ways: by providing a gcc toolchain for building C-based extensions, and by providing an optional method of passing pre-packaged dependencies. Each enhances the existing behavior on its own, and together they ensure our image covers the needs of the greatest number of users.

  • Adds gcc toolchain within base container (happy-path demonstrated with updated test fixture)
  • Adds support for specifying PIP_DEPENDENCY_PATH, allowing dependencies to be pre-fetched and passed as artifacts to the scan stage (demonstrated with an MR to our test project python-pip)
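As a rough illustration of the second point, pre-fetched dependencies could be consumed with pip's `--no-index` and `--find-links` options. This is a minimal sketch, not the analyzer's actual code: the helper name `build_pip_command`, the default `requirements.txt`, and the assumption that PIP_DEPENDENCY_PATH points at a directory of downloaded packages are all illustrative.

```python
import os
import sys


def build_pip_command(requirements="requirements.txt", env=None):
    """Return the pip argv for installing requirements.

    Hypothetical helper: if PIP_DEPENDENCY_PATH is set, pip is told to
    skip the package index and look only in that directory.
    """
    env = os.environ if env is None else env
    cmd = [sys.executable, "-m", "pip", "install", "-r", requirements]
    dep_path = env.get("PIP_DEPENDENCY_PATH")
    if dep_path:
        # Assumption: the directory holds wheels/sdists fetched in an
        # earlier CI stage and passed along as artifacts.
        cmd += ["--no-index", "--find-links", dep_path]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_pip_command()))
```

When the variable is unset, the command falls through to a normal index-backed install, which is where the gcc toolchain comes into play.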

There are quite a few changes here, so I apologize in advance for not breaking them up more. Many build on the previous ones, however, so I made an effort to keep each stage isolated in its own commit for easier review.

Alpine vs Debian

To maintain backwards compatibility and minimize the changes, I kept the base image as alpine. However, it's worth discussing whether we should switch to python:3.6-slim instead. The primary reason is that Python's PEP-513 Portable Linux Distribution format does not support alpine: manylinux wheels are built against glibc, while Alpine uses musl libc. The current alpine+gcc platform tag is linux_x86_64. PEP-513 essentially specifies a platform tag that lets a packaged dependency be used across many compatible Linux systems, where the platform tag matches the format manylinux* (e.g. manylinux1_x86_64 or manylinux1_i686).

Since many popular PyPI packages, such as numpy or cryptography, ship wheels in the manylinux format, it could make sense to prefer a Debian-based image.

On Alpine, if PyPI provides a wheel such as cryptography-2.2.2-cp27-cp27m-manylinux1_x86_64.whl, pip won't recognize it as compatible and must fall back to building from source via cryptography-2.2.2.tar.gz.

There are some workarounds, but they can get pretty hacky, and I don't see much reason to pursue them when we could just swap our base image.
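The platform-tag mismatch can be sketched in a few lines of plain Python. This is a simplified illustration, not pip's implementation: it only inspects the platform tag of a wheel filename (ignoring the Python and ABI tags), and the helper names and the two platform sets are assumptions made for the example.

```python
def wheel_tags(filename):
    """Split a wheel filename into (python, abi, platform) tags."""
    stem = filename[: -len(".whl")]
    # Layout is name-version[-build]-python-abi-platform, so the last
    # three dash-separated fields are always the tags.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def platform_supported(filename, supported_platforms):
    """True if any platform tag in the filename is in the supported set."""
    _, _, platform_tag = wheel_tags(filename)
    # A filename may carry several alternatives separated by ".".
    return any(p in supported_platforms for p in platform_tag.split("."))


# Illustrative platform sets for a Debian-based and an Alpine-based image.
DEBIAN = {"manylinux2010_x86_64", "manylinux1_x86_64", "linux_x86_64", "any"}
ALPINE = {"linux_x86_64", "any"}

wheel = "cryptography-2.2.2-cp27-cp27m-manylinux1_x86_64.whl"
print(platform_supported(wheel, DEBIAN))  # True
print(platform_supported(wheel, ALPINE))  # False
```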

To demonstrate how pip decides which packages it can use, here is the priority order of supported tags for each container. Each tuple is ({python tag}, {abi tag}, {platform tag}) per PEP-425, but we primarily care about the platform tag here:

FROM python:3

❯ docker run -it python:3 python -c 'import pip._internal; from pprint import pprint; pprint(pip._internal.pep425tags.get_supported())'
[('cp37', 'cp37m', 'manylinux2010_x86_64'),
 ('cp37', 'cp37m', 'manylinux1_x86_64'),
 ('cp37', 'cp37m', 'linux_x86_64'),
 ('cp37', 'abi3', 'manylinux2010_x86_64'),
 ('cp37', 'abi3', 'manylinux1_x86_64'),
 ('cp37', 'abi3', 'linux_x86_64'),
 ('cp37', 'none', 'manylinux2010_x86_64'),
 ('cp37', 'none', 'manylinux1_x86_64'),
 ('cp37', 'none', 'linux_x86_64'),
 ('cp36', 'abi3', 'manylinux2010_x86_64'),
 ('cp36', 'abi3', 'manylinux1_x86_64'),
 ('cp36', 'abi3', 'linux_x86_64'),
 ('cp35', 'abi3', 'manylinux2010_x86_64'),
 ('cp35', 'abi3', 'manylinux1_x86_64'),
 ('cp35', 'abi3', 'linux_x86_64'),
 ('cp34', 'abi3', 'manylinux2010_x86_64'),
 ('cp34', 'abi3', 'manylinux1_x86_64'),
 ('cp34', 'abi3', 'linux_x86_64'),
 ('cp33', 'abi3', 'manylinux2010_x86_64'),
 ('cp33', 'abi3', 'manylinux1_x86_64'),
 ('cp33', 'abi3', 'linux_x86_64'),
 ('cp32', 'abi3', 'manylinux2010_x86_64'),
 ('cp32', 'abi3', 'manylinux1_x86_64'),
 ('cp32', 'abi3', 'linux_x86_64'),
 ('py3', 'none', 'manylinux2010_x86_64'),
 ('py3', 'none', 'manylinux1_x86_64'),
 ('py3', 'none', 'linux_x86_64'),
 ('cp37', 'none', 'any'),
 ('cp3', 'none', 'any'),
 ('py37', 'none', 'any'),
 ('py3', 'none', 'any'),
 ('py36', 'none', 'any'),
 ('py35', 'none', 'any'),
 ('py34', 'none', 'any'),
 ('py33', 'none', 'any'),
 ('py32', 'none', 'any'),
 ('py31', 'none', 'any'),
 ('py30', 'none', 'any')]

FROM python:3-alpine

❯ docker run -it python:3-alpine python -c 'import pip._internal; from pprint import pprint; pprint(pip._internal.pep425tags.get_supported())'
[('cp37', 'cp37m', 'linux_x86_64'),
 ('cp37', 'abi3', 'linux_x86_64'),
 ('cp37', 'none', 'linux_x86_64'),
 ('cp36', 'abi3', 'linux_x86_64'),
 ('cp35', 'abi3', 'linux_x86_64'),
 ('cp34', 'abi3', 'linux_x86_64'),
 ('cp33', 'abi3', 'linux_x86_64'),
 ('cp32', 'abi3', 'linux_x86_64'),
 ('py3', 'none', 'linux_x86_64'),
 ('cp37', 'none', 'any'),
 ('cp3', 'none', 'any'),
 ('py37', 'none', 'any'),
 ('py3', 'none', 'any'),
 ('py36', 'none', 'any'),
 ('py35', 'none', 'any'),
 ('py34', 'none', 'any'),
 ('py33', 'none', 'any'),
 ('py32', 'none', 'any'),
 ('py31', 'none', 'any'),
 ('py30', 'none', 'any')]
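To make the priority order concrete, here is a simplified sketch of how a candidate could be chosen from such a list: the wheel whose tag triple appears earliest wins, and no match means a source build. It is only an approximation of pip's behavior (real pip also compares versions, among other things); `best_candidate`, the abbreviated tag lists, and the `example-1.0` filenames are hypothetical.

```python
def best_candidate(candidates, supported):
    """Pick the candidate whose tag triple appears earliest in `supported`.

    candidates: dict mapping filename -> (python, abi, platform) triple
    supported:  list of tag triples, most preferred first
    Returns None when nothing matches (a source build would be needed).
    """
    rank = {tag: i for i, tag in enumerate(supported)}
    matches = [(rank[tag], name) for name, tag in candidates.items() if tag in rank]
    return min(matches)[1] if matches else None


# Abbreviated from the two listings above.
DEBIAN = [
    ("cp37", "cp37m", "manylinux2010_x86_64"),
    ("cp37", "cp37m", "manylinux1_x86_64"),
    ("cp37", "cp37m", "linux_x86_64"),
    ("py3", "none", "any"),
]
ALPINE = [
    ("cp37", "cp37m", "linux_x86_64"),
    ("py3", "none", "any"),
]

# Hypothetical release that only ships a manylinux1 wheel.
binary_only = {
    "example-1.0-cp37-cp37m-manylinux1_x86_64.whl": ("cp37", "cp37m", "manylinux1_x86_64"),
}

print(best_candidate(binary_only, DEBIAN))  # the manylinux1 wheel
print(best_candidate(binary_only, ALPINE))  # None
```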

Pipenv support

From what I can tell, Pipenv installation should work in much the same way, in that a local path can be specified as a relative source link within the Pipfile. This may require further testing, but given the inclusion of the gcc toolchain (out-of-the-box behavior, ftw), it doesn't seem pressing until we hear of a use case where documenting or finalizing this configuration is needed.

What are the relevant issue numbers?

https://gitlab.com/gitlab-org/gitlab-ee/issues/6713

Does this MR meet the acceptance criteria?

Edited by 🤖 GitLab Bot 🤖
