Add gcc build environment and PIP_DEPENDENCY_PATH
What does this MR do?
This MR updates our gemnasium-python image in two primary ways: it provides a gcc toolchain for building C-based extensions, and it adds an optional method of passing in pre-packaged dependencies. Each enhances the existing behavior on its own, and together they ensure our image covers the needs of the greatest number of users.
- Adds gcc toolchain within base container (happy path demonstrated with updated test fixture)
- Adds support for specifying PIP_DEPENDENCY_PATH, allowing dependencies to be pre-fetched and passed as artifacts to the scan stage (demonstrated with an MR to our test project python-pip)
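The pre-fetch flow from the second bullet could look roughly like the sketch below. It assumes the scan stage hands the directory named by PIP_DEPENDENCY_PATH to pip through `--no-index --find-links`. That wiring, the helper names `prefetch_command` and `install_command`, and the `dist` directory are assumptions for illustration and are not taken from this MR's code.

```python
import os


def prefetch_command(requirements, dest):
    """Build the pip command an earlier stage could run to download packages into dest."""
    return ["pip", "download", "--dest", dest, "-r", requirements]


def install_command(requirements, env=None):
    """Build the pip install command for the scan stage.

    Assumption: when PIP_DEPENDENCY_PATH is set, pip should resolve packages
    only from that directory instead of contacting PyPI.
    """
    env = os.environ if env is None else env
    cmd = ["pip", "install", "-r", requirements]
    dep_path = env.get("PIP_DEPENDENCY_PATH")
    if dep_path:
        # Look only in the pre-fetched directory, never on the index.
        cmd += ["--no-index", "--find-links", dep_path]
    return cmd


if __name__ == "__main__":
    print(prefetch_command("requirements.txt", "dist"))
    print(install_command("requirements.txt", {"PIP_DEPENDENCY_PATH": "dist"}))
```

With the variable unset, the sketch falls through to a plain `pip install -r`, which is where the gcc toolchain matters for packages that only ship source distributions.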
There are quite a few changes here, so I apologize in advance for not breaking them up more. Many build on the previous ones, however, so I made an effort to keep each stage isolated in its own commit for easier review.
Alpine vs Debian
To maintain backwards compatibility and minimize the changes, I kept the base image as alpine. However, it's worth discussing whether we should switch to python:3.6-slim instead. The primary reason is that Python's PEP-513 Portable Linux Distribution format does not directly support Alpine. The current alpine+gcc platform tag is linux_x86_64. The PEP-513 distribution format essentially specifies a platform tag that enables a packaged dependency to be used across many compatible Linux systems, where the platform tag matches the format manylinux* (e.g. manylinux1_x86_64 or manylinux1_i686).
Since many popular PyPI packages, such as numpy or cryptography, ship with a wheel using the manylinux format, it could make sense to prefer a Debian-based system instead.
On Alpine, this means that if PyPI provides a wheel such as cryptography-2.2.2-cp27-cp27m-manylinux1_x86_64.whl, it won't be recognized as compatible, and pip must fall back to building from source via cryptography-2.2.2.tar.gz.
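As a rough illustration of why that wheel is skipped, the sketch below splits a wheel filename into its tags and checks only the platform tag against the platforms each image reports. The helper names are hypothetical, the platform sets are copied from the tag listings in this description, and the python and abi tags are deliberately ignored. This is not pip's actual implementation.

```python
# Platform tags reported by each image (taken from the listings in this description).
DEBIAN_PLATFORMS = {"manylinux2010_x86_64", "manylinux1_x86_64", "linux_x86_64", "any"}
ALPINE_PLATFORMS = {"linux_x86_64", "any"}


def parse_wheel_filename(filename):
    """Split name-version[-build]-python-abi-platform.whl into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def platform_is_accepted(filename, accepted_platforms):
    """Return True if any platform tag of the wheel is in accepted_platforms."""
    wheel = parse_wheel_filename(filename)
    # A wheel may list several platform tags separated by dots.
    return any(p in accepted_platforms for p in wheel["platform"].split("."))


if __name__ == "__main__":
    wheel = "cryptography-2.2.2-cp27-cp27m-manylinux1_x86_64.whl"
    print(platform_is_accepted(wheel, DEBIAN_PLATFORMS))
    print(platform_is_accepted(wheel, ALPINE_PLATFORMS))
```

The manylinux1_x86_64 platform tag is present in the Debian set and absent from the Alpine set, which is the whole difference between downloading a prebuilt wheel and compiling the sdist.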
There are some workarounds, but they can get pretty hacky, and I don't see much reason to pursue them when we could simply swap our base image.
Just to demonstrate how pip's dependency management system works, here's the priority order of supported tags for each container. Note that each tuple is {python tag}-{abi tag}-{platform tag} per PEP-425, but we primarily care about the platform tag here:
FROM python:3
❯ docker run -it python:3 python -c 'import pip._internal; from pprint import pprint; pprint(pip._internal.pep425tags.get_supported())'
[('cp37', 'cp37m', 'manylinux2010_x86_64'),
('cp37', 'cp37m', 'manylinux1_x86_64'),
('cp37', 'cp37m', 'linux_x86_64'),
('cp37', 'abi3', 'manylinux2010_x86_64'),
('cp37', 'abi3', 'manylinux1_x86_64'),
('cp37', 'abi3', 'linux_x86_64'),
('cp37', 'none', 'manylinux2010_x86_64'),
('cp37', 'none', 'manylinux1_x86_64'),
('cp37', 'none', 'linux_x86_64'),
('cp36', 'abi3', 'manylinux2010_x86_64'),
('cp36', 'abi3', 'manylinux1_x86_64'),
('cp36', 'abi3', 'linux_x86_64'),
('cp35', 'abi3', 'manylinux2010_x86_64'),
('cp35', 'abi3', 'manylinux1_x86_64'),
('cp35', 'abi3', 'linux_x86_64'),
('cp34', 'abi3', 'manylinux2010_x86_64'),
('cp34', 'abi3', 'manylinux1_x86_64'),
('cp34', 'abi3', 'linux_x86_64'),
('cp33', 'abi3', 'manylinux2010_x86_64'),
('cp33', 'abi3', 'manylinux1_x86_64'),
('cp33', 'abi3', 'linux_x86_64'),
('cp32', 'abi3', 'manylinux2010_x86_64'),
('cp32', 'abi3', 'manylinux1_x86_64'),
('cp32', 'abi3', 'linux_x86_64'),
('py3', 'none', 'manylinux2010_x86_64'),
('py3', 'none', 'manylinux1_x86_64'),
('py3', 'none', 'linux_x86_64'),
('cp37', 'none', 'any'),
('cp3', 'none', 'any'),
('py37', 'none', 'any'),
('py3', 'none', 'any'),
('py36', 'none', 'any'),
('py35', 'none', 'any'),
('py34', 'none', 'any'),
('py33', 'none', 'any'),
('py32', 'none', 'any'),
('py31', 'none', 'any'),
('py30', 'none', 'any')]
FROM python:3-alpine
❯ docker run -it python:3-alpine python -c 'import pip._internal; from pprint import pprint; pprint(pip._internal.pep425tags.get_supported())'
[('cp37', 'cp37m', 'linux_x86_64'),
('cp37', 'abi3', 'linux_x86_64'),
('cp37', 'none', 'linux_x86_64'),
('cp36', 'abi3', 'linux_x86_64'),
('cp35', 'abi3', 'linux_x86_64'),
('cp34', 'abi3', 'linux_x86_64'),
('cp33', 'abi3', 'linux_x86_64'),
('cp32', 'abi3', 'linux_x86_64'),
('py3', 'none', 'linux_x86_64'),
('cp37', 'none', 'any'),
('cp3', 'none', 'any'),
('py37', 'none', 'any'),
('py3', 'none', 'any'),
('py36', 'none', 'any'),
('py35', 'none', 'any'),
('py34', 'none', 'any'),
('py33', 'none', 'any'),
('py32', 'none', 'any'),
('py31', 'none', 'any'),
('py30', 'none', 'any')]
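To make the priority order concrete, here is a minimal sketch of how a candidate could be chosen from lists like the two above, assuming that a tuple appearing earlier in the list is preferred. The function name, the abbreviated tag lists, and the `example` package filenames are hypothetical; pip's real selection logic is more involved.

```python
# Abbreviated versions of the listings above, in the same relative order.
DEBIAN_TAGS = [
    ("cp37", "cp37m", "manylinux2010_x86_64"),
    ("cp37", "cp37m", "manylinux1_x86_64"),
    ("cp37", "cp37m", "linux_x86_64"),
    ("py3", "none", "any"),
]
ALPINE_TAGS = [
    ("cp37", "cp37m", "linux_x86_64"),
    ("py3", "none", "any"),
]


def best_candidate(candidates, supported):
    """Return the filename whose tag appears earliest in supported, or None.

    candidates is a list of (filename, (python, abi, platform)) pairs.
    None means no wheel matches and a source build would be needed.
    """
    rank = {tag: index for index, tag in enumerate(supported)}
    usable = [(rank[tag], name) for name, tag in candidates if tag in rank]
    return min(usable)[1] if usable else None


if __name__ == "__main__":
    candidates = [
        ("example-1.0-py3-none-any.whl", ("py3", "none", "any")),
        ("example-1.0-cp37-cp37m-manylinux1_x86_64.whl", ("cp37", "cp37m", "manylinux1_x86_64")),
    ]
    print(best_candidate(candidates, DEBIAN_TAGS))
    print(best_candidate(candidates, ALPINE_TAGS))
    print(best_candidate(candidates[1:], ALPINE_TAGS))
```

In this sketch the Debian list selects the manylinux1 wheel, the Alpine list can only use the pure-Python wheel, and with only the manylinux1 wheel on offer Alpine gets no match at all.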
Pipenv support
From what I can tell, Pipenv installation should work in much the same way, where a local path can be specified as a relative source link within the Pipfile. This may require further testing, but given the inclusion of the gcc toolchain (out-of-the-box behavior, ftw), this doesn't seem pressing until we hear of a use case where documenting/finalizing this configuration is needed.
What are the relevant issue numbers?
https://gitlab.com/gitlab-org/gitlab-ee/issues/6713
Does this MR meet the acceptance criteria?
- Changelog entry added
- Documentation created/updated for GitLab EE, if necessary
- Documentation created/updated for this project, if necessary
- Documentation reviewed by technical writer or follow-up review issue created
- Tests added for this feature/bug
- Job definition updated, if necessary
- Conforms to the code review guidelines
- Conforms to the Go guidelines
- Security reports checked/validated by reviewer