Multi-job pipelines using the shell executor are fundamentally incompatible with python3 venv and other absolute-path-dependent software
Executive Summary
GitLab CI's Shell executor is fundamentally incompatible with python3 virtual environments for pipelines with multiple jobs because:
- python3 venv embeds absolute paths in the generated environment
- GitLab CI's design is that all jobs shall be "workspace agnostic", meaning the absolute path to the workspace for job1 may not be the same as the path for job2
If this is a limitation of the GitLab CI design when using shell executors, it should be noted in the shell executor documentation.
Context: python3 venv
From all the research I've done, python3 virtual environments are intended to be built in the location in which they will be used. For many groups, the virtual environment is created inside the repository as an infrequent setup step. For example, this means executing something like this:
git clone <repo>
cd <repo>
python3 -m venv .venv && source .venv/bin/activate && pip3 install -r requirements.txt
# Now <repo>/.venv/ contains the virtual environment that repo uses
# and all the user must do is 'source .venv/bin/activate' to use it
Note that .venv/bin/activate is an untracked generated file not intended to be modified by the end user, and this file contains hardcoded absolute paths to <repo>. Because of this, you cannot naively mv <repo> /some/other/path and use /some/other/path/<repo>/.venv because .venv/bin/activate no longer points to a valid absolute path.
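The hardcoded path is easy to observe directly. Below is a minimal demonstration; the /tmp/demo-venv and /tmp/relocated-venv paths are illustrative, and --without-pip is used only to keep the sketch fast and dependency-free:

```shell
# Create a venv at one path, then relocate it; the activate script still
# records the original location.
rm -rf /tmp/demo-venv /tmp/relocated-venv
python3 -m venv --without-pip /tmp/demo-venv
grep 'demo-venv' /tmp/demo-venv/bin/activate      # the absolute path is baked in
mv /tmp/demo-venv /tmp/relocated-venv
grep 'demo-venv' /tmp/relocated-venv/bin/activate # the stale path survives the move
```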
Detailed description of the Problem for Shell Executors
For Shell executors, jobs run natively on the host in a workspace managed by GitLab CI, matching the naming scheme <working-directory>/builds/<short-token>/<concurrent-id>/<namespace>/<project-name> where <concurrent-id> is outside the control of the user and managed automatically by GitLab CI. The issue described above can be replicated with a simple two-job pipeline that resembles this configuration:
variables:
  GIT_STRATEGY: fetch

stages:
  - Build
  - Run

Build venv:
  stage: Build
  script:
    - python3 -m venv .venv && source .venv/bin/activate  # Create <repo>/.venv/
  artifacts:
    paths:
      - .venv/  # <-- collect as artifact for downstream job "Use venv"

Use venv:
  stage: Run
  dependencies:
    - Build venv
  script:
    - source .venv/bin/activate
    - echo $VIRTUAL_ENV  # <-- this will point to the workspace of "Build venv", which may be outside this job's workspace
One of the most confusing things about this problem is the last part: "which may be outside this job's workspace". There are actually several possibilities of what could happen for the second job, and folks new to GitLab CI will surely be confused by the behavior they witness:
- The second job Use venv obtains the same workspace as the first job Build venv, so no problems occur and both jobs complete with success
- The second job runs in a workspace whose collected/downloaded .venv/bin/activate (from the first job) points outside this job's workspace; this forks into more possibilities:
  - The other workspace is unused but happens to match the .venv of the first job, so the second job completes with success
  - The other workspace is unused and doesn't match the .venv of the first job, leading to the second job failing
  - The other workspace is unused, then becomes used during execution of the second job, leading to strange errors as that workspace is cleaned/altered
  - The other workspace is currently in-use, but happens to match the .venv of the first job, so the second job completes with success
  - The other workspace is currently in-use, and doesn't match the .venv of the first job, leading to the second job failing
To the user, all of this appears as weird, undefined behavior -- "sometimes it works, sometimes I get weird failures I don't understand". Only a deep knowledge of GitLab CI's design, the shell executor, and path-dependent software systems like venv is sufficient to truly understand what is going wrong here.
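One way to at least fail fast, rather than stumble into the confusing modes above, is a guard at the top of the downstream job's script. This is only a sketch: assert_venv_matches_workspace is a hypothetical helper of ours, not a GitLab feature, and CI_PROJECT_DIR is GitLab's predefined variable pointing at the job's workspace.

```shell
# Hypothetical guard: after activating, verify the venv was actually built
# for THIS job's workspace, and abort with a clear message if it wasn't.
assert_venv_matches_workspace() {
  workspace="$1"
  if [ "$VIRTUAL_ENV" != "$workspace/.venv" ]; then
    echo "venv points at $VIRTUAL_ENV, expected $workspace/.venv" >&2
    return 1
  fi
}

# In the "Use venv" job's script one would run:
#   source .venv/bin/activate
#   assert_venv_matches_workspace "$CI_PROJECT_DIR"
```

This does not fix the incompatibility, but it turns the "sometimes it works" lottery into a deterministic, explainable failure.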
Context: This isn't a problem for Docker Executors
In a docker executor, the job workspace is mounted and always appears at a path of the form /builds/<group>/<project>/ inside the container where the job executes. This means all jobs in the pipeline always have the same absolute path, and therefore the .venv created in one job (and collected as an artifact) will work when downloaded into any downstream job's workspace within the container.
Workarounds
In order to ensure each job uses a virtual environment built for the arbitrary workspace GitLab CI provides, our team has no choice but to rebuild the <repo>/.venv directory at the start of every job, in lieu of collecting that directory as an artifact to be shared with downstream jobs. This vastly increases testing time and network bandwidth on a per-pipeline basis. Our only other option would be to collapse all jobs into a single job which tests everything. For us the former is the less-bad option.
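Concretely, the workaround looks roughly like this in .gitlab-ci.yml (a sketch; the requirements file name is illustrative):

```yaml
# Rebuild the venv at the start of every job rather than passing it
# between jobs as an artifact.
default:
  before_script:
    - python3 -m venv .venv
    - source .venv/bin/activate
    - pip3 install -r requirements.txt
```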
Similar issues
I have searched open GitLab issues and only found one related/similar issue:
Discussion: Is a solution for the shell executor even possible?
From what I understand about this issue, this isn't a "bug", it's a fundamental disagreement between GitLab CI's workspace-agnostic design (using the shell executor) and software tools which embed/require a hardcoded path within their process. I suspect this error would manifest similarly for groups building an executable with gcc with an absolute -rpath and then collecting the resultant executable as an artifact for downstream jobs.
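The gcc analogy can be reproduced in isolation. The following is a self-contained sketch in which /tmp/ws1 and /tmp/ws2 stand in for two different job workspaces:

```shell
# Build a shared library and link with an absolute -rpath in "workspace 1".
rm -rf /tmp/ws1 /tmp/ws2
mkdir -p /tmp/ws1/lib && cd /tmp/ws1
printf 'int answer(void){return 42;}\n' > foo.c
gcc -shared -fPIC foo.c -o lib/libfoo.so
printf 'int answer(void);\nint main(void){return answer()==42?0:1;}\n' > main.c
gcc main.c -L"$PWD/lib" -lfoo -Wl,-rpath,"$PWD/lib" -o app
./app   # works: the recorded rpath /tmp/ws1/lib resolves

# Relocate the whole tree, as GitLab CI effectively does between jobs:
cd / && mv /tmp/ws1 /tmp/ws2
/tmp/ws2/app 2>/dev/null || echo "fails: recorded rpath still points at /tmp/ws1/lib"
```

Just as with venv, the artifact itself is intact; it is the absolute path baked in at build time that no longer matches the workspace it lands in.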
If the best path forward is "just use the Docker executor", I think that's reasonable -- but I'd ask that the documentation on shell executors be updated to specifically describe this issue, so that others can understand this limitation of the shell executor and avoid it entirely.