Multi-job pipelines using the shell executor are fundamentally incompatible with python3 venv and other absolute-path-dependent software
Executive Summary
GitLab CI's Shell executor is fundamentally incompatible with python3 virtual environments for pipelines with multiple jobs because:
- python3 venv embeds absolute paths in the generated environment
- GitLab CI's design is that all jobs shall be "workspace agnostic", meaning the absolute path to the workspace for job1 may not be the same as the path for job2
If this is a limitation of the GitLab CI design when using shell executors, it should be noted in the shell executor documentation.
Context: python3 venv
From all the research I've done, python3 virtual environments are intended to be built in the location in which they will be used. For many groups, the virtual environment is created inside the repository as an infrequent setup step. For example, this means executing something like this:
git clone <repo>
cd <repo>
python3 -m venv .venv && source .venv/bin/activate && pip3 install -r requirements.txt
# Now <repo>/.venv/ contains the virtual environment that repo uses
# and all the user must do is 'source .venv/bin/activate' to use it
Note that .venv/bin/activate is an untracked generated file not intended to be modified by the end user, and this file contains hardcoded absolute paths to <repo>. Because of this, you cannot naively mv <repo> /some/other/path and use /some/other/path/<repo>/.venv because .venv/bin/activate no longer points to a valid absolute path.
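The hardcoded path is easy to observe directly. Below is a minimal demonstration; the /tmp/demo-venv and /tmp/relocated-venv paths are illustrative, and --without-pip is used only to keep the sketch fast and dependency-free:

```shell
# Create a venv at one path, then relocate it; the activate script still
# records the original location.
rm -rf /tmp/demo-venv /tmp/relocated-venv
python3 -m venv --without-pip /tmp/demo-venv
grep 'demo-venv' /tmp/demo-venv/bin/activate      # the absolute path is baked in
mv /tmp/demo-venv /tmp/relocated-venv
grep 'demo-venv' /tmp/relocated-venv/bin/activate # the stale path survives the move
```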
Detailed description of the Problem for Shell Executors
For Shell executors, jobs run natively on the host in a workspace managed by GitLab CI, matching the naming scheme <working-directory>/builds/<short-token>/<concurrent-id>/<namespace>/<project-name> where <concurrent-id> is outside the control of the user and managed automatically by GitLab CI. The issue described above can be replicated with a simple two-job pipeline that resembles this configuration:
variables:
  GIT_STRATEGY: fetch

stages:
  - Build
  - Run

Build venv:
  stage: Build
  script:
    - python3 -m venv .venv && source .venv/bin/activate  # Create <repo>/.venv/
  artifacts:
    paths:
      - .venv/  # <-- collect as artifact for downstream job "Use venv"

Use venv:
  stage: Run
  dependencies:
    - Build venv
  script:
    - source .venv/bin/activate
    - echo $VIRTUAL_ENV  # <-- this will point to the workspace of "Build venv", which may be outside this job's workspace
One of the most confusing things about this problem is the last part: "which may be outside this job's workspace". There are actually several possibilities of what could happen for the second job, and folks new to GitLab CI will surely be confused by the behavior they witness:
- The second job Use venv obtains the same workspace as the first job Build venv, so no problems occur and both jobs complete with success
- The second job runs in a workspace whose collected/downloaded .venv/bin/activate (from the first job) points outside this job's workspace; this forks into more possibilities:
  - The other workspace is unused but happens to match the .venv of the first job, so the second job completes with success
  - The other workspace is unused and doesn't match the .venv of the first job, leading to the second job failing
  - The other workspace is unused, then becomes used during execution of the second job, leading to strange errors as that workspace is cleaned/altered
  - The other workspace is currently in-use, but happens to match the .venv of the first job, so the second job completes with success
  - The other workspace is currently in-use, and doesn't match the .venv of the first job, leading to the second job failing
To the user, all of this appears as weird, undefined behavior -- "sometimes it works, sometimes I get weird failures I don't understand". Only a deep knowledge of GitLab CI's design, the shell executor, and path-dependent software systems like venv is sufficient to truly understand what is going wrong here.
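One way to at least fail fast, rather than stumble into the confusing modes above, is a guard at the top of the downstream job's script. This is only a sketch: assert_venv_matches_workspace is a hypothetical helper of ours, not a GitLab feature, and CI_PROJECT_DIR is GitLab's predefined variable pointing at the job's workspace.

```shell
# Hypothetical guard: after activating, verify the venv was actually built
# for THIS job's workspace, and abort with a clear message if it wasn't.
assert_venv_matches_workspace() {
  workspace="$1"
  if [ "$VIRTUAL_ENV" != "$workspace/.venv" ]; then
    echo "venv points at $VIRTUAL_ENV, expected $workspace/.venv" >&2
    return 1
  fi
}

# In the "Use venv" job's script one would run:
#   source .venv/bin/activate
#   assert_venv_matches_workspace "$CI_PROJECT_DIR"
```

This does not fix the incompatibility, but it turns the "sometimes it works" lottery into a deterministic, explainable failure.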
Context: This isn't a problem for Docker Executors
In a docker executor, the job workspace is mounted and always appears at a path of the form /builds/<group>/<project>/ inside the container where the job executes. This means all jobs in the pipeline always have the same absolute path, and therefore the .venv created in one job (and collected as an artifact) will work when downloaded into any downstream job's workspace within the container.
Workarounds
In order to ensure each job uses a virtual environment built for the arbitrary workspace GitLab CI provides, our team has no choice but to rebuild the <repo>/.venv directory at the start of every job, in lieu of collecting that directory as an artifact to be shared with downstream jobs. This vastly increases testing time and network bandwidth on a per-pipeline basis. Our only other option would be to collapse all jobs into a single job which tests everything. For us the former is the less-bad option.
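Concretely, the workaround looks roughly like this in .gitlab-ci.yml (a sketch; the requirements file name is illustrative):

```yaml
# Rebuild the venv at the start of every job rather than passing it
# between jobs as an artifact.
default:
  before_script:
    - python3 -m venv .venv
    - source .venv/bin/activate
    - pip3 install -r requirements.txt
```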
Similar issues
I have searched open GitLab issues and only found one related/similar issue:
Discussion: Is a solution for the shell executor even possible?
From what I understand about this issue, this isn't a "bug", it's a fundamental disagreement between GitLab CI's workspace-agnostic design (using the shell executor) and software tools which embed/require a hardcoded path within their process. I suspect this error would manifest similarly for groups building an executable with gcc with an absolute -rpath and then collecting the resultant executable as an artifact for downstream jobs.
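The gcc analogy can be reproduced in isolation. The following is a self-contained sketch in which /tmp/ws1 and /tmp/ws2 stand in for two different job workspaces:

```shell
# Build a shared library and link with an absolute -rpath in "workspace 1".
rm -rf /tmp/ws1 /tmp/ws2
mkdir -p /tmp/ws1/lib && cd /tmp/ws1
printf 'int answer(void){return 42;}\n' > foo.c
gcc -shared -fPIC foo.c -o lib/libfoo.so
printf 'int answer(void);\nint main(void){return answer()==42?0:1;}\n' > main.c
gcc main.c -L"$PWD/lib" -lfoo -Wl,-rpath,"$PWD/lib" -o app
./app   # works: the recorded rpath /tmp/ws1/lib resolves

# Relocate the whole tree, as GitLab CI effectively does between jobs:
cd / && mv /tmp/ws1 /tmp/ws2
/tmp/ws2/app 2>/dev/null || echo "fails: recorded rpath still points at /tmp/ws1/lib"
```

Just as with venv, the artifact itself is intact; it is the absolute path baked in at build time that no longer matches the workspace it lands in.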
If the best path forward is "just use the Docker executor", I think that's reasonable -- but I'd ask that the documentation on shell executors be updated to specifically describe this issue, so that others can understand this limitation of the shell executor and avoid it entirely.