Allow volume mounts to be configured on a job level for services and build containers
Description
Projects utilizing the docker executor should be able to configure volume mounts for job service containers in order to allow projects to mount configuration files and other data into these containers and (optionally) mount those same volumes to the build container.
Background:
This relates heavily to #3207 but I feel this proposal is different enough to merit its own issue. This issue, in contrast, aims (1) to define a more narrow problem statement, and (2) to propose an alternative solution to that problem statement, with the ultimate goal of carving out what is hopefully a small enough problem that it can be well-defined and executed on.
Other related issues: #1525, #15840
Problem statement
One of the primary unmet use cases GitLab customers have for docker volumes is getting data files into service containers (those running under `services:`). As necessary background, `services:` is often used much like an analog for additional services that might otherwise be defined using the docker compose specification. Today, GitLab offers a subset of configuration keys for `services:` that are analogs to docker-compose service configurations, like `entrypoint` and `command`. One common docker-compose service configuration that remains unsupported for GitLab CI services is `volumes:`.
What customers need in GitLab CI with respect to these volume-related features:
- Mount pre-existing files/directories into service containers.
  Some use cases for this might include:
  a. mounting project-specific config files for services (for example, an apache, nginx or postgres configuration)
  b. mounting SSL/TLS certificates
- Share data volumes among/between service containers and the build container.
  Some use cases might include:
  a. services that share storage (simulating a shared NFS mount, for example)
  b. observing file-based logs of services (and maybe publishing them as artifacts?)
  c. obtaining data files output by a service
  d. populating data for a running service (prepopulating a database, for example)
- ... (other use cases???)
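For contrast, both needs are expressed directly in docker-compose via its `volumes:` key. A minimal sketch (service names, images, and paths here are invented for illustration):

```yaml
# docker-compose analog of the unmet need: mounting pre-existing config
# files into services and sharing a named data volume between them.
services:
  web:
    image: nginx
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro   # pre-existing repo file
      - shared-data:/usr/share/nginx/html       # shared named volume
  db:
    image: postgres
    volumes:
      - ./postgres.conf:/etc/postgresql/postgresql.conf:ro
      - shared-data:/var/lib/app-data           # same volume, second service
volumes:
  shared-data: {}
```

Nothing equivalent to either kind of mount can currently be declared per-job in `.gitlab-ci.yml`.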
While this might not cover every use case one might imagine for docker volumes with GitLab CI, it's my perception that solving these use cases covers a large majority of the users asking for the docker volumes feature proposed in #3207.
Existing solutions/workarounds
Today, there might be a few ways GitLab customers can leverage docker volumes, albeit with serious limitations, particularly for shared runners.
- Use the runner `config.toml` to mount a directory from the host or create a data volume:

  ```toml
  volumes = ["/path/inside/container", "/host/path:/path/inside/container"]
  ```

  However, this has several limitations:
  a. It requires the administrator of the runner to configure.
  b. The configuration applies to all services and build containers. Users often want to change the files mounted on a job-by-job basis, and two services might use the same mount point but need different configurations.
  c. Where write access is needed, conflicts readily arise if multiple jobs use the volume at the same time, and it is often not possible to control which directory is used.
- A modified image/entrypoint (and optionally environment variables).

  This might allow a project to create configuration files for services, say, in combination with a custom-defined `entrypoint` for the service. The environment variable could, in principle, be generated dynamically:

  ```yaml
  my_job:
    variables:
      DAEMON_CONFIG: '{"bip": "192.168.123.1/24"}'
    services:
      - name: docker:dind
        entrypoint: ["/bin/sh", "-c", "mkdir -p /etc/docker && echo \"${DAEMON_CONFIG}\" > /etc/docker/daemon.json && exec dockerd-entrypoint.sh"]
  ```

  This configuration could also be achieved with `command:`, but hopefully it serves the illustrative purpose.
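A `command:`-based variant might look like the following sketch. (This assumes the entrypoint is overridden to a shell so the command string is executed by it, since `docker:dind`'s default entrypoint does not interpret shell syntax; treat it purely as an illustration, not tested configuration.)

```yaml
# Hypothetical command:-based variant of the entrypoint workaround above.
my_job:
  variables:
    DAEMON_CONFIG: '{"bip": "192.168.123.1/24"}'
  services:
    - name: docker:dind
      entrypoint: ["/bin/sh", "-c"]
      # single argument passed to `sh -c`
      command: ['mkdir -p /etc/docker && echo "${DAEMON_CONFIG}" > /etc/docker/daemon.json && exec dockerd-entrypoint.sh']
```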
It might even be possible to configure an entrypoint to download pipeline artifacts or perform other complex data generation. In any case, this approach has obvious downsides and is cumbersome and error-prone to configure correctly. It also isn't viable when the files needed are larger binaries (say, a docker-credential helper program).
Alternatively, a separate image may be created with the necessary configuration bundled. However, this requires a docker-in-docker setup to build the image, as well as a registry to push the image to before it can be used in `services:`, which is overly cumbersome if all that is needed is a simple configuration file.
In short, the approach is incomplete and fairly inelegant.
Proposal
As stated, what customers need is a way to put files into their service containers. To fulfill this need, I propose the following capabilities be added to GitLab runner's docker executor:
- The ability for service containers to mount repository files/directories into a specified mount point.
  This fulfills simple use cases where the repository contains the files needed, like nginx configurations to serve a website frontend that is under test in GitLab CI.
- The ability for service containers to mount pipeline artifacts (or caches) into a specified mount point.
  This allows pipelines to generate data or configuration files (such as database configurations or generated certificates).
- The ability to define volumes that can be shared among/between service containers and build containers within a given job. (Volumes could be namespaced by job id to prevent collisions and transparently aliased with configured names.)

Between these features, most use cases for volumes with `services:` should be satisfied.
The following limitations would be understood:
- You cannot use bind mounts from the host (in other words, it is not the same control/feature as in the `config.toml`).
- Volume mounts defined in `.gitlab-ci.yml` cannot conflict with mounts specified in the runner's `config.toml`; such a conflict should result in an immediate build failure.
- Volumes only exist for the duration of the job and are accessible only by the build container or services for that job. Use artifacts to pass any contents of volumes to other jobs/stages.
- Initially, the only valid sources are artifacts and repository files. The default storage driver is used to create the volume. In the future, support could be added for other data sources or storage drivers (say, remote storage with an nfs driver).
This would allow the feature to co-exist with pre-existing configuration features, avoid the security risk of letting projects create bind mounts on the runner host, and let administrators retain appropriate control over mounted volumes.
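To illustrate the conflict rule, here is a hypothetical configuration (all paths invented) that would trigger an immediate build failure under the proposed scheme:

```yaml
# Suppose the runner's config.toml already mounts a host path for all jobs:
#   volumes = ["/srv/cache:/shared"]
# A job-level volume targeting the same mount point must be rejected:
myjob:
  volumes:
    mydata:
      data_sources:
        - repository: ["conf/:/conf"]
      mount_point: /shared   # conflicts with config.toml -> immediate build failure
```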
Maybe it would look something like:

```yaml
setup:
  stage: .pre
  script:
    - generate-pg-data
    - generate-certificates
  artifacts:
    paths:
      - path/to/pgdata
      - path/to/certs

myjob:
  volumes:
    pgdata: # create a new volume, populate it with some data
      data_sources:
        # artifact paths work the same as if downloaded to the workdir normally;
        # you can (re)locate them into other paths in the volume.
        - artifacts: ["path/to/artifact_file_or_directory:/path/in/volume"]
        # repository files can also be used to populate the volume
        - repository: ["path/to/repository/file_or_directory:/path/in/volume"]
    certs:
      data_sources:
        - artifacts:
            - "certs/privkey.pem:/privkey.pem"
      mount_point: "$CI_PROJECT_DIR/certs" # optionally define the mount point for the build container
  services:
    - name: postgres
      volumes:
        - "pgdata:/var/lib/postgresql/data"
    - name: docker:dind
      volumes:
        - "certs:/certs"
  # ...
```
There is a lot of room to make the scheme more elegant and concise, but hopefully this at least helps get the idea across.
Benefit of doing
Primarily, it gives GitLab customers looking to place files into their services a method to do that. It makes the `services:` feature more robust and better able to replace the need for jobs to use docker-in-docker and docker-compose directly. In many cases, this could make runner setups more secure (no need for privileged containers or mounting docker sockets).
Implementation details
Without getting too specific too quickly, I can imagine some broad strokes of how this might work. Of course there are other details that must be hammered out; this isn't intended to provide complete details, or even the only possible approach, just one idea of how it might work.
The GitLab runner's docker integration might be changed to do something like this (amended from How Docker Integration Works):
1. Create any defined volumes for the job:
   a. create the docker volumes with the default storage driver
   b. run a helper container that mounts all the volumes and populates them with the defined data sources (download from the repository and/or restore cache/artifacts and copy to the defined locations)
2. Create any service containers (`mysql`, `postgresql`, `mongodb`, `redis`, ...), mounting any defined volumes for each service.
3. Create a cache container to store all volumes as defined in `config.toml` and the `Dockerfile` of the build image.
4. Create the build container, link any service containers to it, and mount any volume with a defined mount point.
5. Start the build container, and send a job script to the container.
6. Run the job script.
7. Checkout code in: `/builds/group-name/project-name/`.
8. Run any step defined in `.gitlab-ci.yml`.
9. Check the exit status of the build script.
10. Remove the build container, any created service containers, and any created data volumes. (Cache volume behavior remains unchanged.)
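As a rough mental model, the end state the runner would construct for the earlier `myjob` example resembles this docker-compose setup (image names, mount paths, and the job-id namespacing shown here are assumptions for illustration only):

```yaml
# Rough docker-compose analog of what the runner would assemble for "myjob".
# Real volume names would be namespaced by job id (e.g. job-12345-pgdata)
# and transparently aliased to the names configured in .gitlab-ci.yml.
services:
  postgres:
    image: postgres
    volumes:
      - pgdata:/var/lib/postgresql/data
  dind:
    image: docker:dind
    volumes:
      - certs:/certs
  build:
    image: my-build-image          # the job's build container (hypothetical)
    volumes:
      - certs:/builds/group/project/certs   # mount_point from .gitlab-ci.yml
volumes:
  pgdata: {}   # populated beforehand by a helper container (step 1b)
  certs: {}
```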