Configure multiple image pull policies for Docker executor
The GitLab Runner Docker executor now includes a pull_policy configuration option that supports multiple values. This means that you can specify, in the gitlab-runner config.toml configuration file, multiple policies for the Docker executor to use when retrieving a container image. For example, pull_policy = ["always", "if-not-present"]. In this configuration example, the always pull policy is attempted first. If the target container registry is not available, the executor falls back to the if-not-present policy.
Problem to solve
A lost network connection to a container registry that serves the container images required for CI job execution can cost hours of development time. In some instances, these outages can also hurt revenue if the business relies on software updates to production environments that cannot complete because the CI jobs cannot run without the inaccessible container images.
Today, in technologies like Kubernetes and gitlab-runner, the container image pull policy logic does not include any fallback mechanism for network connection failures to the target container registry.
Having the ability to use locally cached container images in the CI jobs can mitigate the impact caused by lost connectivity to the target container registry.
This is a spiritual successor to #3279 (closed), which was closed with the recognition that the gitlab-runner always pull policy checks whether an image of the same version is available locally, and only fetches from the remote registry if necessary. However, when a specified remote is unavailable,
A pull policy that:
- Checks for the latest image on the remote registry
- Checks for the presence of a locally-cached copy and leverages that first
- If no locally-cached copy is available, fetches from remote
- And finally, if the remote registry is unavailable, leverages the most recent locally-cached copy (if available)
...would improve the robustness of often-run pipelines and limit the impact of registry outages.
if-newer still seems like an acceptable name, or something such as
Potential Risk: This approach may lead to unexpected results for pipelines that are not run often: if the remote is unavailable, we'd have no way to confirm how up-to-date the locally-cached image is. For this reason it's probably best as a distinct pull policy from our default
Instead of creating a new pull policy, we allow users to define multiple pull policies. For example, the user can define pull_policy = ["always", "if-not-present"] inside their config.toml. The executor first uses the always pull policy; if that fails, it uses the next one in line, which is if-not-present. This achieves the always-or-fallback pull policy without introducing it. A small PoC of this was achieved in !2587 (closed).
So, for example, imagine I have the following config.toml:
    concurrent = 1
    check_interval = 0

    [session_server]
      session_timeout = 1800

    [[runners]]
      name = "steve-mbp-gitlab.local"
      url = "https://gitlab.com/"
      token = "xxxxxx"
      executor = "docker"
      [runners.docker]
        tls_verify = false
        image = "localonly/alpine:3.12"
        privileged = false
        disable_entrypoint_overwrite = false
        oom_kill_disable = false
        disable_cache = false
        volumes = ["/cache"]
        pull_policy = ["always", "if-not-present"] # Multiple pull policies specified, we'll go one by one if it fails. In this case, first it will try and pull the image, then use the local image if it's present
        shm_size = 0
We can see it working like below
- Allow pull_policy to be either a string (pull_policy = "always") or a slice of strings (pull_policy = ["always", "if-not-present"]), for example using the custom unmarshaling created for the PoC
- Start with the first pull policy (left to right). If any error occurs, even a 403 (because it might be a production issue), fall back to the next pull policy. For example, if we have pull_policy = ["always", "if-not-present"], we will use always and then, if it errors, we will use if-not-present
- Show a warning level log that the first pull policy failed.
- Show an info level log that we are changing the pull policy.
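The behaviour in the bullets above can be sketched roughly as follows. This is an illustrative Python sketch, not the runner's actual code: the names pull_with_fallback and PullError, and the pull callable, are hypothetical stand-ins for whatever performs a pull under a single policy.

```python
import logging
from typing import Callable, Sequence

log = logging.getLogger("pull")


class PullError(Exception):
    """Hypothetical error raised when one pull policy cannot produce an image."""


def pull_with_fallback(
    image: str,
    policies: Sequence[str],
    pull: Callable[[str, str], str],
) -> str:
    """Try each policy left to right; return the result of the first that succeeds."""
    if not policies:
        raise ValueError("at least one pull policy is required")
    last_error = None
    for index, policy in enumerate(policies):
        if index > 0:
            # Info-level log: we are switching to the next policy in line.
            log.info("changing pull policy to %r for image %s", policy, image)
        try:
            return pull(image, policy)
        except PullError as err:
            # Any failure, including a 403, moves on to the next policy.
            log.warning("pull policy %r failed for image %s: %s", policy, image, err)
            last_error = err
    raise PullError(f"all pull policies failed for {image}") from last_error
```

Note that in this sketch an authorization failure is treated the same as a network failure, which is the trade-off the proposal accepts; the job only fails once every listed policy has failed.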
Check out the PoC that implements most of this apart from logging
Steps to implement this
- Ideally, if time allows, we should move all the pull policy logic into its own package, for example executors/docker/internal/pull. This will also make sure that we have all the test coverage we need before we refactor
- Allow users to specify a single string or a slice of strings inside of the config
- Loop through all the pull policies specified until one succeeds
- Update the documentation to show off this feature. Be explicit about the security implications, because the fallback ignores the 403 error, and justify this on the grounds that auth can be down.
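As a rough illustration of the "single string or slice of strings" step, the sketch below normalizes either form into an ordered list. It is written in Python for brevity and is not the custom unmarshaling from the PoC; normalize_pull_policy and KNOWN_POLICIES are hypothetical names, and the allowed set (always, if-not-present, never) is an assumption based on the policies the Docker executor documents.

```python
from typing import List, Union

# Assumed set of valid policy names; only the first two appear in this issue.
KNOWN_POLICIES = ("always", "if-not-present", "never")


def normalize_pull_policy(value: Union[str, List[str]]) -> List[str]:
    """Return pull_policy as an ordered list, whether given as a string or a list."""
    if isinstance(value, str):
        policies = [value]
    elif isinstance(value, list) and all(isinstance(p, str) for p in value):
        policies = list(value)
    else:
        raise TypeError("pull_policy must be a string or a list of strings")
    if not policies:
        raise ValueError("pull_policy must name at least one policy")
    unknown = [p for p in policies if p not in KNOWN_POLICIES]
    if unknown:
        raise ValueError(f"unsupported pull policy: {unknown[0]!r}")
    return policies
```

Normalizing to a list up front means the rest of the executor only has to handle one shape, so an existing config with pull_policy = "always" keeps working unchanged.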