Configure multiple image pull policies for Docker executor
Release notes
The GitLab Runner Docker executor now supports multiple values for the pull_policy configuration option. This means that you can now specify, in the gitlab-runner config.toml configuration file, multiple policies for the Docker executor to use when retrieving a container image. For example, with pull_policy = ["always", "if-not-present"], the "always" pull policy is attempted first. If the target container registry is not available, the executor falls back to the "if-not-present" policy.
Problem to solve
Losing the network connection to the container registry that serves the images required for CI job execution can cost hours of development time. In some instances, these outages can also hurt revenue, when the business relies on software updates to production environments that cannot complete because the CI jobs cannot run without their container images.
Today, in technologies such as Kubernetes and gitlab-runner, the container image pull policy logic does not include any fallback mechanism for network connection failures to the target container registry.
Being able to use locally cached container images in CI jobs can mitigate the impact of lost connectivity to the target container registry.
Background
This is a spiritual successor to #3279 (closed), which was closed with the recognition that the gitlab-runner pull_policy of "always" checks whether an image of the same version is available locally, and only fetches from the remote registry if necessary. However, when the specified remote is unavailable, "always" fails.
A pull policy that:
- Checks for the latest image on the remote registry
- Checks for the presence of a locally-cached copy and leverages that first
- If no locally-cached copy is available, fetches from remote
- And finally, if the remote registry is unavailable, leverages the most recent locally-cached copy (if available)
...would improve the robustness of often-run pipelines and limit the impact of registry outages.
"if-newer" still seems like an acceptable name, or perhaps something such as "always-with-fallback"?
Potential risk: this approach may lead to unexpected results for pipelines that run infrequently, because if the remote is unavailable we have no way to confirm how up to date the locally cached image is. For this reason, it is probably best as a distinct pull policy rather than a change to our default "always" behavior.
Proposal
Instead of creating a new pull policy, we allow users to define multiple pull policies. For example, a user can define pull_policy = ["always", "if-not-present"] inside their config.toml. The runner first uses the "always" pull policy; if that fails, it uses the next one in line, which is "if-not-present". This achieves the "always-or-fallback" pull policy without introducing it. A small PoC of this was achieved in !2587 (closed).
For example, consider the following config.toml:
```toml
concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "steve-mbp-gitlab.local"
  url = "https://gitlab.com/"
  token = "xxxxxx"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "localonly/alpine:3.12"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    # Multiple pull policies: they are tried one by one until one succeeds.
    # Here the runner first tries to pull the image, then uses the local image if it is present.
    pull_policy = ["always", "if-not-present"]
    shm_size = 0
```
Specification
- Allow pull_policy for the docker executor to be either a string (pull_policy = "always") or a slice of strings (pull_policy = ["always", "if-not-present"]), for example by using the custom unmarshaling created in the PoC.
- Start with the first pull policy (left to right). If any error occurs, even a 403 (because it might be a production issue), fall back to the next pull policy. For example, with pull_policy = ["always", "if-not-present"], we use "always" and, if it errors, we use "if-not-present".
- Show a warning-level log that the first pull policy failed.
- Show an info-level log that we are changing the pull policy.
The PoC implements most of this, apart from the logging.
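The first specification point, accepting either a string or a slice of strings, could be handled with a custom type along the lines below. This is a sketch under assumptions, not the PoC's code: the type name PullPolicies is hypothetical, and it assumes a TOML decoder that, like github.com/BurntSushi/toml, calls an UnmarshalTOML(interface{}) error method with the raw decoded value (a Go string for a TOML string, a []interface{} for a TOML array).

```go
package main

import "fmt"

// PullPolicies is a hypothetical config type that accepts either
// pull_policy = "always" or pull_policy = ["always", "if-not-present"].
type PullPolicies []string

// UnmarshalTOML receives the raw decoded TOML value. This sketch assumes
// a TOML string arrives as a Go string and a TOML array as []interface{}.
func (p *PullPolicies) UnmarshalTOML(data interface{}) error {
	switch v := data.(type) {
	case string:
		*p = PullPolicies{v}
	case []interface{}:
		policies := make(PullPolicies, 0, len(v))
		for i, item := range v {
			s, ok := item.(string)
			if !ok {
				return fmt.Errorf("pull_policy[%d]: expected string, got %T", i, item)
			}
			policies = append(policies, s)
		}
		*p = policies
	default:
		return fmt.Errorf("pull_policy: expected string or array of strings, got %T", data)
	}
	return nil
}

func main() {
	var single, multiple PullPolicies
	_ = single.UnmarshalTOML("always")
	_ = multiple.UnmarshalTOML([]interface{}{"always", "if-not-present"})
	fmt.Println(single, multiple) // [always] [always if-not-present]
}
```

Normalizing both forms into one slice means the rest of the executor only ever deals with a list, and the single-string form becomes a list of length one.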
Steps to implement this
- Ideally, if time allows, move all the pull policy logic into its own package, for example executors/docker/internal/pull. This also makes sure that we have all the test coverage we need before we refactor.
- Allow users to specify a single string or a slice of strings in the config.
- Loop through the specified pull policies until one succeeds.
- Update the documentation to show this feature. Be explicit about the security implications, because the fallback ignores a 403 error, and justify this on the grounds that authentication can be down.
Related Links
- Kubernetes Up-Stream: Always-If-Available Image Pull Policy #95854
- Kubernetes executor: #27298 (closed)