
Allow user to specify multiple pull policies for Kubernetes executor

Problem to solve

A lost network connection to the container registry that serves the images required for CI job execution can cost hours of development time. In some instances, these outages can also hurt revenue: if the business relies on software updates to production environments, those updates cannot complete while CI jobs fail because the container images are inaccessible.

Today, in technologies like Kubernetes and gitlab-runner, the container image pull policy logic does not include any fallback mechanism for network connection failures to the target container registry.

Having the ability to use locally cached container images in the CI jobs can mitigate the impact caused by lost connectivity to the target container registry.

Proposal

Instead of creating a new pull policy, we allow users to define multiple pull policies. For example, the user can define pull_policy = ["always", "if-not-present"] in their config.toml. The runner first uses the always pull policy; if that fails, it uses the next one in line, which is if-not-present. This achieves always-or-fallback behavior without introducing a new policy. A small PoC of this was achieved in !2587 (closed).
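The ordered fallback described above can be sketched in a few lines. This is only an illustration in Python, not the runner's implementation (gitlab-runner itself is written in Go); `PullError`, `pull_with_fallback`, and `fake_attempt` are hypothetical names introduced for this example.

```python
from typing import Callable, Sequence


class PullError(Exception):
    """Hypothetical error raised when an image cannot be obtained under a policy."""


def pull_with_fallback(policies: Sequence[str], attempt: Callable[[str], None]) -> str:
    """Try each policy left to right and return the first one that succeeds."""
    if not policies:
        raise ValueError("at least one pull policy is required")
    last_error = None
    for policy in policies:
        try:
            attempt(policy)
            return policy
        except PullError as err:
            last_error = err  # remember the failure and move on to the next policy
    # Every configured policy failed, so surface the most recent error.
    raise last_error


def fake_attempt(policy: str) -> None:
    # Simulates an unreachable registry while the image is already cached locally.
    if policy == "always":
        raise PullError("registry unreachable")


used = pull_with_fallback(["always", "if-not-present"], fake_attempt)
print(used)  # if-not-present
```

If every configured policy fails, the last error is re-raised, so a job with no usable image still fails as it does today.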

For example, imagine I have the following config.toml:

concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "steve-mbp-gitlab.local"
  url = "https://gitlab.com/"
  token = "xxxxxx"
  executor = "kubernetes"
  [runners.kubernetes]
    image = "localonly/alpine:3.12"
    pull_policy = ["always", "if-not-present"]       # Multiple pull policies specified, we'll go one by one if it fails. In this case, first it will try and pull the image, then use the local image if it's present

We can see it working below:

[Screenshot: Screen_Shot_2020-11-25_at_13.33.10]

Specification

  • Allow pull_policy for the kubernetes executor to be either a string, pull_policy = "always", or a slice of strings, pull_policy = ["always", "if-not-present"].
  • Start with the first pull policy (left to right). If any error occurs, even a 403 (because it might be a production issue), fall back to the next pull policy. For example, with pull_policy = ["always", "if-not-present"] we use always and, if it errors, we then use if-not-present. We need to inspect the reason the pod creation failed and check whether it was caused by pulling images.
  • Show a warning level log that the first pull policy failed.
  • Show an info level log that we are changing the pull policy.
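The points in this specification can be sketched as follows. This is a Python illustration under stated assumptions, not the runner's Go code: `normalize_pull_policy` and `next_policy` are hypothetical helpers, the set of valid policy names is assumed to be `always`, `if-not-present`, and `never`, and the pull failure is assumed to be detected through the Kubernetes container waiting reasons `ErrImagePull` and `ImagePullBackOff` (other reasons may also need handling).

```python
import logging
from typing import List, Optional, Sequence, Union

# Assumed set of policy names accepted in config.toml.
VALID_POLICIES = {"always", "if-not-present", "never"}

# Kubernetes container waiting reasons that indicate an image pull failure
# (not an exhaustive list).
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def normalize_pull_policy(value: Union[str, Sequence[str]]) -> List[str]:
    """Accept either a single string or a list of strings and return a list."""
    policies = [value] if isinstance(value, str) else list(value)
    if not policies:
        raise ValueError("pull_policy must not be empty")
    for policy in policies:
        if policy not in VALID_POLICIES:
            raise ValueError(f"unsupported pull policy: {policy!r}")
    return policies


def next_policy(policies: Sequence[str], index: int, failure_reason: str,
                log: logging.Logger) -> Optional[str]:
    """Return the policy to retry with, or None if no fallback applies."""
    if failure_reason not in IMAGE_PULL_REASONS:
        return None  # the pod failed for an unrelated reason; do not fall back
    log.warning("pull policy %r failed: %s", policies[index], failure_reason)
    if index + 1 >= len(policies):
        return None  # no policies left to try
    log.info("changing pull policy to %r", policies[index + 1])
    return policies[index + 1]


log = logging.getLogger("runner")
policies = normalize_pull_policy(["always", "if-not-present"])
print(normalize_pull_policy("always"))               # ['always']
print(next_policy(policies, 0, "ErrImagePull", log))  # if-not-present
```

Normalizing the value once at configuration load time keeps the rest of the code working with a list only, so a single-string pull_policy continues to behave as it does today.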
Edited by Steve Xuereb