
POC support IdleCount for Kubernetes Executor

Georgi N. Georgiev requested to merge k8s_idle_count into master

In #3795 it was suggested that the Kubernetes executor should support IdleCount, much like the Docker Machine executor.

What does this MR do

This MR is a POC of whether and how pre-creating build pods for later reuse could work.

Could this work, and how?

There are a lot of technical challenges and limitations which I will go into later. As to whether it could work: probably, with limitations, and only after a lot of work.

The Kubernetes executor, in both code and logic, is tightly coupled to the executor instance and the current build when creating pods (and all accompanying resources). In its current form the Kubernetes executor doesn't have the foundation to support even an MVP of this feature.

Although IdleCount sounds like the same feature as in Docker Machine, the two are very different. Docker Machine creates blank VMs which then run Docker containers. The Kubernetes executor cannot create blank pods and only later decide what runs inside them.

I would say we should not implement this until we are sure there is enough demand to justify the technical complexity in an already complex executor. If we decide to implement it at some point, we should dedicate at least one milestone to development, and that's assuming the developer is already familiar with the executor and this POC.

Things that currently-kinda-almost work

Currently, pods are created and sit waiting for work to be assigned to them. The helper image is configured to wait for IdleTime and, if no work is picked up, it exits. At the moment this does not delete the pod. We will need a liveness probe for that, which is easy to implement. With this, the pods can clean themselves up, but not their accompanying resources; read more about that in the next section.
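For illustration, here is a minimal sketch (Go, client-go types) of what such a liveness probe could look like on the helper container. The marker-file path and probe timings are my assumptions, not part of the POC, and the `ProbeHandler` field is named `Handler` on older `k8s.io/api` versions.

```go
package main

import corev1 "k8s.io/api/core/v1"

// helperLivenessProbe sketches how an idle pod could terminate itself: the
// entry command keeps a marker file fresh while it is willing to accept work
// and removes it once the idle timeout passes.
func helperLivenessProbe() *corev1.Probe {
	return &corev1.Probe{
		// Named Handler instead of ProbeHandler on older k8s.io/api versions.
		ProbeHandler: corev1.ProbeHandler{
			Exec: &corev1.ExecAction{
				// Non-zero exit once the marker disappears; the kubelet then
				// kills the helper container instead of leaving it idle forever.
				Command: []string{"sh", "-c", "test -f /tmp/runner-alive"},
			},
		},
		PeriodSeconds:    10,
		FailureThreshold: 3,
	}
}
```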

Also, pods are picked up to execute jobs, but the jobs fail while fetching the sources. As mentioned above, the creation of resources is tightly coupled to the current Build, so the scripts and resources are all created from that build. This means the ci-token used to clone the GitLab repo, which is written into the get_sources script, which in turn is written inside the pod, is a token from an older build. This can probably be worked around by either changing the script or exposing the token as a variable.
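As a rough illustration of the "expose the token as a variable" option, the sketch below passes the current job's token into the command exec'd in the reused pod instead of relying on the token baked into get_sources at pod-creation time. The script path and function name are hypothetical, not the executor's real API.

```go
package main

import "fmt"

// buildGetSourcesCommand sketches passing the current job's token at exec
// time, so cloning works even though the pod (and its baked-in scripts)
// predates the job that picked it up.
func buildGetSourcesCommand(jobToken string) []string {
	return []string{
		"sh", "-c",
		// CI_JOB_TOKEN comes from the current build, overriding whatever
		// token was written into the pod when it was pre-created.
		fmt.Sprintf("CI_JOB_TOKEN=%q exec /scripts/get_sources", jobToken),
	}
}
```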


Limitations

Most dynamic configuration coming from the current build (the current iteration of .gitlab-ci.yml) cannot be changed later or accounted for. Examples:

  • Overwrites - CPU, memory, namespace, token, etc. Most of these will only half work. Changing the CPU limits after the pod is already running isn't possible; instead, changing these settings will cause the pod to be restarted. This means that while we will be certain there is enough space on the cluster for the pod, it could be relocated to another node, there might be other pods waiting to be scheduled, and our pod might not be scheduled right away. I honestly don't know how the Kubernetes scheduler will behave in these situations, but it seems like: 1. if we support this, it won't be very useful; 2. we should probably not support this, or at the very least not at the start.

  • Different images set in the build - Since pods will be pre-created with images already running inside, we can't know all the possible combinations. One possible solution is to have a list of images in the config, e.g. `idle_images = ["alpine:3.12", "alpine", "ubuntu:16"]`. Then with `idle_count = 2`, we will have 2 * 3 = 6 idle pods at all times (see the sketch after this list). This can quickly explode out of control, and I can imagine that at scale it wouldn't be very useful, while if you're not at scale, you don't need this feature at all.

  • Probably more things I haven't noticed
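To make the multiplication in the second bullet concrete, here is a small sketch with hypothetical config fields (neither `idle_count` nor `idle_images` exists in the runner today):

```go
package main

import "fmt"

// Hypothetical config fields for this feature; purely illustrative.
type kubernetesIdleConfig struct {
	IdleCount  int      // idle pods to keep per image
	IdleImages []string // images to pre-create pods for
}

func main() {
	cfg := kubernetesIdleConfig{
		IdleCount:  2,
		IdleImages: []string{"alpine:3.12", "alpine", "ubuntu:16"},
	}
	// 2 idle pods per image * 3 images = 6 pods sitting idle at all times;
	// every image added to the list multiplies the standing pod count.
	fmt.Println("idle pods:", cfg.IdleCount*len(cfg.IdleImages))
}
```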

With all that said, we can show that it's possible to have pre-created pods, albeit with a few caveats.

Technical limitations, problems and details

During this POC I hit quite a few technical roadblocks which, even for the simplest POC I had in mind, had to be overcome in a pretty major and involved way.

  1. The Kubernetes executor is incredibly entangled with itself. Every variable and resource that is created is stored in the executor's struct. During initial development that's OK; however, when trying to pull out the creation of the pods, secrets, configmaps, etc., it quickly became apparent that I had no good way to track when a resource is available, and it would often be nil. That's why I opted for the smallest refactor I could do to move away from assigning resources to the executor's fields and instead have something closer to a factory function. This is still very entangled with the executor and everything stored in it, but as a first step it decouples the Kubernetes resources a bit more.

This bit of refactoring is something I feel we should consider in the Kubernetes executor anyway, since it's becoming huge at this point.
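To show the direction of that refactor, here is a hedged sketch of the "factory function" shape: the pod spec is built from explicit inputs rather than assembled from, and stored back into, the executor's fields, so it could also be called without a running build. All type and function names here are illustrative, not the executor's real code.

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podOptions collects only the inputs the pod actually depends on, instead of
// the factory reaching into the executor struct and the current build.
type podOptions struct {
	Namespace string
	Image     string
	Labels    map[string]string
}

// newBuildPod returns the pod spec and leaves creating/storing it to the
// caller, which is what makes pre-creating idle pods possible at all.
func newBuildPod(opts podOptions) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "runner-idle-",
			Namespace:    opts.Namespace,
			Labels:       opts.Labels,
		},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{
				{Name: "build", Image: opts.Image, Command: []string{"sh"}},
			},
		},
	}
}
```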

  2. How do we give pods an IdleTime? The easiest solution would be to have the Runner manage that and clean up pods after the timeout. However, since I also had #27719 (closed) on my plate, I felt I needed to take this a step further to account for the case where the Runner is stopped and some number of idle pods stay in the cluster running forever, waiting for a command. The solution I came up with is a custom entry command for the helper image. It basically mimics a shell, passing all stdin/stdout directly to an underlying sh process. On top of that, the command can monitor whether there's a running build and exit if the pod has idled past its timeout.

This approach with the custom entry command is actually something we could probably reuse for other functionality in the future.
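For reference, a stripped-down sketch of what such a custom entry command could look like: it proxies stdin/stdout/stderr to an sh process and shuts the pod down if no build has arrived before the idle timeout. The marker file used to signal "a build has started" is an assumption for the sketch, not how the POC tracks it.

```go
package main

import (
	"os"
	"os/exec"
	"time"
)

func main() {
	idleTimeout := 30 * time.Minute

	// Mimic a shell: hand our stdin/stdout/stderr straight to sh.
	cmd := exec.Command("sh")
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		os.Exit(1)
	}

	go func() {
		time.Sleep(idleTimeout)
		// Hypothetical marker written once a job is attached to this pod.
		if _, err := os.Stat("/tmp/build-started"); err != nil {
			_ = cmd.Process.Kill() // idle deadline passed with no work: shut down
		}
	}()

	if err := cmd.Wait(); err != nil {
		os.Exit(1)
	}
}
```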

That being said, the "pod exits by itself and doesn't need a Runner to manage it" approach has a limitation: it doesn't clean up the secrets, config maps, etc. that were created alongside it, similar to #4184 (closed). Kubernetes garbage collection could be a solution here, since as long as the Pod is deleted, it guarantees that anything related, e.g. secrets, configmaps, etc., is deleted as well.
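Concretely, the garbage-collection route would mean giving the secrets/configmaps an ownerReference pointing at the already-created pod, so deleting the pod cascades to everything it owns. A minimal sketch (names illustrative):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// withPodOwner marks the secret as owned by the pod, so Kubernetes garbage
// collection deletes it automatically once the pod is deleted. The same
// applies to configmaps and other accompanying resources.
func withPodOwner(secret *corev1.Secret, pod *corev1.Pod) *corev1.Secret {
	secret.OwnerReferences = []metav1.OwnerReference{
		{
			APIVersion: "v1",
			Kind:       "Pod",
			Name:       pod.Name,
			UID:        pod.UID, // must be the UID of the live pod object
		},
	}
	return secret
}
```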

What's the best way to test this MR?

What are the relevant issue numbers?

#3795

