Next iteration of CI/CD GitLab Runner autoscaling
To take GitLab Runner autoscaling to the next level, and to resolve our dependency on Docker Machine, which is deprecated and no longer maintained, we want to iterate on the next architecture for GitLab Runner autoscaling. One option is to move toward Kubernetes and a new scheduler, presumably using nested virtualization (for example, Firecracker).
We already have an issue about using Kubernetes with gVisor, but gVisor currently does not support running Docker-in-Docker.
- Use Kubernetes to run builds and to provide stable API,
- Redesign the Docker executor and the Kubernetes provider to make them backwards compatible,
- Consider using micro-VMs via nested virtualization on top of KVM to run builds,
- Use auto-scaling managers provided by Kubernetes Cloud Managers,
- Use the `interruptible: true` config in `.gitlab-ci.yml` to unblock usage of preemptible/spot instances,
- Measure everything, experiment with various configurations, iterate wisely.
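On the user-facing side, `interruptible: true` is an existing `.gitlab-ci.yml` job keyword; a minimal sketch of a job opting in (the job name and script are illustrative):

```yaml
# .gitlab-ci.yml — mark a job as safe to cancel when superseded,
# a prerequisite for scheduling it on preemptible/spot capacity
test:
  stage: test
  interruptible: true
  script:
    - make test
```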
- we currently run around 400 000 builds daily on shared runners
- we create a separate virtual machine for every build
- the median (50th percentile) build runs for less than 3 minutes
- it takes around 30 seconds to provision a VM and up to 60 seconds to tear it down
- we are billed for the entire VM lifecycle, from when creation starts until teardown is complete
- average machine utilization is low: around 25% CPU and 75% memory
- using preemptible/spot instances is currently only possible through a workaround
- configuring auto-scaling by users and customers is still quite difficult
- Docker Machine is no longer maintained, so we need to move to a different solution; Kubernetes is the most obvious one
Our assumption is that by reducing machine-management time with nested virtualization, improving utilization by running builds on Kubernetes, and unblocking preemptible/spot instances, we could reduce CI/CD compute costs by more than 50%.
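A back-of-the-envelope check of the overhead implied by the numbers above (all constants are the figures quoted in this document, applied to a median build; this is a sketch, not a measured result):

```python
# Per-build VM overhead for a median build, using the figures above.
PROVISION_S = 30          # time to provision a VM
TEARDOWN_S = 60           # time to tear a VM down
MEDIAN_BUILD_S = 3 * 60   # median build duration

# We are billed from the start of provisioning until teardown completes.
billed = PROVISION_S + MEDIAN_BUILD_S + TEARDOWN_S
overhead_fraction = (PROVISION_S + TEARDOWN_S) / billed

print(f"billed seconds per median build: {billed}")        # 270
print(f"overhead share of billed time:   {overhead_fraction:.0%}")  # 33%
```

So roughly a third of the billed time for a median build is machine management, before even accounting for the ~25% CPU utilization; eliminating most of that overhead is what makes the >50% cost-reduction target look plausible.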
Recording of the first meeting about this -> https://youtu.be/B7V8e0HPQ9k (outdated)