Migrate away from Docker Machine for autoscaling

Introduction

GitLab Runner provides autoscaling, which makes it possible to use resources in a more elastic and dynamic way. Under the hood, this uses Docker Machine to provision machines on multiple cloud providers through its machine drivers. The problem is that Docker Machine is in maintenance mode, which makes it hard for us to keep relying on it. We already maintain a fork with fixes specific to GitLab.com, which is not ideal because it carries a large maintenance cost for the ~Verify team. We need to discuss and think of ways to support autoscaling without Docker Machine.

Alternatives to Docker Machine

Infrakit

Infrakit is the successor of Docker Machine and is also maintained by Docker Inc. Using Infrakit would require us to keep the existing scheduler that we use for Docker Machine. The scheduler has worked fine for a long time, but it brings maintenance, technical debt, and all the other costs of owning more software, compared to the scheduling we would get for "free" with Kubernetes. It's not clear whether Infrakit can provision Windows-based machines; that is something we need to verify.

Kubernetes

We generally push customers toward the Kubernetes executor, since it provides autoscaling out of the box and has one of the best workload schedulers. When customers run into issues with Docker Machine, we always suggest Kubernetes, since it works better and brings a lot more benefits. Currently, we can't run GitLab.com shared Runners on Kubernetes because we run untrusted code from users, which can be used to escape from containers and harm the infrastructure. So we need the kind of isolation that a full-blown virtual machine brings, or something similar to what we have right now. There are a few areas we can or need to explore:

Use terraform to provision machines

Terraform can be used to provision infrastructure. Would it be possible to keep the same scheduler mechanism, but run Terraform commands instead of Docker Machine commands? The only issue is that we would end up with the same problem we have with the executors: everyone wants their own provisioner to be supported, not just Terraform.

Criteria to move away from Docker Machine

  • All the benefits of autoscaling
  • We need to provide an easy way for users to migrate to the new scheduler/infrastructure provisioner
  • GitLab.com shared Runners can use it, with complete isolation from one job to another
  • GitLab.com shared Runners are heavy users of this and we need to keep that in mind.
  • It has to be able to provision Windows machines
  • If possible, it should be able to provide macOS machines
Edited by Steve Xuereb