Create VMs in background to speed-up the autoscaling
For the initial implementation of `autoscaler` we decided to go with the simplest approach: create the VM in the context of the `prepare` command call. While this allowed us to move forward and prepare a working test environment for ~"Shared Runners::Windows" (which in a few days will bring us an open beta test program on GitLab.com), it's not ideal from the user's perspective.
With the current implementation, each job takes longer by the time required to spin up the VM that will handle it. In our existing Shared Runners (powered by the `docker+machine` executor) this time is mostly invisible to the user, because Runner creates VMs in advance and assigns the first free one from the pool to a newly started job. The user experience difference is that with our current Windows Shared Runners configuration, we should expect job queue timings similar to what we see for the Linux Shared Runners only under heavy load on GitLab.com's CI.
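The pooling behavior described above can be sketched as follows. This is a minimal illustration of the idea, not the actual `docker+machine` code; the `vmPool` type, the `acquire` method, and the VM names are invented for the example:

```go
package main

import "fmt"

// vm is a stand-in for a provisioned virtual machine.
type vm struct{ name string }

// vmPool holds VMs created in advance, mirroring the idea that
// docker+machine keeps a set of idle machines ready for new jobs.
type vmPool struct {
	idle []*vm
	seq  int
}

// acquire returns a pre-created idle VM immediately when one exists;
// otherwise the caller pays the full spin-up cost, which is exactly
// the delay the current Windows implementation adds to every job.
func (p *vmPool) acquire() (*vm, bool) {
	if len(p.idle) > 0 {
		v := p.idle[0]
		p.idle = p.idle[1:]
		return v, true // fast path: no spin-up wait
	}
	p.seq++
	// Slow path: stands in for a blocking VM creation.
	return &vm{name: fmt.Sprintf("vm-%d", p.seq)}, false
}

func main() {
	p := &vmPool{idle: []*vm{{name: "idle-1"}}}

	v, fromPool := p.acquire()
	fmt.Println(v.name, fromPool) // idle-1 true

	v, fromPool = p.acquire()
	fmt.Println(v.name, fromPool) // vm-1 false
}
```

The second `acquire` call models today's Windows behavior: no pool exists, so every job hits the slow path.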
When discussing this in the past, the idea we had for improving it was the following:

- Implement a daemon mode in `autoscaler`. With this we would start `autoscaler` as a separate, long-living process, in exactly the same way GitLab Runner is started.
- We should add configuration options similar to what we have in the `docker+machine` executor, so: number of `idle` VMs, `maximum` number of VMs, and `maximum number of jobs` that a VM can handle before being removed. For the first iteration I think we can skip supporting `OffPeak` versions. Together, these three options would be responsible for creating VMs in the background. Keeping the `idle` number of machines up and ready, managing the lifetime of the VMs, and assigning free VMs when requested would be the task of the daemon mode.
- When autoscaling is enabled, we should change the behavior of the `prepare`, `run` and `cleanup` commands of `autoscaler`. Instead of creating/removing the VM directly, they should:
  - Check if `autoscaler`'s daemon is available.
  - Connect to it and request a VM lock (for `prepare`). If there is a ready VM, `autoscaler`'s daemon should choose one, lock it for the usage of the given job, and return immediately. If not, it should schedule a creation, which of course becomes a blocking operation (so it behaves as `autoscaler` works now).
  - Connect to it and get the VM connection details (for `run`). Execute the job on the VM as it's done now.
  - Connect to it and request a VM release (for `cleanup`). Depending on the autoscaling configuration, this would either release the VM back to the pool of free VMs or trigger a VM removal. Either way, `cleanup` returns immediately, and the removal/release happens in the background.
- To communicate between `autoscaler`'s commands and `autoscaler`'s daemon we should use something like gRPC.
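The prepare/run/cleanup flow against the daemon could look roughly like this. This is an in-process sketch only: in the proposed design these would be gRPC calls to the long-living process, and the `Daemon` type, its method names, and the VM identifiers here are all invented for illustration:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Daemon is a stand-in for the long-living autoscaler process.
// The three commands would reach it over gRPC in the real design;
// plain method calls keep this sketch self-contained.
type Daemon struct {
	mu     sync.Mutex
	idle   []string          // ready VMs waiting for jobs
	locked map[string]string // jobID -> VM currently assigned
}

func NewDaemon(idle ...string) *Daemon {
	return &Daemon{idle: idle, locked: map[string]string{}}
}

// AcquireVM backs the `prepare` command: return a ready VM immediately,
// or fall back to a (blocking) creation, as autoscaler behaves today.
func (d *Daemon) AcquireVM(jobID string) string {
	d.mu.Lock()
	defer d.mu.Unlock()
	var vm string
	if len(d.idle) > 0 {
		vm = d.idle[0]
		d.idle = d.idle[1:]
	} else {
		vm = "created-for-" + jobID // stands in for the blocking VM creation
	}
	d.locked[jobID] = vm
	return vm
}

// ConnectionDetails backs the `run` command.
func (d *Daemon) ConnectionDetails(jobID string) (string, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	vm, ok := d.locked[jobID]
	if !ok {
		return "", errors.New("no VM locked for job " + jobID)
	}
	return vm, nil
}

// ReleaseVM backs the `cleanup` command: it only schedules the
// release/removal and returns immediately; here the "background" work
// is reduced to putting the VM back into the idle pool when reused.
func (d *Daemon) ReleaseVM(jobID string, reuse bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	vm := d.locked[jobID]
	delete(d.locked, jobID)
	if reuse {
		d.idle = append(d.idle, vm)
	}
}

func main() {
	d := NewDaemon("idle-1")

	vm := d.AcquireVM("job-42")                 // prepare
	details, _ := d.ConnectionDetails("job-42") // run
	fmt.Println(vm, details)

	d.ReleaseVM("job-42", true) // cleanup: returns immediately
	fmt.Println(len(d.idle))
}
```

Note that `cleanup` never waits on the removal/release itself, which is the property the proposal is after.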
With the above we would decouple VM management from job execution as much as possible. In fact, in a slightly different way, it would replicate exactly how we work now with our Linux Shared Runners.
Queue theory
Video resources:
- https://www.youtube.com/watch?v=66MPuv9wiIU
- https://www.youtube.com/watch?v=AsTuNP0N7DU
- https://www.therepl.net/episodes/29/
Books/Written material: