Get VM details directly from cloud provider and only cache them locally
The idea of saving the VM metadata locally in a "configuration" file (via https://gitlab.com/gitlab-org/ci-cd/custom-executors/autoscaler/tree/master/vm/file) was borrowed from Docker Machine. It worked well for the initial implementation and tests, but we already know it's not the best solution and that it should not be used for production deployments.
Depending only on locally stored information creates a split-brain case. The simplest example is a failure during the prepare stage, after the VM has been created but before it has been saved to the file. When the failure is propagated to Runner, Runner fails the job and calls the cleanup command. Unfortunately, because everything relies on the local configuration file, cleanup also fails (file ... doesn't exist), leaving the machine in the cloud until someone removes it manually. This creates a lot of problems, and the cost of such dangling, unused VMs is one of the most important.
For the MVC implementation we should refactor the code to always fall back to the cloud provider when getting the machine details. We can use local file storage for caching, so that getting the information is quicker, but if the information is not present locally, we should query the provider. In the future, for things like VM listing, we should not use the cache at all, but query the cloud provider using the tags defined by the user in the configuration.