Handle initial Docker Machine certificate generation
Problem
When a deployment, for example blue, is no longer needed, we need to delete the VM from GCP for the following reasons:
- Cost: avoid doubling the runner manager cost.
- Help prevent gitlab-runner from accidentally starting on the deactivated deployment, for example when someone starts chef-client by mistake.
- Move towards immutable infrastructure.
As soon as we deploy to a new deployment, for example green, the blue deployment machines need to be destroyed. Below are the blockers to doing so.
docker-machine certificates
The docker+machine executor uses docker-machine behind the scenes, which creates a CA chain that it uses to talk to the Docker API on the remote machine over TLS. This certificate is only generated when it is not available on disk or has expired, and this works as expected. The problem is that gitlab-runner can start concurrent executions of docker-machine when it boots for the first time, so multiple docker-machine processes try to create the certificates because none of them can find the CA chain in the directory. This ends up causing problems with machine creation, as we have seen in production incidents such as production#4649 (closed) and production#1609 (closed).
Proposal
docker-machine certificates
The best way to fix this is inside gitlab-runner itself: have it check that the certificates are valid and available before shelling out to docker-machine. Another option is potentially creating a certificate per machine, but there might be a performance hit with this. Product issue is tracked in gitlab-runner process.