
Handle initial Docker Machine certificate generation

Problem

When a deployment (for example, blue) is no longer needed, we need to delete its VMs from GCP for the following reasons:

  1. Reduce cost: avoid doubling the runner manager cost.
  2. Prevent gitlab-runner from unexpectedly starting on the deactivated deployment, for example if someone runs chef-client by mistake.
  3. Move towards immutable infrastructure.

As soon as we deploy to a new deployment (for example, green), the machines of the blue deployment need to be destroyed. Below are the blockers to doing so.

docker-machine certificates

The docker+machine executor uses docker-machine behind the scenes, which creates a CA chain that it uses to talk to the Docker API on the remote machine over TLS. The certificates are only generated when they are missing from disk or have expired, and that part works as expected. The problem is that gitlab-runner can start several docker-machine executions concurrently when it boots for the first time, so multiple docker-machine processes try to create the certificates at once because none of them can find the CA chain in the directory. This breaks machine creation, as we have seen in production incidents such as production#4649 (closed) and production#1609 (closed).

Proposal

docker-machine certificates

The best way to fix this is inside gitlab-runner itself: have it check that the certificates are valid and available before shelling out to docker-machine, or potentially create a certificate per machine, although the latter might come with a performance hit. The product issue is tracked in 👉 gitlab-org/gitlab-runner#3676 (closed). Another fix, which might be faster to implement, is to update the cookbook to create the certificates before starting the gitlab-runner process.

Edited by Tomasz Maczukin