
Fix race conditions with binfmt and GPU Docker systemd plug-ins

Stan Hu requested to merge sh-fix-systemd-race-conditions into master

Docker Machine provisions a runner with cloud-init, which previously added systemd drop-ins for docker.service to initialize binfmt for QEMU and, if needed, install the GPU driver.

As seen in gitlab-com/gl-infra/production#17835 (closed), we can run into a race condition where docker.service gets enabled before cloud-init has a chance to write those files. As a result, the drop-ins are not applied unless we run systemctl daemon-reload and systemctl restart docker.service.

We don't want to call systemctl restart docker.service in cloud-init because that could abruptly terminate the Docker connection, as we saw happen in gitlab-org/ci-cd/shared-runners/infrastructure#203 (closed). Instead, convert the systemd drop-ins to full-fledged services that are manually started and run after docker.service.
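To illustrate the shape of such a service, here is a minimal Python sketch that renders a oneshot unit ordered after docker.service, plus the commands cloud-init might run to load and start it. The actual unit files from this MR are not shown here: the unit name binfmt-qemu.service and the script path /usr/local/bin/setup-binfmt are hypothetical placeholders, and the helper functions are illustrative only.

```python
from typing import List


def render_oneshot_unit(description: str, exec_start: str) -> str:
    """Render a hypothetical systemd unit that runs once Docker is up."""
    lines = [
        "[Unit]",
        f"Description={description}",
        # Order this unit after Docker; Requires= also pulls docker.service in.
        "After=docker.service",
        "Requires=docker.service",
        "",
        "[Service]",
        # Run the setup command once and report "active" after it exits.
        "Type=oneshot",
        "RemainAfterExit=yes",
        f"ExecStart={exec_start}",
        # No [Install] section: the unit is not enabled at boot and is
        # started explicitly instead (assumption based on "manually started").
        "",
    ]
    return "\n".join(lines)


def install_commands(unit_name: str) -> List[List[str]]:
    """Commands to load and start the unit without restarting Docker."""
    return [
        # Make systemd read the newly written unit file.
        ["systemctl", "daemon-reload"],
        # Start only this unit; docker.service keeps running untouched.
        ["systemctl", "start", unit_name],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(render_oneshot_unit("Register QEMU binfmt handlers", "/usr/local/bin/setup-binfmt"))
    for cmd in install_commands("binfmt-qemu.service"):
        print(" ".join(cmd))
```

Compared with a drop-in, a separate unit only needs a daemon-reload and a start of that unit, so the running Docker daemon and its connections are left alone.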

Relates to gitlab-com/gl-infra/production#17835 (closed)
