Add support for using GPUs in Google Compute Engine
What critical bug is this MR fixing?
Machine learning workflows require GPUs or other specialized hardware to run in a reasonable amount of time.
How does this change help reduce the cost of usage? What is the scale of the cost reduction?
Currently, Google Compute Engine instances with GPUs are quite expensive to run continuously: the smallest such instance costs $200+/month, while more expensive instances cost $2000+/month. With the docker+machine executor, GPU machines can instead be created on demand and removed when idle, so you only pay for the time jobs actually run.
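For illustration, a minimal autoscaling sketch (the values are placeholders, not part of this MR) that keeps no GPU machines running between jobs:

```toml
[runners.machine]
  # Keep no GPU machines alive while there are no jobs
  IdleCount = 0
  # Remove an idle machine after 10 minutes
  IdleTime = 600
  # Recycle each machine after this many builds
  MaxBuilds = 10
```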
In what scenarios is this change usable?

With GitLab Runner's docker+machine executor.
Sample call:

```shell
docker-machine create --driver google --google-project your-google-project \
  --google-disk-size 50 \
  --google-machine-type n1-standard-1 \
  --google-accelerator count=1,type=nvidia-tesla-p4 \
  --google-maintenance-policy TERMINATE \
  --google-machine-image https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110 \
  --google-metadata "install-nvidia-driver=True" your-machine-host
```
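Once the machine is up (the `install-nvidia-driver=True` metadata makes the Deep Learning VM image install the NVIDIA driver on first boot, which can take a few minutes), you can check that the GPU is visible with a quick `nvidia-smi` over SSH:

```shell
# your-machine-host is the placeholder name used in the sample call above
docker-machine ssh your-machine-host nvidia-smi
```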
This change also requires gitlab-org/gitlab-runner!1955 to work with GPUs, although this merge request can still be used independently of it. Inside the `config.toml`:
`runners.docker`

```toml
[runners.docker]
  gpus = ["all"]
```
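The `gpus` setting maps to Docker's GPU device requests (the same mechanism as `docker run --gpus`), so you can sanity-check GPU passthrough on the machine itself with plain Docker. This is only an illustrative check; the CUDA image tag is an example, not part of this MR:

```shell
# Roughly what the runner requests for a job when all GPUs are exposed;
# pick an nvidia/cuda tag that matches the installed driver.
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```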
`runners.machine`

These are the `MachineOptions` that I used:
```toml
[runners.machine]
  MachineOptions = [
    "google-project=your-google-project",
    "google-disk-size=50",
    "google-disk-type=pd-ssd",
    "google-machine-type=n1-standard-1",
    "google-use-internal-ip",
    "engine-registry-mirror=https://mirror.gcr.io",
    # GPU-specific options: request the accelerator and terminate on host
    # maintenance (GPU instances cannot live-migrate)
    "google-accelerator=count=1,type=nvidia-tesla-p4",
    "google-maintenance-policy=TERMINATE",
    "engine-opt=mtu=1460",
    "engine-opt=ipv6",
    "engine-opt=fixed-cidr-v6=fc00::/7",
    # Deep Learning VM image; install-nvidia-driver=True installs the NVIDIA
    # driver on first boot
    "google-machine-image=https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110",
    "google-metadata=install-nvidia-driver=True"
  ]
```
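As an end-to-end sketch (the job name, `gpu` tag, and image tag are hypothetical, not part of this MR), a CI job that exercises the GPU on such a runner could look like:

```yaml
# .gitlab-ci.yml — hypothetical job; adjust the tag and image to your setup
gpu-check:
  tags:
    - gpu
  image: nvidia/cuda:11.0-base
  script:
    - nvidia-smi
```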
Closes #34