Add support for using GPUs in Google Compute Engine

NOTE THAT THIS FORK IS MAINTAINED FOR CRITICAL BUG FIXES AFFECTING RUNNING COSTS ONLY. NO OTHER CONTRIBUTIONS WILL BE ACCEPTED.

What critical bug is this MR fixing?

Machine learning workflows require GPUs or specialized hardware to run in a reasonable amount of time.

How does this change help reduce cost of usage? What scale of cost reduction is it?

Currently, Google Compute Engine instances with GPUs are quite expensive to run continuously. The smallest instance costs $200+/month, while the more expensive instances cost $2000+/month.

  1. https://cloud.google.com/compute/gpus-pricing
  2. https://cloud.google.com/compute/docs/gpus/

In what scenarios is this change usable with GitLab Runner's docker+machine executor?

Sample call:

docker-machine create --driver google --google-project your-google-project \
  --google-disk-size 50 \
  --google-machine-type n1-standard-1 \
  --google-accelerator count=1,type=nvidia-tesla-p4 \
  --google-maintenance-policy TERMINATE \
  --google-machine-image https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110 \
  --google-metadata "install-nvidia-driver=True" your-machine-host
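
Once the machine is up, one quick way to confirm that the accelerator was attached and the driver was installed is to run `nvidia-smi` on the host over SSH. The following is a minimal sketch, not part of this merge request: the function name `check_gpu_on_host` and the `DOCKER_MACHINE_BIN` override are hypothetical, and it assumes the image has finished installing the NVIDIA driver (which may take a few minutes after first boot).

```shell
# Hypothetical helper: run nvidia-smi on a docker-machine host.
# DOCKER_MACHINE_BIN is an assumed override for the docker-machine binary.
check_gpu_on_host() {
  dm="${DOCKER_MACHINE_BIN:-docker-machine}"
  # `docker-machine ssh <host> <command>` runs the command on the remote host.
  "$dm" ssh "$1" nvidia-smi
}
```

Usage: `check_gpu_on_host your-machine-host`. If the driver is present, `nvidia-smi` should list the attached GPU.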

This change also requires gitlab-org/gitlab-runner!1955 (merged) to work with GPUs, although this merge request can still be used independently of that. Inside the config.toml:

runners.docker

  [runners.docker]
     gpus = ["all"]
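
As far as I understand it, `gpus = ["all"]` plays the same role as `--gpus all` on the Docker command line. A rough way to check that containers on the machine can see the GPU is sketched below. The function name `check_gpu_in_container` and the `DOCKER_MACHINE_BIN` and `DOCKER_BIN` overrides are hypothetical, and the `ubuntu` image is only an assumption: any image works as long as the NVIDIA container runtime on the host injects `nvidia-smi`.

```shell
# Hypothetical helper: run nvidia-smi inside a container on a docker-machine host.
# DOCKER_MACHINE_BIN and DOCKER_BIN are assumed overrides for the two binaries.
check_gpu_in_container() {
  dm="${DOCKER_MACHINE_BIN:-docker-machine}"
  dk="${DOCKER_BIN:-docker}"
  # `docker-machine config <host>` prints the connection flags for that host;
  # the substitution is left unquoted on purpose so the flags are split into words.
  "$dk" $("$dm" config "$1") run --rm --gpus all ubuntu nvidia-smi
}
```

Usage: `check_gpu_in_container your-machine-host`.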

runners.machine

These are the `MachineOptions` that I used:

  [runners.machine]
    MachineOptions = [
      "google-project=your-google-project",
      "google-disk-size=50",
      "google-disk-type=pd-ssd",
      "google-machine-type=n1-standard-1",
      "google-use-internal-ip",
      "engine-registry-mirror=https://mirror.gcr.io",
      "google-accelerator=count=1,type=nvidia-tesla-p4",
      "google-maintenance-policy=TERMINATE",
      "engine-opt=mtu=1460",
      "engine-opt=ipv6",
      "engine-opt=fixed-cidr-v6=fc00::/7",
      "google-machine-image=https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110",
      "google-metadata=install-nvidia-driver=True"
    ]

Closes #34 (closed)

Screenshot

image

Edited by Stan Hu
