Add support for using GPUs in Google Compute Engine
What critical bug is this MR fixing?
Machine learning workflows require GPUs or other specialized hardware to run in a reasonable amount of time.
How does this change help reduce the cost of usage? What is the scale of the cost reduction?
Currently, Google Compute Engine instances with GPUs are quite expensive to run continuously: the smallest such instance costs $200+/month, while more expensive instances cost $2000+/month. With the docker+machine executor, GPU machines can instead be created on demand and removed when idle, so you only pay for the time jobs actually run.
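For illustration, a minimal autoscaling sketch (the values are placeholders, not part of this MR) that keeps no GPU machines running between jobs:

```toml
[runners.machine]
  # Keep no GPU machines alive while there are no jobs
  IdleCount = 0
  # Remove an idle machine after 10 minutes
  IdleTime = 600
  # Recycle each machine after this many builds
  MaxBuilds = 10
```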
In what scenarios is this change usable?

With GitLab Runner's docker+machine executor.
Sample call:

```shell
docker-machine create --driver google --google-project your-google-project \
  --google-disk-size 50 \
  --google-machine-type n1-standard-1 \
  --google-accelerator count=1,type=nvidia-tesla-p4 \
  --google-maintenance-policy TERMINATE \
  --google-machine-image https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110 \
  --google-metadata "install-nvidia-driver=True" your-machine-host
```
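Once the machine is up (the `install-nvidia-driver=True` metadata makes the Deep Learning VM image install the NVIDIA driver on first boot, which can take a few minutes), you can check that the GPU is visible with a quick `nvidia-smi` over SSH:

```shell
# your-machine-host is the placeholder name used in the sample call above
docker-machine ssh your-machine-host nvidia-smi
```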
This change also requires gitlab-org/gitlab-runner!1955 to work with GPUs, although this merge request can still be used independently of it. Inside the `config.toml`:
`runners.docker`

```toml
[runners.docker]
  gpus = ["all"]
```
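The `gpus` setting maps to Docker's GPU device requests (the same mechanism as `docker run --gpus`), so you can sanity-check GPU passthrough on the machine itself with plain Docker. This is only an illustrative check; the CUDA image tag is an example, not part of this MR:

```shell
# Roughly what the runner requests for a job when all GPUs are exposed;
# pick an nvidia/cuda tag that matches the installed driver.
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```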
`runners.machine`

These are the `MachineOptions` that I used:
```toml
[runners.machine]
  MachineOptions = [
    "google-project=your-google-project",
    "google-disk-size=50",
    "google-disk-type=pd-ssd",
    "google-machine-type=n1-standard-1",
    "google-use-internal-ip",
    "engine-registry-mirror=https://mirror.gcr.io",
    # GPU-specific options: request the accelerator and terminate on host
    # maintenance (GPU instances cannot live-migrate)
    "google-accelerator=count=1,type=nvidia-tesla-p4",
    "google-maintenance-policy=TERMINATE",
    "engine-opt=mtu=1460",
    "engine-opt=ipv6",
    "engine-opt=fixed-cidr-v6=fc00::/7",
    # Deep Learning VM image; install-nvidia-driver=True installs the NVIDIA
    # driver on first boot
    "google-machine-image=https://www.googleapis.com/compute/v1/projects/deeplearning-platform-release/global/images/family/tf2-ent-2-3-cu110",
    "google-metadata=install-nvidia-driver=True"
  ]
```
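As an end-to-end sketch (the job name, `gpu` tag, and image tag are hypothetical, not part of this MR), a CI job that exercises the GPU on such a runner could look like:

```yaml
# .gitlab-ci.yml — hypothetical job; adjust the tag and image to your setup
gpu-check:
  tags:
    - gpu
  image: nvidia/cuda:11.0-base
  script:
    - nvidia-smi
```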
Closes #34