Multi-regional deployment

We currently deploy to a single region only (us-central1). GPU availability is generally scarce, so to scale with demand we should build out capacity to obtain GPUs in other regions.

To do this, we will need:

- A dedicated GKE cluster for each target region
- model-gateway and triton deployed to each of those regional clusters
  - Figure out whether we need per-region Filestore
  - Connect monitoring for each cluster and ensure labelling is distinct
- GCP quota increases for each region
- A global load balancer targeting each of the regional model gateways
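
As a rough illustration of the per-region checklist above, here is a minimal Python sketch that builds one plan entry per region and checks that monitoring labels are distinct. Everything except `us-central1`, `model-gateway`, and `triton` is an assumption made for illustration: the extra regions, the `inference-<region>` cluster naming scheme, and the label keys are hypothetical, not decisions from this issue.

```python
from dataclasses import dataclass

# Hypothetical target regions; only us-central1 is named in this issue.
TARGET_REGIONS = ["us-central1", "us-east4", "europe-west4"]


@dataclass
class RegionPlan:
    region: str
    cluster_name: str        # hypothetical naming scheme: inference-<region>
    workloads: tuple         # services to deploy to the regional cluster
    monitoring_labels: dict  # labels that must be unique per cluster


def build_plan(regions):
    """Build one plan entry per target region."""
    plans = []
    for region in regions:
        cluster = f"inference-{region}"
        plans.append(
            RegionPlan(
                region=region,
                cluster_name=cluster,
                workloads=("model-gateway", "triton"),
                monitoring_labels={"region": region, "cluster": cluster},
            )
        )
    return plans


def assert_distinct_labels(plans):
    """Raise ValueError if two clusters share the same monitoring label set."""
    seen = set()
    for plan in plans:
        key = tuple(sorted(plan.monitoring_labels.items()))
        if key in seen:
            raise ValueError(f"duplicate monitoring labels for {plan.cluster_name}")
        seen.add(key)


if __name__ == "__main__":
    plans = build_plan(TARGET_REGIONS)
    assert_distinct_labels(plans)
    for plan in plans:
        print(plan.cluster_name, plan.workloads, plan.monitoring_labels)
```

A check like this could run in CI before a new region is added, so that a copy-pasted label set is caught before metrics from two clusters get mixed together.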

Edited May 31, 2023 by Igor