Multi-regional deployment

We currently deploy to a single region (us-central1). GPUs are generally scarce, so to scale with demand we should build out capacity to get GPUs in other regions.

To do this, we will need:

  • Dedicated GKE cluster for each target region
  • model-gateway and triton deployed to each of those regional clusters
    • Determine whether we need a per-region Filestore
    • Connect monitoring for each cluster and ensure its labels are distinct per region
  • GCP quota increases for each region
  • Global load balancer targeting each of the regional model gateways
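
As a rough sketch of how the items above fit together, the per-region plan could be described as data and checked before anything is provisioned. Everything beyond us-central1, model-gateway and triton is an assumption made for illustration: the extra regions, the `inference-<region>` cluster naming scheme, the label keys, and the helper names are all hypothetical, and nothing here calls a GCP API.

```python
from dataclasses import dataclass

# Services named in the plan above.
SERVICES = ("model-gateway", "triton")


@dataclass(frozen=True)
class RegionalDeployment:
    """One dedicated cluster per target region (hypothetical model)."""
    region: str
    cluster: str  # assumed naming scheme: inference-<region>
    services: tuple = SERVICES

    def monitoring_labels(self, service: str) -> dict:
        # Labels that would let dashboards and alerts tell regions apart.
        return {"region": self.region, "cluster": self.cluster, "service": service}


def build_plan(regions):
    """Create one deployment record per region."""
    return [RegionalDeployment(region=r, cluster=f"inference-{r}") for r in regions]


def count_distinct_labels(plan):
    """Raise if two (cluster, service) pairs would share a label set."""
    seen = set()
    for dep in plan:
        for svc in dep.services:
            key = tuple(sorted(dep.monitoring_labels(svc).items()))
            if key in seen:
                raise ValueError(f"duplicate monitoring labels: {dict(key)}")
            seen.add(key)
    return len(seen)


# Example regions only; the real list depends on where GPU quota is granted.
plan = build_plan(["us-central1", "us-east4", "europe-west4"])

# The global load balancer would target the model-gateway in every region.
gateway_backends = [f"{dep.cluster}/model-gateway" for dep in plan]

print(count_distinct_labels(plan), gateway_backends)
```

Keeping the region list in one place means adding a region is a one-line change, and the label check catches a copy-pasted cluster name before it reaches monitoring.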
Edited by Igor