Select a different GCP zone when the selected one is out of resources

This MR attempts to fix an issue we have noticed several times over the past few days. During usage peaks on GCP, it seems that the physical resources in a zone become exhausted, which leads to instance creation errors with the following message: ZONE_RESOURCE_POOL_EXHAUSTED

Part of this code was written with the help of Gemini assistant from GCP.

The idea is to extend the list of usable zones, prioritizing US zones since they are closer to the GitLab infrastructure. We then try to create the instance; if we hit the error, we try the next zone, and so on.
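For reviewers, the fallback logic can be sketched roughly as follows. This is a minimal illustration and not the code in this MR: `ZoneExhaustedError`, `order_zones`, `create_with_fallback`, and the injected `create_instance` callable are hypothetical names, and the real script would map the GCP ZONE_RESOURCE_POOL_EXHAUSTED error onto whatever exception it actually handles.

```python
from typing import Callable, Iterable, List


class ZoneExhaustedError(Exception):
    """Hypothetical stand-in for a ZONE_RESOURCE_POOL_EXHAUSTED failure."""


def order_zones(zones: Iterable[str], preferred_prefix: str = "us-") -> List[str]:
    """Return zones with the preferred prefix first, keeping relative order."""
    zones = list(zones)
    preferred = [z for z in zones if z.startswith(preferred_prefix)]
    others = [z for z in zones if not z.startswith(preferred_prefix)]
    return preferred + others


def create_with_fallback(
    name: str,
    zones: Iterable[str],
    create_instance: Callable[[str, str], None],
) -> str:
    """Try each zone in order and return the first zone that succeeds.

    `create_instance(name, zone)` is assumed to raise ZoneExhaustedError
    when the zone has no capacity; any other exception propagates.
    """
    for zone in zones:
        print(f"Trying zone '{zone}'...")
        try:
            create_instance(name, zone)
        except ZoneExhaustedError:
            print(f"Zone '{zone}' exhausted, trying next zone...")
            continue
        print(f"Instance '{name}' created successfully in zone '{zone}'.")
        return zone
    raise RuntimeError(f"All zones exhausted for instance '{name}'")
```

Only the exhaustion error triggers a retry in this sketch; other failures (quota, permissions, invalid machine type) are left to surface immediately, since moving to another zone would not help with them.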

Local testing:

uv run ./manage_cloud_runners.py create_runner    
Creating runner test-loic on GCP with additional tags ['', 'cloud', 'google-cloud']
  > Creating new runner in Gitlab
  > Creating new docker runner in Gitlab
  > Creating new server test-loic on GCP
    >> Server flavor = large
    >> Server OS = ubuntu_24_04
Trying zone 'us-central1-a' (region 'us-central1') for machine type 'n4-standard-64'...
Zone 'us-central1-a' exhausted, trying next zone...
Trying zone 'us-central1-b' (region 'us-central1') for machine type 'n4-standard-64'...
Zone 'us-central1-b' exhausted, trying next zone...
Trying zone 'us-central1-c' (region 'us-central1') for machine type 'n4-standard-64'...
Zone 'us-central1-c' exhausted, trying next zone...
Trying zone 'us-central1-f' (region 'us-central1') for machine type 'n4-standard-64'...
Zone 'us-central1-f' exhausted, trying next zone...
Trying zone 'us-east1-b' (region 'us-east1') for machine type 'n4-standard-64'...
Instance 'test-loic' created successfully in zone 'us-east1-b'.

This change requires an additional Python package:

MR here: sylva-projects/sylva-elements/container-images/runner-aas-image!20 (merged)

As a consequence, the creation job may take a bit longer to execute, but it will improve the stability of our runs.

Edited by Loic Nicolle
