Resolve shared-gitlab-org runners capacity problems

The shared-gitlab-org runner managers are hitting the IP_SPACE_EXHAUSTED error. The network they currently use is too small for the maximum number of jobs they can handle.
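To make the capacity mismatch concrete, here is a minimal sketch of the arithmetic, assuming each job runs on its own ephemeral VM that takes one internal IP. The CIDR ranges and VM counts in the usage note are illustrative only and are not the real values from the ci network; the helper names are hypothetical. The one fixed input is that GCP reserves four addresses in every primary IPv4 subnet range.

```python
import ipaddress

# GCP reserves four addresses in each primary IPv4 subnet range
# (network, default gateway, second-to-last, and broadcast).
GCP_RESERVED_ADDRESSES = 4


def usable_addresses(cidr: str) -> int:
    """Return how many VM addresses a subnet with this CIDR can hand out."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - GCP_RESERVED_ADDRESSES, 0)


def fits(cidr: str, max_concurrent_vms: int) -> bool:
    """Check whether the subnet can hold the given number of ephemeral VMs."""
    return usable_addresses(cidr) >= max_concurrent_vms
```

For example, with these made-up numbers, `fits("10.0.0.0/22", 1500)` is false because a /22 offers 1020 usable addresses, while `fits("10.0.0.0/21", 1500)` is true because a /21 offers 2044.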

The target solution is the one proposed in https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14637: spread the ephemeral VMs across dedicated GCP projects, one per runner manager.

As an intermediate solution, we will create three new subnetworks in the gitlab-ci-155816/ci network and reconfigure three of the runner managers, pointing each at its own dedicated subnetwork.
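As a rough illustration of the intermediate step, the sketch below carves three equally sized, non-overlapping ranges out of a free parent block and checks them against ranges that are already registered. The parent block, prefix length, and existing ranges are placeholders, not the CIDRs registered in the runbooks MR, and the function names are hypothetical.

```python
import ipaddress
from itertools import islice


def carve_subnets(parent_cidr: str, new_prefix: int, count: int):
    """Return the first `count` subnets of size /new_prefix inside the parent block."""
    parent = ipaddress.ip_network(parent_cidr)
    subnets = list(islice(parent.subnets(new_prefix=new_prefix), count))
    if len(subnets) < count:
        raise ValueError(f"{parent_cidr} cannot hold {count} /{new_prefix} subnets")
    return subnets


def conflicts(candidates, existing_cidrs):
    """Return the (candidate, existing) pairs whose ranges overlap."""
    existing = [ipaddress.ip_network(cidr) for cidr in existing_cidrs]
    return [(c, e) for c in candidates for e in existing if c.overlaps(e)]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    new_subnets = carve_subnets("10.10.0.0/16", 21, 3)
    for subnet in new_subnets:
        print(subnet)
    print(conflicts(new_subnets, ["10.10.8.0/24", "10.20.0.0/16"]))
```

Running a check like this before registering CIDRs is one way to catch a range that collides with an existing subnetwork, since overlapping ranges cannot coexist in the same VPC network.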

Rollout plan

Add the temporary configuration
- register new CIDRs for the new subnetworks 👉 gitlab-com/runbooks!4865 (merged)
- create new subnetworks 👉 https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/4035
- reconfigure shared-gitlab-org runner managers 👉 https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/2160
Target solution
- PREPARATION
  - create GCP projects for ephemeral VMs
    - create project definitions 👉 MR_LINK_HERE
    - register CIDRs for shared-gitlab-org runner ephemeral projects 👉 MR_LINK_HERE
    - create four new GCP projects and set up Terraform for them 👉 MR_LINK_HERE
    - add ephemeral runners module configuration to all four new projects 👉 MR_LINK_HERE
  - change GCP quotas in the new GCP projects
    - request quota increases for each of the projects
    - confirm that quota limits were increased
  - prepare configuration changes in chef-repo 👉 MR_LINK_HERE
- ROLLOUT
  - merge and apply shared-gitlab-org runner managers reconfiguration from 👉 MR_LINK_HERE

Edited Aug 02, 2022 by Tomasz Maczukin
