[meta] Shared Runners transition to GCP plan
Steps that are left:
- enable monitoring
  - gather pre-aggregated metrics on prometheus.gitlab.com (https://dev.gitlab.org/cookbooks/chef-repo/merge_requests/1463, current blocker)
  - put graphs at https://performance.gitlab.net/dashboard/db/ci
  - separate Consul+Prometheus network traffic from job-related traffic
  - prepare alerting based on pre-aggregated metrics => not needed directly for the GCP migration
- prepare monitoring of GCP resources (#2585 (closed))
  - number of used instances (equivalent of what we already have for DO) => https://gitlab.com/gitlab-org/ci-cd/gcp-exporter
  - quota usage
  - cost usage and prediction => unfortunately, as far as I could find in the API documentation, this data is not available through the API :(
  - tag created instances with one of: `srm`, `gsrm`, `gsrm-dev`, `stg-srm` (so we can group them later on graphs)
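Grouping the instance count by these tags could look roughly like the Python sketch below. It assumes a simplified, hypothetical instance shape: a mapping with a flat `tags` list. The real Compute Engine instance resource stores network tags under `tags.items` and also supports separate labels. This is not how gcp-exporter is implemented, and `count_instances_by_tag` is a made-up name for illustration.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping

# Tag values listed in the plan; every runner-created instance should carry one.
KNOWN_TAGS = ("srm", "gsrm", "gsrm-dev", "stg-srm")


def count_instances_by_tag(instances: Iterable[Mapping]) -> dict[str, int]:
    """Count instances per known tag; instances with none of them go to 'untagged'.

    Each instance is assumed (hypothetically) to be a mapping with a flat
    'tags' list, a simplification of a real GCE instance listing.
    """
    # Start every known tag at 0 so empty groups still show up on graphs.
    counts = Counter({tag: 0 for tag in KNOWN_TAGS})
    for instance in instances:
        matched = [t for t in instance.get("tags", []) if t in KNOWN_TAGS]
        if matched:
            # Count each instance once, under the first known tag it carries.
            counts[matched[0]] += 1
        else:
            counts["untagged"] += 1
    return dict(counts)


sample = [
    {"name": "runner-a", "tags": ["gsrm"]},
    {"name": "runner-b", "tags": ["gsrm", "http-server"]},
    {"name": "runner-c", "tags": ["srm"]},
    {"name": "runner-d", "tags": []},
]
print(count_instances_by_tag(sample))
```

The `untagged` bucket makes it easy to spot instances that were created without one of the expected tags.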
- make GCP runners the default ones
  - check and probably request an increase of some quotas
    - check current quota usage and estimate needs for migration to GCP
    - request quota limits increase (if needed)
    - confirm that quota limits were increased
  - prepare alerting for GCP quotas
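The quota estimate and the quota alert both come down to simple arithmetic. Below is a minimal Python sketch. It assumes each migrated job gets its own VM with a fixed per-instance cost (for example, vCPUs). The `Quota` type, the function names, the 80% threshold and all numbers are hypothetical. None of them are values taken from this plan.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    name: str      # hypothetical quota name, e.g. "CPUS" for one region
    limit: float   # current quota limit
    usage: float   # current usage


def projected_usage(quota: Quota, extra_instances: int, per_instance: float) -> float:
    """Usage after adding `extra_instances`, each consuming `per_instance` units."""
    return quota.usage + extra_instances * per_instance


def needs_increase(quota: Quota, extra_instances: int, per_instance: float,
                   headroom: float = 0.8) -> bool:
    """True if projected usage would exceed `headroom` (a fraction) of the limit."""
    return projected_usage(quota, extra_instances, per_instance) > headroom * quota.limit


def should_alert(quota: Quota, threshold: float = 0.8) -> bool:
    """Alert when current usage reaches `threshold` (a fraction) of the limit."""
    return quota.limit > 0 and quota.usage / quota.limit >= threshold


# Hypothetical example: 800 extra jobs on 1-vCPU vs 2-vCPU machines.
cpus = Quota("CPUS", limit=2400, usage=600)
print(needs_increase(cpus, 800, 1), needs_increase(cpus, 800, 2))
```

Keeping some headroom below the hard limit leaves room for autoscaling spikes before an increase request is approved.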
- increase `concurrent` and `limit` values in GCP Runners configuration
- decrease `concurrent` and `limit` values in DO Runners configuration
- create `private-runners-manager-{3,4}.gitlab.com` placed in GCP: #3696 (closed)
- update https://about.gitlab.com/gitlab-com/settings/#shared-runners => www-gitlab-com!10276, gitlab-org/gitlab-ce!18209
- disable DO Runners and leave them as a backup: #4172 (closed)
- Change PRM runner managers to group runners on GitLab.com: #4461 (closed)
- Remove the Consul cluster for CI monitoring from the GCP part of the CI infrastructure: #4407 (closed)
- Replace own cache servers with GCS for the GCP Runners fleet: #4565 (closed)
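When shifting `concurrent` and `limit` values from the DO Runners to the GCP Runners, it can help to reason about each runner manager's effective capacity. In GitLab Runner's `config.toml`, the global `concurrent` setting caps jobs across all `[[runners]]` entries. Each entry's `limit` caps that entry, and `0` means no per-runner limit. The rough Python sketch below uses hypothetical manager names and numbers.

```python
from __future__ import annotations


def effective_capacity(concurrent: int, limits: list[int]) -> int:
    """Upper bound on jobs a single runner manager can run at once.

    `concurrent` is the global config.toml setting; `limits` holds the `limit`
    of each [[runners]] entry, where 0 means "no per-runner limit".
    """
    if any(limit == 0 for limit in limits):
        # At least one entry is unlimited, so only `concurrent` caps the total.
        return concurrent
    return min(concurrent, sum(limits))


def fleet_capacity(managers: dict[str, tuple[int, list[int]]]) -> int:
    """Sum of effective capacities; keys are (hypothetical) manager names."""
    return sum(effective_capacity(c, limits) for c, limits in managers.values())


# Hypothetical numbers: shift capacity from DO to GCP without changing the total.
before = {"gcp-manager": (100, [100]), "do-manager": (700, [700])}
after = {"gcp-manager": (700, [700]), "do-manager": (100, [100])}
print(fleet_capacity(before), fleet_capacity(after))
```

Checking the totals before and after each change is a quick sanity check that the fleet does not accidentally lose or double its capacity during the transition.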
Edited by Tomasz Maczukin