Set up Prometheus autodiscovery for runner managers

As our new managers will be created dynamically, we need to configure Prometheus' GCP autodiscovery for them.

We can either use the main Prometheus servers that we use now, or have dedicated Prometheus servers in our GCP project where this configuration would be added. The second option would be ideal, but it also means more work.

For the first case, the infrastructure already exists, but we need to confirm that autodiscovery works and can be used across GCP projects. For the second case, we would need to configure a new Prometheus node to be part of our Thanos cluster (so that the metrics are available in our Grafana and in our alerting). We could probably reuse a lot of the existing configuration, but it's definitely more work than just preparing a new scrape config.
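To make the "new scrape config" option more concrete, here is a rough sketch of what such a job could look like, built as a Python dict and printed for inspection. It is not the configuration from the linked merge requests. The `gce_sd_configs` fields (`project`, `zone`, `port`) and the `__meta_gce_*` labels come from Prometheus' GCE service discovery. The job name, the zone, the metrics port 9252, and the subnetwork regex are assumptions made for illustration only.

```python
import json


def runner_manager_scrape_job(project, zone, port=9252):
    """Build a scrape job dict that relies on Prometheus' GCE service discovery.

    The port default (9252) and the relabeling rules are illustrative
    assumptions, not values taken from the real configuration.
    """
    return {
        "job_name": "runner-managers",  # hypothetical job name
        "gce_sd_configs": [
            # One entry per project/zone pair that should be discovered.
            {"project": project, "zone": zone, "port": port},
        ],
        "relabel_configs": [
            {
                # Keep only instances attached to the runner-managers
                # subnetwork (the regex is an assumed URL suffix).
                "source_labels": ["__meta_gce_subnetwork"],
                "regex": ".*/subnetworks/runner-managers",
                "action": "keep",
            },
            {
                # Use the GCE instance name as the `instance` label.
                "source_labels": ["__meta_gce_instance_name"],
                "target_label": "instance",
            },
        ],
    }


# "us-east1-c" is a placeholder zone, not a confirmed value.
job = runner_manager_scrape_job("gitlab-ci", "us-east1-c")
print(json.dumps({"scrape_configs": [job]}, indent=2))
```

Because the discovery is scoped by `project`, the same job shape would apply whether it runs on the main Prometheus servers or on dedicated ones. What still has to be confirmed for the first case is that those servers have the credentials and network access needed to discover and scrape instances in another GCP project.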

Current TODO checklist:

Change CIDRs for the bastion-ci and runner-managers subnetworks in the ci network in the gitlab-ci project, so that they do not overlap with our ephemeral-runners CIDRs or with any CIDR in gitlab-production/gprd and its peers (currently these two networks overlap with the peering-gitlab-analysis-gitlab-analysis-vpc peer in gitlab-production/gprd, which would create a problem there).
Update documentation:
- Update CIDR details for changed networks in gitlab-com/runbooks!3761 (merged)
- Merge the documentation update
Once there are no conflicts, add peering between the gitlab-production/gprd and gitlab-ci/ci networks => https://ops.gitlab.net/gitlab-com/gitlab-com-infrastructure/-/merge_requests/2806
Merge Prometheus configuration updates from https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/377
Open firewall access from Prometheus server network to new runner managers: https://ops.gitlab.net/gitlab-com/gitlab-com-infrastructure/-/merge_requests/2812
Confirm that we can see metrics for new blue-green deployed nodes at thanos.gitlab.net
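The CIDR change in the first checklist item can be sanity-checked before the peering is added. The sketch below uses Python's standard `ipaddress` module to report every pair of networks whose ranges overlap. All CIDR values are made-up placeholders, chosen only to mirror the situation described above where the two subnetworks collide with the analysis peer. The real ranges are in the runbooks.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return the (name, name) pairs whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


# Placeholder CIDRs for illustration only, NOT the real network ranges.
cidrs = {
    "bastion-ci": "10.1.4.0/24",
    "runner-managers": "10.1.5.0/24",
    "ephemeral-runners": "10.10.0.0/16",
    "gprd": "10.216.0.0/13",
    "peering-gitlab-analysis-gitlab-analysis-vpc": "10.1.0.0/16",
}

overlaps = find_overlaps(cidrs)
for a, b in overlaps:
    print(f"overlap: {a} <-> {b}")
```

An empty result for the real ranges would be the signal that the peering between gitlab-production/gprd and gitlab-ci/ci can be added without conflicts.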

Edited Jul 30, 2021 by Steve Xuereb