Set up Prometheus autodiscovery for runner managers
As our new runner managers will be created dynamically, we need to configure Prometheus' GCP autodiscovery for them.
We can either use the main Prometheus servers that we use now, or (ideally, though it means more work) run dedicated Prometheus servers in our GCP project and add this configuration there.
For the first option, the infrastructure already exists, but we need to confirm that autodiscovery works and can be used across GCP projects. For the second option, we would need to configure a new Prometheus node as part of our Thanos cluster (so that the metrics are available in our Grafana and our alerting). We could probably reuse a lot of the existing configuration, but it's definitely more work than just preparing a new scrape config.
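A minimal sketch of what a GCE-based scrape job for the runner managers could look like, generated from Python so its structure is easy to check. Everything concrete is an assumption: the zones, the `role` instance label used in `filter`, the `runner-managers` job name, and port 9252 (the port commonly used for GitLab Runner's metrics listener; ours is whatever `listen_address` is set to). `gce_sd_configs` takes a single zone per entry, hence one entry per zone. For the cross-project question, note that GCE discovery uses the Prometheus server's own Google credentials, so those presumably need permission to list instances in the runner managers' project.

```python
import json

# Hypothetical values for illustration only; replace with the real zones,
# instance label and metrics port of the runner managers.
RUNNER_ZONES = ["us-east1-c", "us-east1-d"]
RUNNER_FILTER = 'labels.role = "runner-manager"'
RUNNER_METRICS_PORT = 9252


def runner_manager_scrape_job(project: str, zones: list[str]) -> dict:
    """Build one Prometheus scrape job that discovers runner managers via GCE SD."""
    return {
        "job_name": "runner-managers",  # hypothetical job name
        # gce_sd_configs takes a single zone per entry, so emit one per zone.
        "gce_sd_configs": [
            {
                "project": project,
                "zone": zone,
                "filter": RUNNER_FILTER,
                "port": RUNNER_METRICS_PORT,
                "refresh_interval": "60s",
            }
            for zone in zones
        ],
        "relabel_configs": [
            # Replace the default ip:port instance label with the VM name, so
            # blue and green nodes are easy to tell apart.
            {"source_labels": ["__meta_gce_instance_name"], "target_label": "instance"},
            # Keep the source project visible, since discovery may span projects.
            {"source_labels": ["__meta_gce_project"], "target_label": "gcp_project"},
        ],
    }


if __name__ == "__main__":
    job = runner_manager_scrape_job("gitlab-ci", RUNNER_ZONES)
    # JSON is valid YAML, so this output can be merged into prometheus.yml.
    print(json.dumps({"scrape_configs": [job]}, indent=2))
```

Relabeling `instance` to the VM name is a design choice, not a requirement; the default `ip:port` label also works but is harder to map to blue/green nodes.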
Current TODO checklist:
- Change the CIDRs for the `bastion-ci` and `runner-managers` subnetworks in the `ci` network in the `gitlab-ci` project, so that they don't overlap with our `ephemeral-runners` CIDRs or with any CIDR in `gitlab-production/gprd` and its peers. (Currently these two networks overlap with the `peering-gitlab-analysis-gitlab-analysis-vpc` peer in `gitlab-production/gprd`, which would create a problem there.)
- Update documentation:
  - Update CIDR details for the changed networks in gitlab-com/runbooks!3761 (merged)
  - Merge the documentation update
- When there are no conflicts, add peering between the `gitlab-production/gprd` and `gitlab-ci/ci` networks: https://ops.gitlab.net/gitlab-com/gitlab-com-infrastructure/-/merge_requests/2806
- Merge the Prometheus configuration updates from https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/377
- Open firewall access from the Prometheus server network to the new runner managers: https://ops.gitlab.net/gitlab-com/gitlab-com-infrastructure/-/merge_requests/2812
- Confirm that we can see metrics for the new blue-green deployed nodes at thanos.gitlab.net
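Both the CIDR change and the peering step depend on the new subnetwork ranges not overlapping with `ephemeral-runners`, `gprd`, or any `gprd` peer, since VPC peering requires non-overlapping subnet ranges. A quick pre-check is possible with Python's standard `ipaddress` module; the ranges below are made-up placeholders, not our real allocations.

```python
import ipaddress
from itertools import product


def find_overlaps(proposed: dict[str, str], existing: dict[str, str]) -> list[tuple[str, str]]:
    """Return (proposed, existing) name pairs whose CIDR ranges overlap."""
    clashes = []
    for (p_name, p_cidr), (e_name, e_cidr) in product(proposed.items(), existing.items()):
        # ip_network() raises ValueError for malformed CIDRs or set host bits.
        if ipaddress.ip_network(p_cidr).overlaps(ipaddress.ip_network(e_cidr)):
            clashes.append((p_name, e_name))
    return clashes


# Made-up ranges for illustration; they are NOT our real allocations.
proposed = {"bastion-ci": "10.142.0.0/24", "runner-managers": "10.142.1.0/24"}
existing = {
    "ephemeral-runners": "10.10.0.0/16",
    "gprd": "10.220.0.0/14",
    "peering-gitlab-analysis-gitlab-analysis-vpc": "10.142.0.0/20",
}
print(find_overlaps(proposed, existing))
# → [('bastion-ci', 'peering-gitlab-analysis-gitlab-analysis-vpc'), ('runner-managers', 'peering-gitlab-analysis-gitlab-analysis-vpc')]
```

Run against the real ranges, an empty list is the "no conflicts" condition the peering step waits for.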
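For the last item, a sketch of checking the `up` series of the discovered nodes through the Prometheus-compatible HTTP API that Thanos Query exposes. It assumes that thanos.gitlab.net serves `/api/v1/query` directly and that no authentication is needed from where the script runs (neither is stated in this issue); the `runner-managers` job name is hypothetical and must match the scrape job actually deployed.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Thanos Query answers the Prometheus-compatible /api/v1/query
# endpoint at this base URL; authentication is not handled here.
THANOS_URL = "https://thanos.gitlab.net"


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def fetch(base: str, promql: str) -> dict:
    """Run an instant query and return the decoded JSON response."""
    with urllib.request.urlopen(query_url(base, promql), timeout=30) as resp:
        return json.load(resp)


def healthy_instances(response: dict) -> set[str]:
    """Return the `instance` labels of series whose sample value is "1".

    Applied to an `up{...}` query, these are the targets whose last scrape
    succeeded.
    """
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    return {
        series["metric"].get("instance", "")
        for series in response["data"]["result"]
        if series["value"][1] == "1"
    }
```

Calling `healthy_instances(fetch(THANOS_URL, 'up{job="runner-managers"}'))` should list the blue and green nodes once they are scraped. A node missing from the set was either not discovered at all or was discovered but could not be scraped (`up` is 0, for example because the firewall still blocks it).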