Add a scrapeConfig for stackdriver metrics using GCE discovery

Split from #2680 (closed)

This is part of our goal to scrape, from the Prometheus remote agents, the same Prometheus jobs that the VMs scrape. This issue focuses on the stackdriver metrics.

Original configuration in Chef for the job (see the Chef role https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/blob/master/roles/gprd-infra-prometheus-server.json):

    "stackdriver": {
      "static_configs": [
        {
          "labels": {
            "environment": "gprd",
            "stage": "main",
            "shard": "default",
            "tier": "inf",
            "type": "monitoring"
          },
          "targets": [
            "cloudsql.googleapis.com/database/cpu",
            "cloudsql.googleapis.com/database/disk",
            "cloudsql.googleapis.com/database/instance_state",
            "cloudsql.googleapis.com/database/memory",
            "cloudsql.googleapis.com/database/postgresql/transaction_count",
            "cloudsql.googleapis.com/database/state",
            "cloudsql.googleapis.com/database/up",
            "compute.googleapis.com/firewall",
            "compute.googleapis.com/instance/disk/throttled_read_",
            "compute.googleapis.com/instance/disk/throttled_write_",
            "compute.googleapis.com/instance/integrity",
            "compute.googleapis.com/instance/uptime",
            "compute.googleapis.com/mirroring",
            "compute.googleapis.com/nat",
            "container.googleapis.com",
            "file.googleapis.com",
            "loadbalancing.googleapis.com/https/backend_latencies",
            "loadbalancing.googleapis.com/https/backend_request_",
            "loadbalancing.googleapis.com/https/backend_response_",
            "loadbalancing.googleapis.com/https/request_",
            "loadbalancing.googleapis.com/https/response_",
            "loadbalancing.googleapis.com/l3/external/egress_",
            "loadbalancing.googleapis.com/l3/external/ingress_",
            "loadbalancing.googleapis.com/l3/internal/egress_",
            "loadbalancing.googleapis.com/l3/internal/ingress_",
            "logging.googleapis.com",
            "monitoring.googleapis.com",
            "networking.googleapis.com/vm_flow/egress_bytes_count",
            "networking.googleapis.com/vm_flow/ingress_bytes_count",
            "pubsub.googleapis.com/topic/byte_cost",
            "pubsub.googleapis.com/topic/send_message_operation_count",
            "pubsub.googleapis.com/subscription",
            "router.googleapis.com/nat",
            "storage.googleapis.com",
            "vpn.googleapis.com"
          ]
        }
      ],
      "scrape_interval": "60s",
      "scrape_timeout": "45s",
      "relabel_configs": [
        {
          "source_labels": [
            "__address__"
          ],
          "target_label": "__param_collect"
        },
        {
          "source_labels": [
            "__address__"
          ],
          "target_label": "metric_prefix"
        },
        {
          "replacement": "sd-exporter-01-inf-gprd.c.gitlab-production.internal",
          "target_label": "fqdn"
        },
        {
          "replacement": "sd-exporter-01-inf-gprd.c.gitlab-production.internal:9255",
          "target_label": "instance"
        },
        {
          "target_label": "__address__",
          "replacement": "sd-exporter-01-inf-gprd.c.gitlab-production.internal:9255"
        }
      ]
    },

This issue is about adding the relevant ScrapeConfig, using GCE discovery, to scrape the stackdriver exporter metrics across all VMs that aren't Kubernetes. We might need to ensure that there is a GCE label that properly covers these VMs.
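As a starting point for discussion, here is a rough sketch of what the job could look like with GCE discovery, written in plain Prometheus `scrape_configs` syntax rather than the chart's own values format. Several things in it are assumptions and not taken from the Chef role: the `zone` value, the GCE label name `stackdriver_exporter`, and the project name (inferred from the exporter FQDN). The metric prefix list is shortened here; the full list is in the Chef role.

```yaml
scrape_configs:
  - job_name: stackdriver
    scrape_interval: 60s
    scrape_timeout: 45s
    # The Chef job sets __param_collect per target; with discovered targets
    # the prefixes are passed as repeated `collect` URL parameters instead.
    params:
      collect:
        - cloudsql.googleapis.com/database/cpu
        - compute.googleapis.com/nat
        - storage.googleapis.com
        # remaining prefixes from the Chef role go here
    gce_sd_configs:
      - project: gitlab-production   # assumption: inferred from the exporter FQDN
        zone: us-east1-c             # assumption: placeholder, one entry per zone is needed
        port: 9255                   # exporter port from the Chef role
    relabel_configs:
      # Keep only VMs carrying the (hypothetical) GCE label stackdriver_exporter=true.
      - source_labels: [__meta_gce_label_stackdriver_exporter]
        regex: "true"
        action: keep
      # Rebuild the fqdn label from the discovered instance name.
      - source_labels: [__meta_gce_instance_name]
        target_label: fqdn
        replacement: "${1}.c.gitlab-production.internal"
      # Static labels carried over from the Chef job.
      - target_label: environment
        replacement: gprd
      - target_label: stage
        replacement: main
      - target_label: shard
        replacement: default
      - target_label: tier
        replacement: inf
      - target_label: type
        replacement: monitoring
```

Note that this collapses the per-prefix scrapes of the Chef job into a single scrape per exporter, so the `metric_prefix` label is lost. If that label is still needed, one option would be a separate job per prefix, which is more verbose but closer to the current behavior.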

~~At the moment, there is not a clear place to write this scrape configuration, I will follow up with a ticket that will allow us to give home to this and other similar configurations.~~

You can add the relevant configuration to the Helm chart: https://gitlab.com/gitlab-com/gl-infra/charts/-/tree/main/gitlab/prometheus-agent

Edited Jan 09, 2024 by Raúl Naveiras