[Cloud Run] OpenTelemetry Collector config changes are not being pushed
Problem
When observability.scrape_targets is set in runway.yml, a sidecar running the OpenTelemetry Collector is enabled. It scrapes metrics from the scrape targets and pushes them to Mimir. The collector's configuration file is generated by a Terraform resource like this:
resource "google_secret_manager_secret_version" "otel_config" {
  for_each = local.regions
  secret   = google_secret_manager_secret.otel_config[each.key].id
  secret_data_wo = templatefile("templates/otel-config.yaml.tftpl", {
    otel_collector_port   = local.otel_collector_port,
    scrape_targets        = local.scrape_targets,
    service_name          = var.runway_service_id,
    environment           = local.env_labels[var.environment].monitoring,
    region                = each.key,
    mimir_endpoint        = var.mimir_endpoint,
    mimir_tenant_username = var.mimir_tenant_id,
    mimir_tenant_password = module.mimir_credentials.secret_data["mimir-credentials"].password,
  })
  secret_data_wo_version = tonumber(module.mimir_credentials.secret_versions["mimir-credentials"])
  deletion_policy        = "ABANDON"
}
The bug is that this config file is only ever updated when the Mimir credentials change. secret_data_wo is a write-only attribute, so Terraform does not store it in state and cannot detect changes to the rendered config. A new secret version is written only when secret_data_wo_version changes, and that value is derived solely from the Mimir credentials version.
For example, if a workload is created from a runway.yml that does not define observability.scrape_targets, runwayctl still creates an otel config file, but one with no scrape targets. If runway.yml is later updated to add scrape targets, the config file is not updated until the Mimir credentials change.
This bug was introduced when we switched to using ephemeral secrets everywhere. Prior to this, any change to the config would trigger a new version of the secret.
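One possible direction, sketched here under assumptions rather than as the agreed fix, is to derive secret_data_wo_version from a fingerprint of every non-sensitive input to the template, so that any config change also changes the version. The local names otel_config_inputs and otel_config_version are hypothetical. The sketch also assumes that the provider accepts any changed integer (not only an increasing one) as an update trigger, and that the credentials version is not an ephemeral value.

```hcl
locals {
  # Hypothetical: every non-sensitive input that shapes the rendered config.
  # The password is deliberately left out; the credentials version stands in for it.
  otel_config_inputs = {
    template_hash       = filesha256("templates/otel-config.yaml.tftpl")
    otel_collector_port = local.otel_collector_port
    scrape_targets      = local.scrape_targets
    service_name        = var.runway_service_id
    environment         = local.env_labels[var.environment].monitoring
    mimir_endpoint      = var.mimir_endpoint
    mimir_tenant_id     = var.mimir_tenant_id
    credentials_version = module.mimir_credentials.secret_versions["mimir-credentials"]
  }

  # Hypothetical: the first 7 hex digits (28 bits) of the hash, parsed as an integer.
  otel_config_version = parseint(
    substr(sha256(jsonencode(local.otel_config_inputs)), 0, 7),
    16
  )
}

resource "google_secret_manager_secret_version" "otel_config" {
  # for_each, secret, secret_data_wo and deletion_policy stay as they are today.
  secret_data_wo_version = local.otel_config_version
}
```

Truncating the hash to 28 bits keeps the value a small integer at the cost of a small collision risk, which is a trade-off to weigh before adopting this. The region is not hashed because it is fixed for each for_each key.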
Acceptance Criteria
- Any change to the otel config (scrape targets, port, etc.) triggers an update to the config file
- Updating scrape targets (for example) triggers an update to the config file and is reflected in the next deploy