fix(prdsub): global env labels
Background
Currently we have the following prometheus configuration:
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    cluster: prdsub-customers-gke
    env: prdsub
    environment: stgsub
    monitor: default
    prometheus: monitoring/gitlab-monitoring-promethe-prometheus
    prometheus_replica: prometheus-gitlab-monitoring-promethe-prometheus-0
    provider: gcp
    region: us-east1
Notice how env="prdsub" and environment="prdsub". This doesn't follow our label taxonomy in https://gitlab.com/gitlab-com/runbooks/-/tree/master/libsonnet/label-taxonomy where it assumes it's going to be gstg or gprd.
This results in thanos-store and thanos-sidecar metrics being exposed with the wrong labels, as we see in gitlab-com/gl-infra&696 (comment 875284133).
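To make the mismatch concrete, here is a minimal sketch that checks a set of external_labels against the two values the taxonomy expects. The helper name find_env_label_violations and the choice to check both env and environment are assumptions for illustration; this is not code from the runbooks repository.

```python
# Allowed values per the text above: the taxonomy assumes gstg or gprd.
ALLOWED_ENVS = {"gstg", "gprd"}


def find_env_label_violations(external_labels, keys=("env", "environment")):
    """Return {label: value} for env-style labels outside ALLOWED_ENVS.

    Hypothetical helper: `keys` lists the labels we assume carry the
    environment name.
    """
    return {
        key: external_labels[key]
        for key in keys
        if key in external_labels and external_labels[key] not in ALLOWED_ENVS
    }


# Subset of the external_labels from the configuration shown above.
external_labels = {
    "cluster": "prdsub-customers-gke",
    "env": "prdsub",
    "environment": "stgsub",
    "provider": "gcp",
    "region": "us-east1",
}

violations = find_env_label_violations(external_labels)
print(violations)
```

With the current configuration both labels are reported; once env is set to gprd, the env entry no longer shows up.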
Solution
Specify the monitoring_env explicitly for prdsub so that we set this to gprd which checks checked in values
This was tested in stgsub first in !629 (merged), which worked as expected.
ref: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15442
Rollout
We can't simply merge this since we have metamonitoring around the labels as we've learned in gitlab-com/gl-infra/production#6611 (closed)
- Pause the deadmansnitch with the name Prometheus - GKE prdsub-customers-gke so we don't page EOC.
- Merge this.
- Update the ALERTMANAGER_SECRETS_FILE variable in https://ops.gitlab.net/gitlab-com/runbooks/-/settings/ci_cd from
    { name: 'prdsub', apiKey: 'xxxx', cluster: 'prdsub-customers-gke'},
  to
    { name: 'gprd', apiKey: 'xxxx', cluster: 'prdsub-customers-gke'},
- Run pipeline on master branch in https://ops.gitlab.net/gitlab-com/runbooks/-/pipelines/new so we update alert manager configuration.
- Confirm alertmanager configuration https://alerts.gitlab.net/#/status is updated like below:
    - receiver: dead_mans_snitch_prdsub_prdsub-customers-gke
      matchers:
      - alertname="SnitchHeartBeat"
      - cluster="prdsub-customers-gke"
      - env="gprd"
      continue: false
      group_wait: 1m
      group_interval: 5m
      repeat_interval: 5m
- Confirm that a heartbeat was sent to deadmansnitch.
- Unpause the deadmansnitch.
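The ordering above matters because, as we read it, the heartbeat only reaches deadmansnitch if its labels satisfy every matcher on the route. A rough sketch of that check follows. It assumes equality-only matchers and uses hypothetical helper names (parse_equality_matchers, route_matches); it is not Alertmanager's actual matching code.

```python
import re

# Only the equality form name="value" is handled in this sketch; real
# Alertmanager matchers also support !=, =~ and !~.
_EQ_MATCHER = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"(.*)"\s*$')


def parse_equality_matchers(matchers):
    """Turn ['env="gprd"', ...] into {'env': 'gprd', ...}."""
    parsed = {}
    for raw in matchers:
        found = _EQ_MATCHER.match(raw)
        if found is None:
            raise ValueError(f"unsupported matcher: {raw!r}")
        parsed[found.group(1)] = found.group(2)
    return parsed


def route_matches(matchers, alert_labels):
    """True if every matcher is satisfied by the alert's labels."""
    wanted = parse_equality_matchers(matchers)
    return all(alert_labels.get(name) == value for name, value in wanted.items())


# Matchers copied from the expected route in the rollout steps.
matchers = [
    'alertname="SnitchHeartBeat"',
    'cluster="prdsub-customers-gke"',
    'env="gprd"',
]

heartbeat_after_merge = {
    "alertname": "SnitchHeartBeat",
    "cluster": "prdsub-customers-gke",
    "env": "gprd",
}
heartbeat_before_merge = dict(heartbeat_after_merge, env="prdsub")

print(route_matches(matchers, heartbeat_after_merge))
print(route_matches(matchers, heartbeat_before_merge))
```

A heartbeat still labelled env="prdsub" would not match the updated route, which is why the snitch is paused until both the Prometheus labels and the alertmanager configuration have been rolled out.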
Signed-off-by: Steve Azzopardi <sazzopardi@gitlab.com>