
Fix query to get release status metric values

Jenny Kim requested to merge jennykim/metric-updater-query into master

What does this MR do and why?

The Prometheus query looks further back than I expected, so it returns the maximum value of the metric over that whole range. For example:

  • Monthly release: We hit another bug earlier this month: we accidentally created an rc_tagged status release metric for the current release when we had actually tagged an RC of an earlier release. Although the metric was corrected, its history still records a maximum value of 3 (rc_tagged).
  • Patch release: We already had a patch release earlier this month for the same versions. Because the query's time range reaches back that far, it fetched the maximum value, 3.

As a result, we set the metrics to 3 instead of refreshing them with the expected latest value of 1 (example pipeline job output).
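To illustrate the difference, here is a minimal Python model of the two behaviours over a made-up sample history. The timestamps and values are hypothetical, and the model is a simplification that ignores Prometheus details such as staleness markers, scrape timing and exact range-boundary rules. Taking the maximum over a long range picks up the stale 3, while taking the most recent sample returns the current 1.

```python
from typing import List, Tuple

# A sample is (unix_timestamp_seconds, value). Hypothetical data, not real metrics.
Sample = Tuple[int, float]
DAY = 86_400


def samples_in_window(samples: List[Sample], now: int, window_s: int) -> List[Sample]:
    """Return samples whose timestamp falls in the simplified range (now - window_s, now]."""
    return [(t, v) for t, v in samples if now - window_s < t <= now]


def max_over_time(samples: List[Sample], now: int, window_s: int) -> float:
    """Simplified max_over_time: largest value seen in the window (raises if empty)."""
    return max(v for _, v in samples_in_window(samples, now, window_s))


def last_over_time(samples: List[Sample], now: int, window_s: int) -> float:
    """Simplified last_over_time: value of the most recent sample in the window."""
    window = samples_in_window(samples, now, window_s)
    return max(window, key=lambda s: s[0])[1]


# Hypothetical history: status 1, wrongly set to 3 (rc_tagged), corrected back to 1.
history = [
    (0, 1.0),
    (3 * DAY, 3.0),
    (3 * DAY + 600, 1.0),
    (10 * DAY, 1.0),
]
now = 10 * DAY

print(max_over_time(history, now, 30 * DAY))   # stale maximum from the long range
print(last_over_time(history, now, 30 * DAY))  # latest value
print(last_over_time(history, now, 3600))      # latest value within a 1h window
```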

Instead, we should fetch only the latest value by querying with last_over_time, for example:

last_over_time(delivery_release_monthly_status{version="<version>"}[1h])
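A minimal sketch of running this query through the Prometheus HTTP API (`/api/v1/query`) and reading the single value back. The base URL and the helper names `build_query`, `parse_instant_value` and `fetch_status` are hypothetical and not taken from the project's code; only the metric name, the `version` label and the 1h window come from this MR.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical Prometheus base URL; replace with the real endpoint.
PROMETHEUS_URL = "http://prometheus.example.com"


def build_query(version: str, window: str = "1h") -> str:
    """Build the last_over_time query for one release version (label value must be quoted)."""
    return f'last_over_time(delivery_release_monthly_status{{version="{version}"}}[{window}])'


def parse_instant_value(response: dict) -> Optional[float]:
    """Extract the first sample value from an /api/v1/query instant-vector response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    result = response["data"]["result"]
    if not result:
        return None  # no sample inside the window
    # Each element looks like {"metric": {...}, "value": [<unix_ts>, "<value as string>"]}
    return float(result[0]["value"][1])


def fetch_status(version: str) -> Optional[float]:
    """Run the instant query against Prometheus (performs a network call)."""
    params = urllib.parse.urlencode({"query": build_query(version)})
    with urllib.request.urlopen(f"{PROMETHEUS_URL}/api/v1/query?{params}") as resp:
        return parse_instant_value(json.load(resp))
```

Returning `None` for an empty result lets the caller tell "no recent sample" apart from a real status value instead of silently treating it as 0.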

The scheduled job that refreshes the metrics runs every 10 minutes, so a 1h window should be enough to capture the last emitted value, even if the Prometheus/Grafana pods restart and flush the metrics.
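The window sizing can be checked with simple arithmetic. This sketch assumes each successful run produces a sample that Prometheus ingests promptly and ignores scrape delay; the helper names are hypothetical. With a 10-minute schedule, any 1h window contains at least 6 refreshes, so up to 5 consecutive missed runs still leave one sample for last_over_time to return.

```python
def refreshes_in_window(window_minutes: int, interval_minutes: int) -> int:
    """Minimum number of scheduled refreshes that land inside any window of this length."""
    return window_minutes // interval_minutes


def tolerated_missed_runs(window_minutes: int, interval_minutes: int) -> int:
    """Consecutive refreshes that can be missed while the window still holds one sample."""
    return refreshes_in_window(window_minutes, interval_minutes) - 1


print(refreshes_in_window(60, 10))
print(tolerated_missed_runs(60, 10))
```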

The feature flag for updating metrics is now release_status_metric_update, which keeps it separate from the release_status_metric feature flag that creates the metrics.
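A minimal sketch of what keeping the two flags separate means in practice: creating metrics and refreshing them are gated independently, so the refresh can be switched off without affecting creation. The in-memory `FLAGS` dict and the helper functions are hypothetical stand-ins for the project's real feature-flag lookup; only the two flag names come from this MR.

```python
from typing import Callable, Dict

# Hypothetical in-memory flag store standing in for the real feature-flag service.
FLAGS: Dict[str, bool] = {
    "release_status_metric": True,          # gates creating metrics
    "release_status_metric_update": False,  # gates the scheduled refresh
}


def enabled(flag: str) -> bool:
    """Unknown flags are treated as disabled."""
    return FLAGS.get(flag, False)


def create_metric(create: Callable[[], None]) -> bool:
    """Create the release status metric only when release_status_metric is on."""
    if not enabled("release_status_metric"):
        return False
    create()
    return True


def refresh_metrics(refresh: Callable[[], None]) -> bool:
    """Refresh metrics only when release_status_metric_update is on."""
    if not enabled("release_status_metric_update"):
        return False
    refresh()
    return True
```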

Addresses: gitlab-com/gl-infra/delivery#20181 (closed)

Edited by Jenny Kim
