Fix missing labels in pg_stat_activity Prometheus metrics

The metric pg_stat_activity_marginalia_sampler_active_count is missing its endpoint_id label for the large majority of sampled SQL statements. This started on 2025-10-16.

Most of the discovery happened in this Slack thread. Summarizing:

  • @marcogreg discovered this during incident response.
  • We traced the problem back to a change in the format of the correlation_id value, which now includes a dash and a suffix (e.g. 992a720860faba41-ATL).
  • The dash character broke the regexp parsing of that field. Because marginalia parsing is sensitive to field ordering, every field that follows correlation_id (endpoint_id, database) now also parses as NULL.
  • To fix this, we simply need to allow dashes in the regexp.
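
To illustrate the failure mode summarized above, here is a minimal Python sketch. It is not the exporter's actual query or regexp: the comment layout, the `marginalia_pattern` and `parse` helpers, and the character classes are all assumptions made for illustration. Only the field names and the sample correlation_id come from this issue. The idea is that an order-sensitive pattern stops matching at the first unexpected character, so the optional fields after it are never captured.

```python
import re

# Hypothetical marginalia comment. Field names come from this issue;
# the exact layout and the endpoint value are assumed for illustration.
SAMPLE = (
    "SELECT 1 /*correlation_id:992a720860faba41-ATL,"
    "endpoint_id:GET /api/v4/projects,database:main*/"
)


def marginalia_pattern(id_chars: str) -> re.Pattern:
    """Build an order-sensitive pattern; id_chars is the class allowed in correlation_id."""
    return re.compile(
        r"/\*correlation_id:(?P<correlation_id>" + id_chars + r"+)"
        r"(?:,endpoint_id:(?P<endpoint_id>[^,*]+))?"
        r"(?:,database:(?P<database>[^,*]+))?"
    )


def parse(pattern: re.Pattern, query: str) -> dict:
    """Return the captured marginalia fields, or an empty dict if nothing matched."""
    match = pattern.search(query)
    return match.groupdict() if match else {}


# Assumed "before" behaviour: word characters only, so the match stops at the dash.
OLD = marginalia_pattern(r"\w")
# Assumed "after" behaviour: dashes are allowed inside correlation_id.
NEW = marginalia_pattern(r"[\w-]")

print(parse(OLD, SAMPLE))
print(parse(NEW, SAMPLE))
```

In this sketch the old pattern captures only the part of correlation_id before the dash and leaves endpoint_id and database unset, while the pattern that allows dashes recovers all three fields. The real regexps may differ in detail, so each parser should be checked against a suffixed correlation_id such as 992a720860faba41-ATL.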

Chef cookbook MR: gitlab-cookbooks/gitlab-exporters!369 (merged)

There may be other places where we parse marginalia.

In particular, @marcogreg noted that the new sidekiq circuit breaker feedback mechanism relies on the same regexp and also needs a similar fix: https://gitlab.com/gitlab-org/gitlab/blob/9e0db2faaf256bf77f70de4d94f0a22081c56c50/lib/gitlab/database/stat_activity_sampler.rb#L27

I'll also check the Kibana logs where we parse samples from pg_stat_activity, since that pipeline also performs marginalia field extraction. (It may use a different regexp though.)