Fix missing labels in pg_stat_activity prometheus metrics
Metric pg_stat_activity_marginalia_sampler_active_count is missing its endpoint_id label for a large majority of its sampled SQL statements. This started on 2025-10-16.
Discovery is mostly in this Slack thread. Summarizing:
- @marcogreg discovered this during incident response.
- We traced the problem back to a change in the format of the correlation_id value, which now includes a dash and a suffix (e.g. 992a720860faba41-ATL).
- The dash character is what broke regexp parsing of that field. And because the marginalia parsing is sensitive to field ordering, all subsequent marginalia fields also now have NULL values: endpoint_id, database.
- To fix this, we simply need to allow dashes in the regexp.
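
To illustrate the failure mode, here is a minimal Python sketch. It is not the exporter's actual query or regexp: the sample comment, the field names and their order (including `db_config_database`), and both patterns are assumptions made up for the demonstration. It only shows how an order-sensitive pattern that rejects dashes in correlation_id stops matching at the dash and leaves the later fields empty.

```python
import re

# Hypothetical marginalia comment. Field names, ordering, and values are
# assumptions for illustration; only the correlation_id value comes from
# the example in this issue.
SAMPLE = (
    "/*application:web,correlation_id:992a720860faba41-ATL,"
    "endpoint_id:GET /api/v4/projects,db_config_database:example_db*/ "
    "SELECT 1"
)


def build_pattern(correlation_id_chars: str) -> re.Pattern:
    """Build an order-sensitive pattern with optional trailing fields.

    correlation_id_chars is the body of the character class used for the
    correlation_id value, e.g. r"\\w" (no dashes) or r"\\w-" (dashes allowed).
    """
    return re.compile(
        r"application:(?P<application>\w+)"
        r"(?:,correlation_id:(?P<correlation_id>[" + correlation_id_chars + r"]+))?"
        r"(?:,endpoint_id:(?P<endpoint_id>[^,*]+))?"
        r"(?:,db_config_database:(?P<database>\w+))?"
    )


# Hypothetical "before" pattern: word characters only, so it stops at the dash.
OLD_PATTERN = build_pattern(r"\w")
# Hypothetical "after" pattern: dashes are allowed in correlation_id.
NEW_PATTERN = build_pattern(r"\w-")


def parse(query: str, pattern: re.Pattern) -> dict:
    """Return the named marginalia fields, or an empty dict if nothing matches."""
    match = pattern.search(query)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    print("old:", parse(SAMPLE, OLD_PATTERN))
    print("new:", parse(SAMPLE, NEW_PATTERN))
```

In this sketch the old pattern still matches, but it captures only `992a720860faba41` for correlation_id and returns `None` for endpoint_id and database, because the next field is expected immediately after the value and the parser finds `-ATL` instead. With dashes allowed, all four fields are captured. Whether the real regexp truncates or nulls correlation_id itself depends on how it is written.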
Chef cookbook MR: gitlab-cookbooks/gitlab-exporters!369 (merged)
There may be other places where we parse marginalia.
In particular, @marcogreg noted that the new sidekiq circuit breaker feedback mechanism relies on the same regexp and also needs a similar fix: https://gitlab.com/gitlab-org/gitlab/blob/9e0db2faaf256bf77f70de4d94f0a22081c56c50/lib/gitlab/database/stat_activity_sampler.rb#L27
I'll also check the Kibana logs where we parse samples from pg_stat_activity, since it also performs marginalia field extraction. (It may use a different regexp though.)