
Fix Memory::Watchdog Prometheus gauge labels

Matthias Käppler requested to merge 365950-fix-duplicate-pid-label into master

What does this MR do and why?

Due to a bug in prometheus-client-mmap, we must not manually set the pid label for an aggregate: :all gauge; otherwise, the label appears twice in the text output we serve to Prometheus.

The Prometheus client library inserts the pid label for a gauge with all aggregation so that samples are preserved for every process we sample from, but it does not check whether that label is already in the label list. If the application has already set pid, the label therefore appears twice.

Since we do not yet have a bug fix in place for the library, we can easily work around this in the application for now by not setting the pid label manually. Note that we still need to set it for the counters, since the library does not auto-insert it there.
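To make the failure mode concrete, here is a minimal Ruby sketch of a formatter that appends pid without checking the existing labels. The `render_sample` helper is hypothetical and only mimics the behaviour described above; it is not the actual prometheus-client-mmap code.

```ruby
# Hypothetical stand-in for the library's text formatter.
# It mimics the behaviour described above and is NOT the real library code.
def render_sample(name, labels, value, pid:)
  pairs = labels.map { |key, val| [key.to_s, val.to_s] }
  # The pid label is appended unconditionally, with no check for an existing one.
  pairs << ['pid', pid]
  rendered = pairs.map { |key, val| %(#{key}="#{val}") }.join(',')
  "#{name}{#{rendered}} #{value}"
end

# Application sets pid manually: the label ends up in the output twice.
puts render_sample('gitlab_memwd_heap_frag_limit', { pid: 'puma_0' }, 0.1, pid: 'puma_0')

# Application leaves pid to the library: the label appears once.
puts render_sample('gitlab_memwd_heap_frag_limit', {}, 0.1, pid: 'puma_0')
```

Under this assumption, dropping the manual pid label in the application is enough to get a single label, which is the approach this MR takes for the gauge.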

See also:

NOTE: I did not include a changelog trailer because this feature was introduced in the same milestone and moreover is behind several feature toggles.

Screenshots or screen recordings

Before

Grepping the /-/metrics endpoint, we can see that the pid label appears twice for this gauge:

# TYPE gitlab_memwd_heap_frag_limit gauge
gitlab_memwd_heap_frag_limit{pid="puma_0",pid="puma_0"} 0.10000000000000001
gitlab_memwd_heap_frag_limit{pid="puma_1",pid="puma_1"} 0.10000000000000001

The Prometheus scraper fails when it tries to ingest this output:

Screenshot_from_2022-07-20_15-14-41
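A quick way to spot this kind of output without waiting for a scrape failure is to check each sample line for repeated label names. The following is a rough sketch only: `duplicate_labels` is a hypothetical helper, and its regex is deliberately simple, so it does not handle label values that themselves contain escaped quotes or `name="` sequences.

```ruby
# Hypothetical helper: returns the label names that occur more than once
# on a single line of Prometheus text output.
# Simplification: label values containing escaped quotes are not handled.
def duplicate_labels(line)
  label_block = line[/\{(.*)\}/, 1]
  return [] if label_block.nil? # comment lines and unlabelled samples

  names = label_block.scan(/([a-zA-Z_][a-zA-Z0-9_]*)="/).flatten
  names.tally.select { |_name, count| count > 1 }.keys
end

before = 'gitlab_memwd_heap_frag_limit{pid="puma_0",pid="puma_0"} 0.10000000000000001'
after  = 'gitlab_memwd_heap_frag_limit{pid="puma_0"} 0.10000000000000001'

p duplicate_labels(before)
p duplicate_labels(after)
```

The helper relies on `Enumerable#tally`, so it assumes Ruby 2.7 or newer.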

After

The pid label now appears only once for each metric:

# HELP gitlab_memwd_heap_frag_limit Multiprocess metric
# TYPE gitlab_memwd_heap_frag_limit gauge
gitlab_memwd_heap_frag_limit{pid="puma_0"} 0.10000000000000001
gitlab_memwd_heap_frag_limit{pid="puma_1"} 0.10000000000000001
# HELP gitlab_memwd_heap_frag_violations_handled_total Multiprocess metric
# TYPE gitlab_memwd_heap_frag_violations_handled_total counter
gitlab_memwd_heap_frag_violations_handled_total{pid="puma_0"} 3
gitlab_memwd_heap_frag_violations_handled_total{pid="puma_1"} 2
# HELP gitlab_memwd_heap_frag_violations_total Multiprocess metric
# TYPE gitlab_memwd_heap_frag_violations_total counter
gitlab_memwd_heap_frag_violations_total{pid="puma_0"} 7
gitlab_memwd_heap_frag_violations_total{pid="puma_1"} 6

The scraper is also happy:

Screenshot_from_2022-07-21_11-40-03

How to set up and validate locally

See above

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #365950 (closed)

Edited by Matthias Käppler
