Reduce the cardinality of GitLab metrics
The Prometheus instrumentation running in production has generated over 50k metrics on each node.
https://gitlab.com/snippets/1692077
We should reduce the number of metrics, preferably to under 15k on each node. This will, however, reduce the usefulness of the metrics, as we will have fewer data points from code running on our servers.
Before we actually reduce the metrics cardinality we should:
- set a limit on the acceptable number of metrics, ~~right now it's 10k~~ 15k + 5k for future growth. The number should both leave some room for growth and not put a strain on our resources.
- gauge how big of an impact having 50k metrics is. This will help us decide what the expected number of metrics should be.
- check the usefulness of existing labels. Regardless of the limit, we shouldn't keep useless data.
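As a rough aid for the checks above, here is a minimal Python sketch that counts series per metric name in a dump of a node's metrics output (Prometheus text exposition format) and lists the labels with the most distinct values. The function names, the sample data, and the reading of "15k + 5k" as a 15k target with 5k of headroom are assumptions made for illustration, not something already agreed in this issue.

```python
import re
from collections import Counter, defaultdict

# Assumed reading of "15k + 5k for future growth": a 15k target plus 5k headroom.
TARGET = 15_000
HEADROOM = 5_000

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+\S+')
# One label pair inside the braces, allowing escaped characters in the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Made-up sample data, only to show the expected input shape.
SAMPLE = """\
# HELP http_requests_total Total requests
# TYPE http_requests_total counter
http_requests_total{method="GET",controller="a"} 3
http_requests_total{method="GET",controller="b"} 5
http_requests_total{method="POST",controller="a"} 1
up 1
"""


def series_per_metric(exposition_text):
    """Count sample lines per metric name and collect distinct label values.

    Each sample line in one scrape is one series. Histogram suffixes such as
    _bucket, _sum and _count are counted under their own names.
    """
    counts = Counter()
    label_values = defaultdict(lambda: defaultdict(set))
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a sample line
        name, labels = match.group(1), match.group(2) or ""
        counts[name] += 1
        for key, value in LABEL_RE.findall(labels):
            label_values[name][key].add(value)
    return counts, label_values


def widest_labels(label_values, top=10):
    """Return (metric, label, distinct_values) rows, widest labels first."""
    rows = [
        (metric, label, len(values))
        for metric, by_label in label_values.items()
        for label, values in by_label.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]


def budget_status(counts):
    """Compare the total series count against the assumed target and headroom."""
    total = sum(counts.values())
    if total <= TARGET:
        return "within target"
    if total <= TARGET + HEADROOM:
        return "within headroom"
    return "over limit"
```

Running `series_per_metric` on a real dump and passing the results to `widest_labels` and `budget_status` gives a starting point for deciding which labels to review first. It says nothing about whether a label is actually useful, which still needs a manual check.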
Cardinality for each metric taken from live servers
Edited by Paweł Chojnacki