Reduce the cardinality of GitLab metrics

The Prometheus instrumentation running in production generates over 50k metrics on each node.

https://gitlab.com/snippets/1692077

We should reduce the number of metrics, preferably to under 15k on each node. This will, however, reduce the usefulness of the metrics, as we will have fewer data points from code running on our servers.

Before we actually reduce the metrics cardinality we should:

  • set a limit on the acceptable number of metrics: ~~right now it's 10k~~ 15k + 5k for future growth. The number should allow some room for growth without putting a strain on our resources.
  • gauge how big an impact having 50k metrics has. This will help us decide what the expected number of metrics should be.
  • check the usefulness of existing labels. Regardless of the limit, we shouldn't keep useless data.

Cardinality for each metric taken from live servers
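
A per-metric breakdown like this could be reproduced roughly as follows. This is only a minimal sketch, assuming the input is the plain-text Prometheus exposition output scraped from a node's metrics endpoint. The function name `series_per_metric` and the sample metric names are made up for illustration and are not taken from our code. Each sample line is counted as one series, so histogram `_bucket`, `_sum` and `_count` lines are counted under their own names.

```python
import re
from collections import Counter

# A sample line is: metric name, optional {labels}, whitespace, value.
# The greedy \{.*\} runs to the last "}" on the line, so label values
# that contain spaces or braces do not cut the match short.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+\S+')


def series_per_metric(exposition_text):
    """Count sample lines (one per label set) for each metric name."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith('#'):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical scrape output, for illustration only.
sample = """\
# HELP http_requests_total Total requests.
# TYPE http_requests_total counter
http_requests_total{method="get",status="200"} 1027
http_requests_total{method="post",status="200"} 3
http_requests_total{method="get",status="500"} 4
ruby_gc_duration_seconds 0.25
"""

# List metrics from highest to lowest series count.
for name, count in series_per_metric(sample).most_common():
    print(name, count)
```

Sorting by count puts the metrics with the most label combinations first, which is where trimming labels would likely have the biggest effect.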

/cc @bjk-gitlab @joshlambert

Edited by Paweł Chojnacki