Monitoring improvements focusing on Thanos (!4338) · Merge requests · GitLab.com / Runbooks

This adds a few improvements to the monitoring service:

Remove the public dashboards Thanos component: we don't have that anymore 😢.
Merge all memcached components into a single component, measured from the client side.
Remove the fqdn significant label from components that now run only on Kubernetes.
Tighten some apdex durations based on the past week of data.
Add significant labels to the rule evaluations so the detail panels can distinguish failures in Prometheus from failures in Thanos.
Add real-world target durations to the gRPC apdexes.
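Tightening an apdex duration means lowering the latency target that a request must meet to count as satisfactory. The sketch below illustrates that effect and is not taken from this MR, which edits the service definitions rather than Python code. It assumes a histogram-based apdex: the score is the rate of requests in the bucket at the target duration divided by the rate of all requests (the +Inf bucket). The `histogram_apdex` and `tightest_target` helpers and the sample rates are hypothetical. The "smallest bucket that still meets a goal" rule is only one possible heuristic for choosing a target from recent data.

```python
import math
from typing import Optional

# Hypothetical cumulative histogram: upper bound "le" in seconds -> request rate
# observed at or below that bound, in the style of Prometheus *_bucket{le="..."}
# series after rate(). math.inf stands in for le="+Inf" (all requests).
Buckets = dict[float, float]


def histogram_apdex(buckets: Buckets, target_seconds: float) -> float:
    """Fraction of requests that completed within target_seconds.

    The target must be one of the histogram's bucket boundaries, because a
    cumulative histogram cannot say how many requests fell between two bounds.
    """
    if target_seconds not in buckets:
        raise ValueError(f"no bucket with le={target_seconds}")
    total = buckets[math.inf]
    if total == 0:
        return math.nan  # no traffic: the apdex is undefined
    return buckets[target_seconds] / total


def tightest_target(buckets: Buckets, goal: float) -> Optional[float]:
    """Hypothetical heuristic: smallest finite bound whose apdex meets `goal`."""
    for le in sorted(b for b in buckets if math.isfinite(b)):
        if histogram_apdex(buckets, le) >= goal:
            return le
    return None  # no finite target reaches the goal


if __name__ == "__main__":
    # Made-up request rates for a single gRPC method over some window.
    observed = {0.05: 60.0, 0.1: 85.0, 0.25: 95.0, 1.0: 99.0, math.inf: 100.0}
    print(histogram_apdex(observed, 1.0))   # 0.99
    print(histogram_apdex(observed, 0.25))  # 0.95
    print(tightest_target(observed, 0.95))  # 0.25
```

With these made-up numbers, moving the target from 1s to 0.25s lowers the score from 0.99 to 0.95. A tighter target makes the apdex react to slowdowns that a loose target would hide.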

I looked into splitting the Thanos service out entirely for https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14335#note_732980227, but that would require reworking some "type" labels on the source metrics, and I'm not sure where all of those are defined.

This already cleans up the monitoring service a bit, which will make it easier to move the SLIs once we do split the monitoring service up.

Edited Feb 11, 2022 by Bob Van Landuyt
