Monitoring improvements focusing on Thanos
This adds a few improvements to the monitoring service:
- Remove the public dashboards Thanos component: we don't have that anymore 😢.
- Merge all memcached components into one, measured from the client side.
- Remove the fqdn significant label for components that now run only on Kubernetes.
- Tighten up some apdex durations based on the past week of data.
- Add some significant labels to the rule evaluations so we can distinguish failures in Prometheus and Thanos on the detail panels
- Add real-world target durations on gRPC apdexes.
I looked into separating out the Thanos service entirely for https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14335#note_732980227, but that would require reworking some "type" labels on source metrics and I'm not quite sure where all of these live.
This already cleans up the monitoring service a bit, so we can move the SLIs over when we do split it up.
Edited by Bob Van Landuyt