Define alerting rules in Thanos for all clusters

Currently there are no alerting rules in the management cluster for workload cluster problems.

Thanos aggregates metrics from all clusters (management + workload) and is connected to the management cluster's Alertmanager.
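
Because Thanos already holds metrics from every cluster, one rule evaluated there can cover all of them. The fragment below is only a sketch of what such a rule might look like, written in the standard Prometheus rule-file format that Thanos Ruler accepts. The group name, alert name, threshold, and the `cluster` label are assumptions for illustration; it also assumes kube-state-metrics is scraped in each cluster.

```yaml
groups:
  - name: all-clusters-node-health    # hypothetical group name
    rules:
      - alert: NodeNotReady           # hypothetical alert name
        # kube_node_status_condition comes from kube-state-metrics (assumed to be deployed).
        # With no cluster matcher, the expression applies to every cluster Thanos sees.
        expr: kube_node_status_condition{condition="Ready", status="true"} == 0
        for: 10m                      # illustrative duration, not a recommendation
        labels:
          severity: warning
        annotations:
          # Assumes each cluster's metrics carry a "cluster" label.
          summary: "Node {{ $labels.node }} in cluster {{ $labels.cluster }} is not Ready"
```

Leaving out a cluster matcher is what makes the rule apply to management and workload clusters alike; the `cluster` label in the annotation is what would let the Alertmanager notification say where the problem is.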

We need to define alerting rules that apply to all clusters for different components:

This is a good candidate for backporting to 1.3 to allow proper monitoring and alerting there.

Edited Sep 15, 2025 by Alin H