Audit hand-crafted recording rules from the rules directory
To completely remove all recording rules from Prometheus and run prometheus-agent instead, we also need to move all of our "hand-crafted" (non-jsonnet) recording rules from rules to thanos-rules.
Many of these may no longer be valid, so we should go through them and move them to Thanos only where needed.
Methodology to identify rules:
- Deleted every .yml file from https://gitlab.com/gitlab-com/runbooks/-/tree/master/rules?ref_type=heads, including the handful of files in https://gitlab.com/gitlab-com/runbooks/-/tree/master/rules/default?ref_type=heads.
- Re-ran make generate. The files it did not recreate are the Prometheus rules files that are not autogenerated; they are listed below. The list is long.
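The comparison behind this methodology can be sketched in Python. This is a minimal illustration, not the procedure that was actually run: the function names `list_rule_files` and `hand_crafted` are hypothetical. It assumes you snapshot the file list once before wiping the directory and once after `make generate` has finished.

```python
from __future__ import annotations

from pathlib import Path


def list_rule_files(rules_dir: Path) -> set[str]:
    """Return the relative path of every .yml file under rules_dir (recursive)."""
    return {p.relative_to(rules_dir).as_posix() for p in rules_dir.rglob("*.yml")}


def hand_crafted(before: set[str], regenerated: set[str]) -> list[str]:
    """Files that existed before the wipe but were not recreated by the generator.

    Anything the generator did not bring back is assumed to be hand-crafted.
    """
    return sorted(before - regenerated)
```

Usage would be along the lines of calling `list_rule_files(Path("rules"))` before deleting, again after regenerating, and passing both sets to `hand_crafted`.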
The end goal for this issue is for all of these files to be in one of the following states:
- Deleted because unused.
- Alerts-only, moved to Mimir.
- Converted to jsonnet.
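As a rough aid for sorting files into these states, the sketch below checks whether a rules file contains recording rules, alerting rules, or both. It relies on the Prometheus rule-file convention that recording rules carry a `record:` key and alerting rules an `alert:` key. The `classify` helper is hypothetical and uses a line-based heuristic rather than a real YAML parser, so treat its answer as a hint to confirm by reading the file.

```python
import re

# Match `record:` / `alert:` keys at the start of a (possibly list-item) line.
# Commented-out lines start with `#` and are therefore not matched.
RECORD_KEY = re.compile(r"^\s*(?:-\s+)?record:\s*\S", re.MULTILINE)
ALERT_KEY = re.compile(r"^\s*(?:-\s+)?alert:\s*\S", re.MULTILINE)


def classify(rule_file_text: str) -> str:
    """Classify a rules file as 'alerts-only', 'recording-only', 'mixed', or 'empty'."""
    has_record = bool(RECORD_KEY.search(rule_file_text))
    has_alert = bool(ALERT_KEY.search(rule_file_text))
    if has_record and has_alert:
        return "mixed"
    if has_record:
        return "recording-only"
    if has_alert:
        return "alerts-only"
    return "empty"
```

A file reported as `alerts-only` is a candidate for the Mimir move. One reported as `recording-only` or `mixed` still needs a decision on whether its recording rules are deleted or converted to jsonnet.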
Alerts-only (handled in #2681 (closed)):
- airflow.yml
- alertmanager.yml
- chefs.yml
- cloud_nat.yml
- cloud_sql.yml
- cloudflare.yml
- commit-workers.yml
- container.yml
- decomposed-database.yml
- elastic-clusters.yml
- external-dns.yml
- gitlab-com-latencys.yml
- gitlab-com-repositories.yml
- gitlab-com-search.yml
- gitlab_job.yml
- kubernetes-horizontalpodautoscaler.yml
- kubernetes-resources.yml
- kubernetes-storage.yml
- kubernetes-system.yml
- kubernetes.yml
- logging.yml
- mailroom.yml
- omnibus.yml
- osquery.yml
- pages-gitlab-io-status.yml
- patroni.yml
- pgbouncer.yml
- praefect.yml
- prometheus-operator.yml
- pull-mirror-queues.yml
- registry-db.yml
- registry-gc-queues.yml
- remote-mirrors.yml
- sidekiq-queue-latency.yml
- sidekiq-queues.yml
Deleted:
- generic_process.yml gitlab-com/runbooks!6652 (merged)
- node_components.yml gitlab-com/runbooks!6652 (merged)
- service_component_apdex.yml
- service_saturation.yml
- vault.yml
Audit in progress:
- blackbox_alerts.yml #2806 (closed)
- consul.yml
- default/incident-project-test.yml gitlab-com/runbooks!7092 (merged)
- default/prometheus-metamons.yml Step 1: gitlab-com/runbooks!7098 (merged). Step 1 take two: gitlab-com/runbooks!7104 (merged). Further discussion in #3419 (closed)
- gitaly.yml gitlab-com/runbooks!7213 (merged)
- gitlab-com-ci.yml gitlab-com/runbooks!7218 (merged), blocked on https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/3401#note_1858134572
- gitlab-walg-backups.yml gitlab-com/runbooks!7106 (merged)
- google-cloud.yml gitlab-com/runbooks!7093 (merged)
- haproxy.yml gitlab-com/runbooks!7107 (merged), take two: gitlab-com/runbooks!7122 (merged)
- kubernetes-mixin.yml gitlab-com/runbooks!7134 (merged)
- mtail.yml gitlab-com/runbooks!7094 (merged)
- node.yml gitlab-com/runbooks!7120 (merged)
- patroni-snapshot.yml gitlab-com/runbooks!7108 (merged)
- pipeline_execution.yml gitlab-com/runbooks!7109 (merged)
- postgresql.yml gitlab-com/runbooks!7119 (merged)
- postgresqls.yml production-engineering#25280
- rails.yml gitlab-com/runbooks!7136 (merged)
- redis.yml gitlab-com/runbooks!7117 (merged)
- service_component_ops_rate.yml gitlab-com/runbooks!7114 (merged)
- sidekiq.yml gitlab-com/runbooks!7165 (merged)
- stackdriver.yml gitlab-com/runbooks!7116 (merged)
- user-auth-events.yml gitlab-com/runbooks!7095 (merged)
- workhorse.yml gitlab-com/runbooks!7115 (merged)
- wiz-runtime-sensor.yml gitlab-com/runbooks!7096 (merged)
- zoekt.yml gitlab-com/runbooks!7096 (merged)