Inhibit rule for web service
Problem
In this epic we propose using inhibit rules for certain services, so that when a service is down and upstream services depend on it, we don't also alert on the upstream services. For example, if the patroni service is firing, the web service shouldn't page the EOC.
Proposal
Inside the metrics catalog we should define the service dependencies, which will then automatically generate the inhibit rules. For example:
diff --git a/metrics-catalog/services/api.jsonnet b/metrics-catalog/services/api.jsonnet
index 61872989..2e23bfa6 100644
--- a/metrics-catalog/services/api.jsonnet
+++ b/metrics-catalog/services/api.jsonnet
@@ -126,6 +126,16 @@ metricsCatalog.serviceDefinition({
       userImpacting: true,
       featureCategory: 'not_owned',
       team: 'workhorse',
+      dependsOn: [
+        {
+          component: 'rails_requests',
+          type: 'api',
+        },
+        {
+          component: { oneOf: ['rails_primary_sql', 'rails_replica_sql'] },
+          type: 'patroni',
+        },
+      ],
       description: |||
         Aggregation of most web requests that pass through workhorse, monitored via the HTTP interface.
         Excludes health, readiness and liveness requests. Some known slow requests, such as HTTP uploads,
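To illustrate how a `dependsOn` entry could map onto an Alertmanager inhibit rule, here is a minimal Jsonnet sketch. It is not the implementation from the runbooks repository: the helper names `componentMatcher` and `inhibitRulesFor`, the target SLI name `workhorse`, and the choice of `env` as the `equal` label are all assumptions made for illustration. Only the `dependsOn` shape (`component`, `type`, `oneOf`) comes from the diff above, and `source_matchers`, `target_matchers`, and `equal` are standard Alertmanager inhibit rule fields.

```jsonnet
// Hypothetical helper: renders the `component` field of a dependency
// as an Alertmanager matcher string.
local componentMatcher(component) =
  if std.isString(component) then
    'component="%s"' % component
  else
    // { oneOf: [...] } becomes a regex alternation over the listed components.
    'component=~"%s"' % std.join('|', component.oneOf);

// Hypothetical helper: builds one inhibit rule per dependency.
// `service` and `sli` identify the dependent SLI whose alerts get muted.
local inhibitRulesFor(service, sli, dependsOn) = [
  {
    // Alerts firing on the dependency (source) ...
    source_matchers: ['type="%s"' % dep.type, componentMatcher(dep.component)],
    // ... suppress alerts on the dependent service (target).
    target_matchers: ['type="%s"' % service, 'component="%s"' % sli],
    // Assumption: only inhibit when both alerts share the same environment.
    equal: ['env'],
  }
  for dep in dependsOn
];

{
  inhibit_rules: inhibitRulesFor('web', 'workhorse', [
    { component: { oneOf: ['rails_primary_sql', 'rails_replica_sql'] }, type: 'patroni' },
  ]),
}
```

Under these assumptions, evaluating the sketch yields a single rule: alerts matching `type="patroni"` and `component=~"rails_primary_sql|rails_replica_sql"` inhibit alerts matching `type="web"` and `component="workhorse"` in the same `env`.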
Todo
- Create DSL for metric catalog 👉 feat: service alert dependencies (gitlab-com/runbooks!4710 - merged)
- Create inhibit rule from that DSL 👉 feat: service alert dependencies (gitlab-com/runbooks!4710 - merged)
- Document new DSL 👉 feat: service alert dependencies (gitlab-com/runbooks!4710 - merged)
- Add validation rule for DSL 👉 feat: service alert dependencies (gitlab-com/runbooks!4710 - merged)
- Have web depend on the patroni service. 👉 feat: have web inhibit alerts from patroni (gitlab-com/runbooks!4735 - merged)
Follow up
After this is done, we'll be able to do this for the following services:
Edited by Steve Xuereb