Set up Meta-Monitoring on the ES production Monitoring cluster
Summary
We recently discovered (production#2610, closed) that we were not sending ELK logs to the monitoring ELK stack. We should have meta-monitoring for the absence of monitoring data.
This issue is concerned with missing ES logs.
Related Incident(s)
Originating issue(s): production#2610 (closed)
Desired Outcome/Acceptance criteria
- Verify that we are not getting notified when logs from the ES cluster are missing.
- Confirm and document the Prometheus metric that will be used for alerting.
- Add alerting for the metric in the runbook.
- Verify that we are getting notified when logs from the ES cluster are missing.
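To make the alerting criterion concrete, here is a minimal sketch of the condition such an alert might evaluate. It is an illustration only: the function name `logs_missing`, the `for_samples` parameter, and the idea of sampling an indexing-rate value per scrape are assumptions, not the confirmed metric (which is still to be chosen as part of this issue). The sketch treats both an absent metric and a zero rate as "no logs", and only fires once that state has persisted for several consecutive samples, similar in spirit to a `for:` duration on a Prometheus alert.

```python
from typing import Optional, Sequence


def logs_missing(samples: Sequence[Optional[float]], for_samples: int = 3) -> bool:
    """Return True when the last `for_samples` samples all show no incoming logs.

    Hypothetical model: each sample is the log indexing rate seen at one scrape.
    None means the metric was absent for that scrape; 0.0 means the metric was
    present but no documents were indexed.
    """
    if for_samples <= 0:
        raise ValueError("for_samples must be positive")
    if len(samples) < for_samples:
        # Not enough history yet to decide; do not fire.
        return False
    recent = samples[-for_samples:]
    # Fire only if every recent sample is absent or non-positive.
    return all(s is None or s <= 0.0 for s in recent)


if __name__ == "__main__":
    # Logs stopped arriving for the last three scrapes: the alert would fire.
    print(logs_missing([5.0, 0.0, None, 0.0]))
    # Logs resumed on the latest scrape: the alert would not fire.
    print(logs_missing([0.0, 0.0, 4.0]))
```

Treating an absent series the same as a zero rate matters here, because the originating incident was a case where the data stopped arriving entirely rather than dropping to a low value.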
Associated Services
Corrective Action Issue Checklist
- link the incident(s) this corrective action arose out of
- give context for what problem this corrective action is trying to prevent from re-occurring
- assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
- assign a priority (this will default to 'priority::4')
Edited by Maina Ng'ang'a