Monitor the Monitor group's demo environments

Allison Browne requested to merge ab-monitor-demo-environments into master

What does this MR do?

Overview

The Monitor team would like to be alerted when Prometheus stops reporting metrics in our demo environments. For now this code will run on only 2-3 projects on .com and staging, but in the future it may run on more projects.

Part of: gitlab-org/monitor/general#58

Technical implementation

New columns in the cluster_applications_prometheus table named healthy a

A scheduled job runs every x minutes and performs a health check against Prometheus.

The job stores the result as a boolean in the healthy column: true if healthy, false if not.

It sends an alert to our generic alert endpoint only when healthy changes from true to false, not when the column already held false.

This MR adds only the service; the worker will come later.
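
The transition rule described above can be sketched in plain Ruby. This is a minimal illustration and not the code in this MR: `PrometheusApp`, `HealthCheckSketch`, `health_probe`, and `alerter` are hypothetical names, the struct stands in for a `cluster_applications_prometheus` row, and the two callables stand in for the Prometheus health check and the generic alert endpoint.

```ruby
# Hypothetical stand-in for a cluster_applications_prometheus row;
# only the healthy column matters for this sketch.
PrometheusApp = Struct.new(:healthy)

class HealthCheckSketch
  # health_probe: callable returning a truthy/falsy health result
  #               (assumed; a real one would query Prometheus).
  # alerter:      callable invoked with the app when it turns unhealthy
  #               (assumed stand-in for the generic alert endpoint).
  def initialize(app, health_probe:, alerter:)
    @app = app
    @health_probe = health_probe
    @alerter = alerter
  end

  def execute
    was_healthy = @app.healthy
    currently_healthy = @health_probe.call ? true : false

    # Persist the latest result (a real service would save the record).
    @app.healthy = currently_healthy

    # Alert only on the true -> false transition, so repeated
    # unhealthy checks do not send repeated alerts.
    @alerter.call(@app) if was_healthy && !currently_healthy

    currently_healthy
  end
end
```

In this sketch a row that has never been checked (healthy is nil) does not alert on its first failing check, which follows the "true changes to false" wording literally; whether that is the desired behaviour is an open choice.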

Screenshots

Screen_Shot_2020-03-13_at_10.15.53_AM

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Allison Browne
