[Devops] Fix Monitoring

Description

What are we doing?

The Problem

Why is this bad?

Background/Current State

Where are we now?

The Problem

Our current monitoring solution is very noisy: too many alerts fire that nobody needs to act on, which leads to alarm fatigue.

Solution

Change our monitoring mindset so that every alert requires human intervention, instead of quietly resolving itself or growing into a larger issue.
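
One possible starting point is to audit past alerts and flag the ones that usually resolved without anyone acting. The sketch below is illustrative only: the `AlertEvent` record, the `human_acted` field, and the 0.5 threshold are all assumptions, not part of our current tooling, and the sample history is made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AlertEvent:
    """One past firing of an alert (hypothetical record shape)."""
    name: str          # alert rule name
    human_acted: bool  # True if someone had to intervene to resolve it


def actionable_ratio(events):
    """Return {alert name: fraction of firings that needed a human}."""
    fired = defaultdict(int)
    acted = defaultdict(int)
    for event in events:
        fired[event.name] += 1
        if event.human_acted:
            acted[event.name] += 1
    return {name: acted[name] / fired[name] for name in fired}


def noisy_alerts(events, threshold=0.5):
    """List alerts whose actionable ratio is below the assumed threshold."""
    ratios = actionable_ratio(events)
    return sorted(name for name, ratio in ratios.items() if ratio < threshold)


# Made-up sample history, for illustration only.
history = [
    AlertEvent("disk_full", True),
    AlertEvent("disk_full", True),
    AlertEvent("cpu_spike", False),
    AlertEvent("cpu_spike", False),
    AlertEvent("cpu_spike", True),
]
print(noisy_alerts(history))
```

Alerts that land on this list are candidates for retuning, downgrading to a dashboard, or removal, so that whatever still pages someone actually needs a human.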

Additional Information

Steps

  • TBD
  • TBD
  • TBD
  • TBD