[Devops] Fix Monitoring
Description
What are we doing?
The Problem
Why is this bad?
Background/Current State
Where are we now?
The Problem
Our current monitoring solution is very noisy and leads to alarm fatigue.
Solution
Change our monitoring mindset so that every alert requires human intervention, rather than quietly resolving itself or growing into a larger issue.
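One possible way to find the alerts that fail this rule is to review alert history and flag any alert that usually cleared without anyone acting on it. The sketch below is only an illustration: the `AlertRecord` fields, the `noisy_alerts` helper, the 0.5 threshold, and the sample alert names are all assumptions, not part of our current tooling.

```python
from dataclasses import dataclass


@dataclass
class AlertRecord:
    # Hypothetical record shape; real data would come from the alerting tool's history.
    name: str
    fired: int          # times the alert fired in the review window
    human_actions: int  # times someone had to act to resolve it


def noisy_alerts(records, min_action_ratio=0.5):
    """Return names of alerts that mostly resolved without human action."""
    noisy = []
    for record in records:
        if record.fired == 0:
            continue  # never fired, nothing to judge
        if record.human_actions / record.fired < min_action_ratio:
            noisy.append(record.name)
    return noisy


# Made-up sample data for illustration only.
history = [
    AlertRecord("disk_usage_warning", fired=40, human_actions=2),
    AlertRecord("api_5xx_rate", fired=5, human_actions=5),
]
print(noisy_alerts(history))  # ['disk_usage_warning']
```

Alerts returned by a check like this are candidates for removal, retuning, or demotion to a dashboard, since they rarely needed a human.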
Additional Information
Steps
- TBD
- TBD
- TBD
- TBD