Increase visibility/awareness for HTTP400 errors
Problem Statement
During the recent incident production#5194 (closed), the application was returning HTTP 400 responses, but nothing indicated to the EOC that anything was wrong. We need to determine how to monitor for this situation and alert the EOC.
Points of Interest:
- Metrics during this time showed nothing unusual: https://dashboards.gitlab.net/d/api-main/api-overview?orgId=1&from=1626818835299&to=1626833235675&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main
- We relied on logs to help us determine what had gone wrong: https://log.gprd.gitlab.net/goto/d662a665b88572195f55fdd7c199961d
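
To make the gap above concrete, here is a minimal sketch of the kind of check that was missing: flag a time window when the share of HTTP 400 responses exceeds a threshold. It is not an agreed solution. The function names, the 5% threshold, and the 100-request minimum are all hypothetical values chosen for illustration, and a real alert would presumably be expressed against our existing metrics rather than in Python.

```python
from collections import Counter
from typing import Iterable


def bad_request_ratio(status_codes: Iterable[int]) -> float:
    """Return the fraction of responses in the window that were HTTP 400."""
    counts = Counter(status_codes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts[400] / total


def should_alert(
    status_codes: Iterable[int],
    threshold: float = 0.05,   # hypothetical: alert above 5% HTTP 400s
    min_requests: int = 100,   # hypothetical: ignore very quiet windows
) -> bool:
    """Decide whether a window of status codes warrants paging the EOC."""
    codes = list(status_codes)
    if len(codes) < min_requests:
        # Too little traffic for the ratio to be meaningful.
        return False
    return bad_request_ratio(codes) > threshold


# Example: 10 of 100 responses are 400s, which exceeds the 5% threshold.
window = [200] * 90 + [400] * 10
print(should_alert(window))
```

The minimum-request guard is there because a ratio computed over a handful of requests would be noisy; whether a ratio, an absolute rate, or a comparison against a baseline is the right signal is still an open question for this issue.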
Solution Statement
- ...