Improve Sentry error visibility and notifications
Background
A month or two ago it came up that Sentry had reported a bunch of 500s, but the group had no visibility mechanisms set up, so the errors were missed and Bad Things Happened.
We should decide on error rate thresholds and build a notification process, for example reporting errors that exceed the thresholds in Slack and/or automatically creating GitLab bug issues for them. This work should be set up and documented so that other sections/groups/teams can easily extend it.
It might also be useful to set up a weekly Triage bot email/issue for these errors, or to add an errors section to the existing weekly email.
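To make the threshold idea concrete, here is a minimal sketch of a check that flags endpoints whose 5xx rate exceeds a limit and builds a Slack-style message from them. Everything in it is an assumption for discussion: the `EndpointStats` shape, the function names, the 1% limit, and the 100-request minimum are placeholders, not agreed values, and it does not call Sentry, Slack, or GitLab. It only shows the decision logic that whichever integration we pick would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointStats:
    """Hypothetical per-endpoint counts for one time window."""
    endpoint: str
    requests: int
    errors_5xx: int


def error_rate(stats: EndpointStats) -> float:
    """Fraction of requests that returned a 5xx; 0.0 when there was no traffic."""
    if stats.requests == 0:
        return 0.0
    return stats.errors_5xx / stats.requests


def endpoints_over_threshold(window, max_rate=0.01, min_requests=100):
    """Return (stats, rate) pairs whose error rate exceeds max_rate.

    min_requests skips low-traffic endpoints, where a single failure
    would otherwise dominate the rate. Both defaults are placeholders.
    """
    flagged = []
    for stats in window:
        if stats.requests < min_requests:
            continue
        rate = error_rate(stats)
        if rate > max_rate:
            flagged.append((stats, rate))
    return flagged


def slack_payload(flagged, max_rate=0.01):
    """Build a {"text": ...} body of the kind Slack incoming webhooks accept."""
    lines = [f"5xx error rate above {max_rate:.1%}:"]
    for stats, rate in flagged:
        lines.append(
            f"- {stats.endpoint}: {rate:.1%} ({stats.errors_5xx}/{stats.requests})"
        )
    return {"text": "\n".join(lines)}
```

Keeping the threshold check separate from the message formatting means the same flagged list could feed a Slack ping, a GitLab bug issue, or the weekly triage email without changing the rule itself.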
TODOs
- Alex and Tanya to get access to Sentry
- Decide appropriate thresholds
- Decide appropriate notification systems (Slack pings, GitLab bug creation, weekly triage emails, etc.)
- Automate those notification systems
- Document everything for reuse by other groups
cc @asoborov, I created this as a start/placeholder