Staging-ref sentry alerts

Problem

Slack discussion:

@Lucie Zhao We're getting a lot of errors that (without investigation) sound infrastructure-related. It's possible that there are important learnings to be had from these errors, but we aren't guaranteed good ROI. We're already at risk of this stream of info becoming permanently ignored again. Maybe similar in intent to the healthy backlog initiative, we should choose to officially give up on these, or else inject them into refinement or a bug/support triage rotation.

Proposal

From Chloe:

This is what I get from looking at the errors, how often they occur, and how they occur. After asking the delivery team a couple of times to fix staging-ref, I got the feeling that the Geo installation there is unstable and can't really be relied on. From this we have some options:

  1. We restrict the alerts more strongly again by excluding the errors we know are flakes due to the infra (Redis, database connections, etc.)
  2. We change the alerting system entirely: instead of showing any new error, we get an alert when a new error reaches a certain threshold, for example 10 errors in an hour (low on purpose, but still enough to exclude the current noise)
  3. We keep the system as is, but when someone looks at an issue and sees it's obviously infra-related, we mark it with an emoji to show it's been looked at and discarded.

I'm fine with any of those tbh. What do you think? Do you have other ideas?
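
To make options 1 and 2 easier to compare, here is a minimal Python sketch of the two filtering rules. It is only an illustration of the logic, not how the Sentry or Slack integration is actually configured. The names `INFRA_PATTERNS`, `is_infra_flake`, and `ThresholdAlerter` are hypothetical, the substring patterns are placeholders for whatever triage identifies as infra noise, and the 10-per-hour default is simply the example figure from option 2.

```python
from collections import defaultdict, deque

# Hypothetical substrings marking known infra flakes (option 1).
# The real list would come from triaging the staging-ref errors.
INFRA_PATTERNS = ("redis", "database connection")


def is_infra_flake(message: str) -> bool:
    """Return True if the error message matches a known infra pattern."""
    lowered = message.lower()
    return any(pattern in lowered for pattern in INFRA_PATTERNS)


class ThresholdAlerter:
    """Alert once per error when it reaches `threshold` hits within the window (option 2)."""

    def __init__(self, threshold: int = 10, window_seconds: int = 3600):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # error key -> recent timestamps
        self._alerted = set()            # error keys already alerted on

    def record(self, error_key: str, timestamp: float) -> bool:
        """Record one occurrence; return True if it should trigger an alert."""
        times = self._seen[error_key]
        times.append(timestamp)
        # Drop occurrences that fell out of the sliding window.
        while times and times[0] <= timestamp - self.window_seconds:
            times.popleft()
        if len(times) >= self.threshold and error_key not in self._alerted:
            self._alerted.add(error_key)
            return True
        return False
```

With these defaults, an error seen once a minute triggers an alert on its tenth occurrence and stays quiet afterwards, while an error seen once every ten minutes never alerts. The two rules can also be combined: drop known infra flakes first, then apply the threshold to what remains.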

Edited by Lucie Zhao