Implement zero Sentry exception policy for staging
Problem
Staging can, and should, be used to catch exceptions and to alert us when the error rate increases over a short period of time. However, errors have many causes, and many of them are red herrings or long-standing errors.
In gitlab-com/gl-infra/production#5931 (closed), we had coverage issues. This meant that the error in question, a NoMethodError, was not caught on CI.
Proposal
Iteration 1
Alert on known errors (using Kibana watches)
| Status | Error | Description | Implementation |
|---|---|---|---|
| | NoMethodError for non-nil objects | Objects should never hit NoMethodError. Investigate where the error is coming from by checking the Kibana link for both the `pubsub-rails-inf-gstg*` and `pubsub-sidekiq-inf-gstg*` indexes. Open a new issue, or comment on the existing issue if there is one. | gitlab-com/runbooks!4134 (merged) |
| | NoMethodError for nil objects | | |
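As a rough illustration of the two rows above, the sketch below builds a watch body that fires when any NoMethodError appears in the two staging index patterns, plus a small helper that separates nil receivers from non-nil ones. It is not the watch from gitlab-com/runbooks!4134. The field names (`json.exception.class`, `@timestamp`), the 10 minute window, the logging action, and both function names are assumptions made for this example and would need checking against the real log schema.

```python
import json
import re

# Index patterns named in the table above.
STAGING_INDICES = ["pubsub-rails-inf-gstg*", "pubsub-sidekiq-inf-gstg*"]

# Assumed field names; verify them against the actual log documents.
EXCEPTION_CLASS_FIELD = "json.exception.class"
TIMESTAMP_FIELD = "@timestamp"


def build_watch(window: str = "10m") -> dict:
    """Return a watch body that fires on any NoMethodError in the window."""
    query = {
        "bool": {
            "filter": [
                {"match_phrase": {EXCEPTION_CLASS_FIELD: "NoMethodError"}},
                {"range": {TIMESTAMP_FIELD: {"gte": f"now-{window}"}}},
            ]
        }
    }
    return {
        # Run the search once per window.
        "trigger": {"schedule": {"interval": window}},
        "input": {
            "search": {
                "request": {
                    "indices": STAGING_INDICES,
                    # size 0: only the hit count is needed for the condition.
                    "body": {"size": 0, "query": query},
                }
            }
        },
        # Zero-exception policy: a single hit is enough to alert.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        # Placeholder action; a real watch would notify a channel instead.
        "actions": {
            "log_hit": {"logging": {"text": "NoMethodError seen on staging"}}
        },
    }


def is_nil_receiver(message: str) -> bool:
    """Heuristic: True when the message says the receiver was nil.

    Covers messages ending in "for nil:NilClass" as well as "for nil".
    """
    return re.search(r"\bfor nil\b", message) is not None


if __name__ == "__main__":
    print(json.dumps(build_watch(), indent=2))
```

Keeping the nil check outside the query avoids relying on how the message field is analyzed in the index. If the message field turns out to be searchable as a phrase, the same split could be moved into the query itself.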
Iteration 2
We assign all other errors by feature category to stage groups - gitlab-com/gl-infra&396 (closed)
Edited by Thong Kuah