Add infrastructure as known transient error
What does this MR do and why?
Add master-broken::infrastructure as a known transient failure so that we can retry the job and automatically close the incident.
Related: gitlab-org/gitlab#398243 (closed)
I also removed master-broken::dependency-upgrade from the transient error list, because the seg fault error caused by the last Ruby upgrade should be resolved by gitlab-org/gitlab-build-images!672 (merged). I don't think master-broken::dependency-upgrade should always be treated as a transient error: if we do see more seg fault errors, they may no longer be caused by the Ruby upgrade, so we should be careful about labeling these errors from this point on.
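The intended behavior can be sketched roughly as follows. This is a minimal illustration, not the actual triage-ops code: the constant, the method names, and the returned symbols are all hypothetical, and it assumes the labels are the scoped labels master-broken::infrastructure and master-broken::dependency-upgrade.

```ruby
# Hypothetical sketch; names are illustrative and not taken from triage-ops.
# Labels treated as known transient failures (assumed scoped-label spelling).
TRANSIENT_FAILURE_LABELS = [
  'master-broken::infrastructure'
].freeze

# Returns true when any label on the incident is a known transient failure.
def transient_failure?(labels)
  labels.any? { |label| TRANSIENT_FAILURE_LABELS.include?(label) }
end

# Decides what to do with a broken-master incident based on its labels.
def incident_action(labels)
  transient_failure?(labels) ? :retry_job_and_close_incident : :keep_open_for_triage
end
```

With this list, an incident labeled master-broken::dependency-upgrade is no longer retried and closed automatically, so a new seg fault would stay open for a person to triage.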
Expected impact & dry-runs
These are strongly recommended to assist reviewers and reduce the time to merge your change.
See https://gitlab.com/gitlab-org/quality/triage-ops/-/tree/master/doc/scheduled#testing-policies-with-a-dry-run on how to perform dry-runs for new policies.
See https://gitlab.com/gitlab-org/quality/triage-ops/-/blob/master/doc/reactive/best_practices.md#use-the-sandbox-to-test-new-processors on how to make sure a new processor can be tested.
Action items
- If adding environment variables for reactive processors, update config/triage-web.yaml and .gitlab/ci/triage-web.yml
- (If applicable) Add documentation to the handbook pages for Triage Operations
- (If applicable) Identify the affected groups and how to communicate to them:
  - /cc @person_or_group
  - Relevant Slack channels
  - Engineering week-in-review