Reopen incident and post notification if pipeline retry fails
What does this MR do and why?
Related:
- Reopen and post note on incidents for which aut... (gitlab-org/quality/engineering-productivity/team#219 - closed)
- Proposal: If incident is automatically closed b... (gitlab-org/quality/engineering-productivity/team#217 - closed)
This change allows gitlab-bot to recognize previous incidents that reported failures for the same pipeline. If a retried pipeline fails again, the automation does not create a new incident and fire another retry. Instead, it reopens the previous incident (if it was closed) and leaves a comment on it to warn team members that the retry has failed.
This also posts a Slack notification to the #master-broken channel, linking to the previous incident, to let team members know that a pipeline from a previous incident has failed again.
This MR also changes the Slack Notifier so that it no longer posts to Slack when an incident is automatically closed, and only sends a notification after a retry has failed.
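The decision flow described above can be sketched as a pure function that returns the actions to take when a pipeline fails. This is a minimal sketch under stated assumptions, not the triage-ops implementation: the `Incident` class, the `handle_failed_pipeline` function, and the action strings are hypothetical, and previous incidents are assumed to be matched by pipeline ID.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Incident:
    # Hypothetical stand-in for an incident issue; not the triage-ops data model.
    iid: int
    pipeline_id: int
    state: str  # assumed values: "opened" or "closed"


def handle_failed_pipeline(pipeline_id: int, incidents: List[Incident]) -> List[str]:
    """Return the (hypothetical) actions to take for a failed pipeline."""
    # Look for an earlier incident that reported a failure for the same pipeline.
    previous: Optional[Incident] = next(
        (i for i in incidents if i.pipeline_id == pipeline_id), None
    )
    if previous is None:
        # First failure: open an incident and retry once; no Slack post yet.
        return ["create_incident", "retry_pipeline"]

    # The retry failed: reuse the earlier incident instead of creating a new one.
    actions = []
    if previous.state == "closed":
        actions.append(f"reopen_incident:{previous.iid}")
    actions.append(f"post_note:{previous.iid}")
    actions.append(f"notify_slack:#master-broken:{previous.iid}")
    return actions


incidents = [Incident(iid=42, pipeline_id=1001, state="closed")]
print(handle_failed_pipeline(1001, incidents))
# → ['reopen_incident:42', 'post_note:42', 'notify_slack:#master-broken:42']
```

Returning a list of actions rather than calling the GitLab or Slack APIs directly keeps the sketch free of assumptions about client libraries and makes the branching easy to check in isolation.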
Expected impact & dry-runs
These are strongly recommended to assist reviewers and reduce the time to merge your change.
See https://gitlab.com/gitlab-org/quality/triage-ops/-/tree/master/doc/scheduled#testing-policies-with-a-dry-run on how to perform dry-runs for new policies.
See https://gitlab.com/gitlab-org/quality/triage-ops/-/blob/master/doc/reactive/best_practices.md#use-the-sandbox-to-test-new-processors on how to make sure a new processor can be tested.
Action items
- If adding environment variables for reactive processors, update config/triage-web.yaml and .gitlab/ci/triage-web.yml
- (If applicable) Add documentation to the handbook pages for Triage Operations =>
- (If applicable) Identify the affected groups and how to communicate to them:
  - /cc @person_or_group =>
  - Relevant Slack channels =>
  - Engineering week-in-review