Retry and auto-close master broken incidents

Jennifer Li requested to merge jennli-triage-master-broken-i2 into master

What does this MR do and why?

This MR implements the following features:

  1. Retries each failed job and posts the retry web URL to the triage notes
  2. Retries the pipeline if there are at least 10 failed jobs in one incident
  3. Closes the incident if all of the failed jobs are caused by transient failures that are known to us. This includes master-broken::dependency-upgrade, master-broken::gitlab-com-overloaded, master-broken::failed-to-pull-image, and master-broken::runner-disk-full
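
The three rules above can be sketched as a small decision function. This is only an illustration in Python, not the code in this MR: `FailedJob`, `IncidentPlan`, `plan_incident`, and the `root_cause_label` field are hypothetical names, and the label strings assume the scoped `master-broken::<cause>` form. It also assumes that a whole-pipeline retry replaces the per-job retries, which the description does not state.

```python
from dataclasses import dataclass
from typing import List, Optional

# Transient root-cause labels listed in the MR description
# (scoped-label spelling is an assumption).
TRANSIENT_LABELS = frozenset({
    "master-broken::dependency-upgrade",
    "master-broken::gitlab-com-overloaded",
    "master-broken::failed-to-pull-image",
    "master-broken::runner-disk-full",
})

# "at least 10 failed jobs in one incident" triggers a pipeline retry.
PIPELINE_RETRY_THRESHOLD = 10


@dataclass(frozen=True)
class FailedJob:
    """Hypothetical view of one failed job attached to an incident."""
    name: str
    root_cause_label: Optional[str] = None  # None when the cause is unknown


@dataclass(frozen=True)
class IncidentPlan:
    """Hypothetical summary of the actions to take for one incident."""
    retry_pipeline: bool
    jobs_to_retry: List[str]
    close_incident: bool


def plan_incident(failed_jobs: List[FailedJob]) -> IncidentPlan:
    # Rule 2: retry the whole pipeline once the threshold is reached.
    retry_pipeline = len(failed_jobs) >= PIPELINE_RETRY_THRESHOLD

    # Rule 1: otherwise retry each failed job individually
    # (assumption: a pipeline retry makes per-job retries unnecessary).
    jobs_to_retry = [] if retry_pipeline else [job.name for job in failed_jobs]

    # Rule 3: close only if every failed job has a known transient cause.
    # An incident with no failed jobs is left open.
    close_incident = bool(failed_jobs) and all(
        job.root_cause_label in TRANSIENT_LABELS for job in failed_jobs
    )

    return IncidentPlan(retry_pipeline, jobs_to_retry, close_incident)
```

For example, an incident with one disk-full job and one job of unknown cause would have both jobs retried and stay open, while an incident with ten image-pull failures would get a pipeline retry and be closed.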

Related issues:

Expected impact & dry-runs

These are strongly recommended to assist reviewers and reduce the time to merge your change.

See https://gitlab.com/gitlab-org/quality/triage-ops/-/tree/master/doc/scheduled#testing-with-a-dry-run on how to perform dry-runs for new policies.

See https://gitlab.com/gitlab-org/quality/triage-ops/-/blob/master/doc/reactive/best_practices.md#use-the-sandbox-to-test-new-processors on how to make sure a new processor can be tested.

Action items

Edited by Jennifer Li
