Proposal that all automatically created first-class citizen incidents be also automatically marked as resolved whenever possible
Summary statement
Consider that all automatically created GitLab first-class citizen incidents ought to also be automatically marked as resolved as early as possible, based on the resolution of the triggering metric criteria.
Terminology
Here is an example of what I am referring to as a GitLab first-class citizen incident page:
I consider the Summary, Metrics, and Alert details tabs to be distinguishing UI compared to a normal GitLab issue.
Further examples:
Problem to solve
In the above screen capture, 8 out of 10 GitLab incidents were created automatically.
It may very well be the case that all 8 auto-opened incidents are in fact still active and require a human to triage. However, it is my understanding that such incidents are left open until a human has had a chance to examine the incident details, decide whether or not to close it, and then manually mark it with /label ~"Incident::Resolved" and also manually close the issue. This can leave such incidents open for days longer than they are relevant, creating very long, very cluttered SRE on-call handover issues and making it difficult to get a quick sense of which incidents are actually open and relevant.
Proposed solution
This issue proposes that:
- Given that there are metrics which trigger the creation of such an incident,
- It seems reasonable that there should be metrics which would automatically add a /label ~"Incident::Resolved" comment, or even close the incident.
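As a rough illustration of the symmetry proposed here, the following Python sketch decides what to do with an incident from the latest metric sample. It is only a sketch under stated assumptions: the names `Action`, `next_action`, and `RESOLVE_COMMENT` are hypothetical, the alert is assumed to fire when the metric is strictly above a single threshold, and nothing here reflects how GitLab alerting is actually implemented.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes for one evaluation of the triggering metric."""
    NONE = "none"
    OPEN = "open"
    RESOLVE = "resolve"


# The quick action this proposal suggests posting on recovery.
RESOLVE_COMMENT = '/label ~"Incident::Resolved"'


def next_action(value: float, threshold: float, incident_open: bool) -> Action:
    """Decide what to do given the latest metric sample.

    Assumption: the alert fires when value is strictly above threshold.
    """
    firing = value > threshold
    if firing and not incident_open:
        return Action.OPEN      # threshold crossed: create the incident
    if not firing and incident_open:
        return Action.RESOLVE   # threshold "un-crossed": resolve the incident
    return Action.NONE          # no state change needed
```

A real implementation would probably also want the metric to stay below the threshold for some minimum period before resolving, so that an incident does not flap between open and resolved.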
Further considerations
One caveat to this proposal is that an incident should not be automatically resolved until a human has at least looked at it. This requires some way of marking an automatically created incident as having been read. This could be done either by automatically marking the incident as read whenever it is viewed by a logged-in user (or by a specific user, such as the EOC), or by requiring that a human manually mark the incident as read, either by adding a /label ~"Incident::Acknowledged" comment or by clicking an Acknowledge button on the first-class citizen incident page.
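The acknowledgment caveat can be expressed as a simple gate in front of any automatic resolution. The Python sketch below is illustrative only: the `Incident` class, its fields, and `may_auto_resolve` are hypothetical names, not part of any existing GitLab API.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical minimal view of an incident's state."""
    auto_created: bool      # opened by an alert rather than by a human
    acknowledged: bool      # a human has viewed it or pressed Acknowledge
    metric_recovered: bool  # the triggering metric is back within bounds


def may_auto_resolve(incident: Incident) -> bool:
    """Allow automatic resolution only for acknowledged, recovered,
    automatically created incidents."""
    return (
        incident.auto_created
        and incident.acknowledged
        and incident.metric_recovered
    )
```

Under this gate, a recovered but unacknowledged incident stays open, so the EOC still sees it at least once before it disappears from the list of active incidents.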
Consider this UI component of the GitLab incident page:
Right next to the Close incident button seems like a good place to put an Acknowledge button.
I'd also like to see all such first-class GitLab incidents announced in the #production Slack channel, just like pages from PagerDuty, so that the EOC can easily click an Acknowledge button on the Slack post itself instead of having to click through to the GitLab incident page.
Acceptance criteria
This proposal can be considered accepted if it drives the completion and delivery of additional features in the GitLab incident user interface and backend such that:
- Specified metrics that trigger an incident creation event also trigger a closing or resolving event when the relevant metric "un-crosses" the threshold that originally triggered the creation event.
- If necessary, an Acknowledged button is available for a human to press, which applies a label or other state mechanism.

