How to page & acknowledge manually created incidents
Problem to be solved
Today, Incident Management is set up to trigger escalation policies for new alerts. In this scenario, the on-call responder who is paged can end the paging by acknowledging the alert, that is, by changing its status from triggered to acknowledged. If the responder changes the status back, we restart the escalation policy and begin paging again.
When a user creates an incident manually, there is no associated alert.
We need to enable paging on incidents, and give a responder the ability to "acknowledge" a manually created incident to end paging and to "un-acknowledge" it to restart paging, including for a different escalation policy or user.
Things to figure out
- Today, escalation policies are only triggered by alerts. We need to adapt this to also allow escalation policies to be triggered by incidents.
- Incidents only have two statuses: OPEN or CLOSED. We will need to figure out how to allow a user to acknowledge and un-acknowledge an incident.
Intended Users
User Experience
User creates an incident and selects the Escalation Policy or user to trigger paging for that Incident. On-call responder can "acknowledge" an incident to indicate that they are working on the incident and to end paging.
Design
Introduce sidebar items that surface Status and Escalation Policies:
Notes:
- Only developers and up will be able to edit the incident status or escalation policy. Reporters and non-project members will have the Edit button hidden.
- Changing the incident status to `acknowledged` or `resolved` will stop paging according to the specified escalation policy. On the other hand, changing the status from `acknowledged` or `resolved` to `triggered` will re-start paging.
- If an incident was created from an alert, the alert and incident statuses will be mirrored (so, an ACKed alert will become an ACKed incident).
- For incidents created from alerts: if an escalation policy has been created for the project, the escalation policy will be pre-populated when the incident is created. For manual incidents, the escalation policy needs to be defined manually.
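The status rules in the notes above can be sketched as a small state transition. This is only an illustration in Python, not the proposed implementation: the `Incident` class, the `paging` flag, and `set_status` are hypothetical names, and the sketch assumes paging only makes sense when an escalation policy is set.

```python
from dataclasses import dataclass
from typing import Optional

# Status names come from the notes above; any other status is rejected here.
STOPS_PAGING = {"acknowledged", "resolved"}
KNOWN_STATUSES = STOPS_PAGING | {"triggered"}


@dataclass
class Incident:  # hypothetical stand-in, not the real model
    escalation_policy: Optional[str] = None
    status: str = "triggered"
    paging: bool = False


def set_status(incident: Incident, new_status: str) -> None:
    """Apply a status change and stop or re-start paging accordingly."""
    if new_status not in KNOWN_STATUSES:
        raise ValueError(f"unknown status: {new_status}")
    incident.status = new_status
    # Assumption: paging runs only while triggered AND a policy is selected.
    incident.paging = (
        new_status == "triggered" and incident.escalation_policy is not None
    )
```

With this shape, acknowledging or resolving always ends paging, and moving back to `triggered` re-starts it only when there is a policy to page on.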
Email to users when paged:
An incident has been triggered in [group/project].
Title: [Insert title here]
Description: [Insert description here]
Escalation policy: [Insert escalation policy, if present]
Metric: [Insert metric, if available]*
[Metric could be a string or a link, up to the discretion of the engineer implementing this issue. Longer-term, we'll likely include a png of the metric but that's out of scope for the first iteration.]
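A rough sketch of how the optional lines in the email could be handled, assuming the escalation policy and metric lines are simply left out when no value is available. The function name `render_paging_email` and its parameters are hypothetical; the real mailer and template are not specified in this issue.

```python
from typing import Optional


def render_paging_email(
    project_path: str,
    title: str,
    description: str,
    escalation_policy: Optional[str] = None,
    metric: Optional[str] = None,
) -> str:
    """Build the plain-text body of the paging email."""
    lines = [
        f"An incident has been triggered in {project_path}.",
        f"Title: {title}",
        f"Description: {description}",
    ]
    # Assumption: optional fields are omitted rather than shown empty.
    if escalation_policy:
        lines.append(f"Escalation policy: {escalation_policy}")
    if metric:
        lines.append(f"Metric: {metric}")
    return "\n".join(lines)
```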
Technical Implementation Plan
- ⚠ Assumption! When escalation policies are changed, existing alerts and incidents will be escalated according to the previous policy. If the status is updated to triggered, the new policy will be applied instead.
The plan below has 3 steps. Part 1 blocks part 2, which blocks part 3. 3A-C can be completed in parallel.
Part 1: Add table/model for IssuableEscalationStatus. backend
Scope:
- Add new table.
- Add new model.
- Add `IssuableEscalationStatus` `has_one` association to issues.
| Column | Required | Type | Description |
|---|---|---|---|
| id | true | Integer | ID of the object |
| issue_id | true | Integer | Incident which has an escalation status |
| escalation_policy_id | false | Integer | Escalation policy used to page for the incident, if any |
| status | true | Integer | One of `AlertManagement::Alert::STATUSES` |
Validations/constraints:
- `issue`, `status` should both be present
- `escalation_policy` should be in the same project as `issue`, if present
- `status` should be in `AlertManagement::Alert::STATUSES`
- Unique constraint: `issue_id` should be unique
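The validations above could be checked roughly as follows. This is an illustrative Python sketch rather than the Rails model: the `EscalationStatus` dataclass, the `*_project_id` fields, and `validation_errors` are hypothetical, and the integer values in `STATUSES` are placeholders standing in for `AlertManagement::Alert::STATUSES`.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Placeholder mapping; the real values live in AlertManagement::Alert::STATUSES.
STATUSES = {"triggered": 0, "acknowledged": 1, "resolved": 2, "ignored": 3}


@dataclass
class EscalationStatus:  # hypothetical mirror of the proposed table
    issue_id: Optional[int]
    issue_project_id: Optional[int]
    status: Optional[int]
    escalation_policy_id: Optional[int] = None
    policy_project_id: Optional[int] = None


def validation_errors(record: EscalationStatus, taken_issue_ids: Set[int]) -> List[str]:
    """Return one message per violated rule; an empty list means valid."""
    errors = []
    if record.issue_id is None:
        errors.append("issue must be present")
    elif record.issue_id in taken_issue_ids:
        errors.append("issue_id must be unique")
    if record.status is None:
        errors.append("status must be present")
    elif record.status not in STATUSES.values():
        errors.append("status must be a known alert status")
    # The policy is optional, but when set it must belong to the issue's project.
    if (
        record.escalation_policy_id is not None
        and record.policy_project_id != record.issue_project_id
    ):
        errors.append("escalation_policy must be in the same project as issue")
    return errors
```

In the real schema the uniqueness rule would presumably be enforced by a database index on `issue_id` rather than an in-memory set.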
Part 2: Add escalation support for incidents. backend
Scope:
- Auto-create an `IssuableEscalationStatus` for new incidents without an associated alert.
- Escalate incidents w/o alerts based on `IssuableEscalationStatus` according to escalation policy, per approach in https://gitlab.com/gitlab-org/monitor/monitor/-/issues/56#note_538327473.
- Add new email to be sent when paging on incident.
- When the issue-type is changed, delete an existing `IssuableEscalationStatus`.
- When the incident is moved to another project, set the `escalation_policy_id` to `null`, reset the status to `Triggered`.
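The lifecycle rules in this scope list can be sketched as three hooks. The Python below is a hedged illustration only: the `Issue` and `EscalationStatus` classes, the dictionary used as a store, and the hook names are all hypothetical. It also assumes "issue-type is changed" means the issue stops being an incident.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Issue:  # hypothetical stand-in for an issue/incident
    id: int
    project_id: int
    issue_type: str = "incident"
    alert_id: Optional[int] = None


@dataclass
class EscalationStatus:  # hypothetical stand-in for IssuableEscalationStatus
    issue_id: int
    status: str = "triggered"
    escalation_policy_id: Optional[int] = None


Store = Dict[int, EscalationStatus]  # keyed by issue id


def on_created(issue: Issue, store: Store) -> None:
    # Only incidents without an associated alert get their own status record.
    if issue.issue_type == "incident" and issue.alert_id is None:
        store[issue.id] = EscalationStatus(issue_id=issue.id)


def on_type_changed(issue: Issue, new_type: str, store: Store) -> None:
    issue.issue_type = new_type
    if new_type != "incident":  # assumption: delete when leaving incident type
        store.pop(issue.id, None)


def on_moved(issue: Issue, new_project_id: int, store: Store) -> None:
    issue.project_id = new_project_id
    record = store.get(issue.id)
    if record is not None:
        # Policies are per project, so the old one cannot follow the incident.
        record.escalation_policy_id = None
        record.status = "triggered"
```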
Blocked by: #323139 (closed)
Part 3A: Add slash command for escalating an incident. backend
Scope:
- Add new slash command `/page <escalation-policy>`
Slash command should require an escalation policy from the project as an argument. It should only be available on incidents. It should only be available to developer+. It should not be available on incidents with an associated alert.
If paging has already begun for an escalation policy, reset the status to Triggered, change the escalation_policy_id and re-start escalating on the new policy.
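The availability rules and reset behavior for `/page` could look roughly like this. It is a sketch in Python, not the quick-action implementation: `run_page_command`, `Escalation`, and the error strings are hypothetical, and the numeric access level for Developer is an assumption used only to express "developer+".

```python
from dataclasses import dataclass
from typing import Optional, Set

DEVELOPER = 30  # assumed numeric access level; higher means more access


@dataclass
class Escalation:  # hypothetical escalation state for one incident
    status: str = "triggered"
    escalation_policy: Optional[str] = None


def run_page_command(
    command: str,
    issue_type: str,
    has_alert: bool,
    access_level: int,
    project_policies: Set[str],
    escalation: Escalation,
) -> Optional[str]:
    """Apply `/page <policy>`; return an error message, or None on success."""
    parts = command.split(maxsplit=1)
    if not parts or parts[0] != "/page":
        return "not a /page command"
    if issue_type != "incident":
        return "only available on incidents"
    if has_alert:
        return "not available on incidents with an associated alert"
    if access_level < DEVELOPER:
        return "requires developer access or higher"
    policy = parts[1].strip() if len(parts) > 1 else ""
    if not policy:
        return "an escalation policy is required"
    if policy not in project_policies:
        return "escalation policy not found in this project"
    # Re-start escalation on the requested policy, even if paging had begun.
    escalation.status = "triggered"
    escalation.escalation_policy = policy
    return None
```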
Part 3B: Add Status dropdown in the UI. backend frontend
Scope:
- Dropdown should include same options as the `Alert` status dropdown. frontend
- Expose `IssuableEscalationStatus` in GraphQL API. backend
UI should show alert's statuses if the incident is associated with an alert. In this scenario, changing the status should update the status of the alert.
Setting the status of an incident to Triggered should reset the Escalations for the incident.
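One way to read the two rules above is that the alert, when present, is the source of truth for the displayed status, and that any change to Triggered counts as an escalation reset. The Python sketch below illustrates that reading; `Alert`, `Incident`, `displayed_status`, `update_status`, and the `escalation_resets` counter are hypothetical names, not part of the proposed API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:  # hypothetical stand-in
    status: str = "triggered"


@dataclass
class Incident:  # hypothetical stand-in
    own_status: str = "triggered"
    alert: Optional[Alert] = None
    escalation_resets: int = 0  # counts how often escalations were reset


def displayed_status(incident: Incident) -> str:
    # Incidents created from an alert show the alert's status.
    if incident.alert is not None:
        return incident.alert.status
    return incident.own_status


def update_status(incident: Incident, new_status: str) -> None:
    if incident.alert is not None:
        incident.alert.status = new_status  # change is written to the alert
    else:
        incident.own_status = new_status
    if new_status == "triggered":
        incident.escalation_resets += 1  # escalations start over
```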
Part 3C: Add Escalation policy dropdown in the UI. backend frontend
Scope:
- Dropdown should include all the escalation policies available for the project. frontend
- Expose `EscalationPolicies` in GraphQL API. backend
If applicable, UI should show the currently firing escalation policy. Setting the escalation policy should start notifications on that policy for the incident.








