Synchronous test of associating GitLab Alerts to GitLab Incidents
Goal
Dogfood GitLab Alerts to see how GitLab Alerts link to GitLab Incidents.
Testing Procedure
- Procure a sample payload of a PagerDuty alert
- Set up PagerDuty alerts in Tanuki via a generic HTTP endpoint
- Reconcile pages coming from PagerDuty with the alerts in GitLab, and see how quickly a user would be able to find an alert in GitLab
- Identify the steps someone on-call would need to take to associate an alert with a GitLab Incident, and what they would need to tie the alert to the page and/or alert that originated outside of GitLab
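As a rough illustration of the first two steps, the sketch below maps a sample PagerDuty event onto a payload for a GitLab generic HTTP endpoint. The PagerDuty field names (`event`, `event_type`, `occurred_at`, `data.id`, `data.title`) assume the v3 webhook shape and should be checked against the procured sample. The GitLab keys (`title`, `start_time`, `end_time`, `monitoring_tool`, `fingerprint`) are those of the generic alert payload as we understand it. The function name is hypothetical.

```python
from typing import Any, Dict


def pagerduty_to_gitlab_alert(pd_event: Dict[str, Any]) -> Dict[str, Any]:
    """Map one PagerDuty webhook event to a GitLab generic-alert payload."""
    event = pd_event["event"]      # assumed v3 webhook envelope
    incident = event["data"]       # assumed to describe the PagerDuty incident

    payload = {
        "title": incident["title"],
        "monitoring_tool": "PagerDuty",
        # Reusing the PagerDuty incident id as the fingerprint lets later
        # events for the same page refer to the same alert.
        "fingerprint": incident["id"],
    }
    if event["event_type"] == "incident.triggered":
        payload["start_time"] = event["occurred_at"]
    elif event["event_type"] == "incident.resolved":
        payload["end_time"] = event["occurred_at"]
    return payload


# Made-up sample event, not a real page.
sample = {
    "event": {
        "event_type": "incident.triggered",
        "occurred_at": "2022-03-01T10:00:00Z",
        "data": {"id": "PABC123", "title": "High error rate on web"},
    }
}
alert = pagerduty_to_gitlab_alert(sample)
```

Keeping the PagerDuty incident id in the payload also gives the person on-call a value to search for when reconciling a page with an alert.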
Current Process
- Someone on-call gets paged and declares an incident -OR- someone declares an incident in Slack -OR- someone opens an incident manually in GitLab
- Alerts live in PagerDuty, Alertmanager, Pingdom or Dead Man's Snitch and aren't linked to the GitLab Incident; see https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/775
Background
We met to determine how to best link GitLab Alerts to GitLab Incidents. Here are the links to the agenda and meeting recordings.
- Link to agenda
- Recording Part 1: https://www.youtube.com/watch?v=sh4m3wxVCFI
- Recording Part 2: https://www.youtube.com/watch?v=I1Nq8Qe6yGk
Feature Requests
Opportunities for Alert Improvements
- Create a GitLab <- PagerDuty specific integration. All updates from PagerDuty are reflected in GitLab, i.e. the status of the alert.
  - Right now, this isn't possible with a generic HTTP Endpoint integration with PagerDuty. There are multiple webhooks for different events; we saw this when we tested PagerDuty alerts with @kwanyangu and @igorwwwwwwwwwwwwwwwwwwww in November 2021 (link to 📹 recording, see 26:00-28:00). We will need to tie the events together to make sure updates are made to one alert.
- Alert status will need to be updated from the alert to the originating endpoint, i.e. if I update the status on the GitLab alert, the PagerDuty status is also updated. (Future, see comment below)
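To make "tie the events together" concrete, here is a minimal sketch that folds a stream of webhook events into one status per alert, keyed by the originating incident id. The event type names and the mapping to alert statuses are assumptions for illustration, not the behaviour of any existing integration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical mapping from PagerDuty event types to GitLab alert statuses.
STATUS_BY_EVENT = {
    "incident.triggered": "triggered",
    "incident.acknowledged": "acknowledged",
    "incident.resolved": "resolved",
}


def fold_events(events: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collapse (incident_id, event_type) pairs into one status per alert.

    Events are assumed to arrive in the order they occurred, so the last
    recognised event for an incident id determines the alert status.
    """
    alerts: Dict[str, str] = {}
    for incident_id, event_type in events:
        status = STATUS_BY_EVENT.get(event_type)
        if status is None:
            continue  # event types that do not change status are ignored
        alerts[incident_id] = status  # update the one alert, never add a second
    return alerts


events = [
    ("P1", "incident.triggered"),
    ("P2", "incident.triggered"),
    ("P1", "incident.acknowledged"),
    ("P1", "incident.annotated"),
    ("P2", "incident.acknowledged"),
    ("P1", "incident.resolved"),
]
statuses = fold_events(events)
```

Six webhook deliveries end up as two alerts, which is the behaviour a dedicated integration would need and a generic endpoint does not provide on its own.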
The opportunities above are pre-requisites for linking alerts to incidents. To link alerts to incidents the following has been proposed:
- Introduce related alerts to incidents (gitlab-org/gitlab#356057 - closed)
- Relate alerts to incidents via a quick action, i.e. /relate incident_number alert1 alert2. Could there be autocomplete from the alert title?
- Show alert additions in incident timelines.
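The proposed quick action could be parsed along these lines. This is only a sketch of the syntax suggested above: the `RelateCommand` type and `parse_relate` function are hypothetical, and it assumes the incident and alerts are referenced by plain numeric ids.

```python
from typing import List, NamedTuple


class RelateCommand(NamedTuple):
    incident: int      # incident number the alerts are related to
    alerts: List[int]  # one or more alert ids


def parse_relate(line: str) -> RelateCommand:
    """Parse '/relate incident_number alert1 alert2 ...' into its parts."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "/relate":
        raise ValueError("expected: /relate incident_number alert1 [alert2 ...]")
    # int() raises ValueError for anything that is not a plain number.
    numbers = [int(part) for part in parts[1:]]
    return RelateCommand(incident=numbers[0], alerts=numbers[1:])


command = parse_relate("/relate 42 7 9")
```

Autocompleting from the alert title would sit in front of a parser like this, resolving the chosen title to an alert id before the command is submitted.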
Dogfooding Steps
- Schedule a time to test GitLab alerts synchronously
- Set up a project to test alerts, Tanuki Project
- Record the synchronous session and post the recording, 📹 Recording link. Note: The Authorization key was reset after the recording and the video is Private on YouTube.
- Determine if this is something that can be done in Production. Note: not at present, please see the Opportunities for Alert Improvements.