
Problem validation: Slack and incident management workflows

What did we learn?

Results
Slack is seen as a communication and discussion tool for investigating alerts and incidents. We found that people are manually adding decisions and summaries from Slack to their incident issue. However, they are automating simple, repeatable tasks, like resolving an incident from within Slack.
Link to Dovetail project

What’s this issue all about?

We've been building out a complete workflow for incident management that breaks down as follows:

  • Customer sets up alert integrations, which send all of their alerts to GitLab
  • Alerts are triaged within GitLab
  • If serious enough, alerts are promoted to an incident
  • Incident is worked on, and closed when it's resolved

However, some people (including our SRE team) aren't using the alerts portion of this workflow. Instead, they use Slack. That workflow looks roughly as follows:

  • Alerts are sent directly to Slack
  • Team reviews them there
  • If they are serious enough, an issue/incident is created to explore further

While we do have a Slack integration, which sends the alert details to Slack and links the Slack alert to the alert detail page, it's not as robust as some Slack apps like Slackline or Woodhouse, where you can silence alerts and see metrics and logs, all within Slack. With those Slack apps, the initial triaging presumably happens in Slack and then, if the alert is serious enough, an incident can be created directly from Slack as well.

Who is the target user of the feature?

What questions are you trying to answer?

  • Are we properly serving the Slack-to-Incident workflow currently, with the existing Slack integration?
  • If not, what features would we need to add to properly support teams working in this way?

Core questions

  • For people who are using incidents but not alerts - how are they currently getting their alerts? Is it in Slack or somewhere else?
  • What information do those Slack alerts contain?
  • What actions can they take on the alerts from Slack?
  • Is there a way to shore up the Slack-to-GitLab incident workflow so that people who are getting alerts in Slack can be more easily funneled into the incident workflow?

Additional questions

It'd be great to get a sense of whether improving our Slack integration for alerts/incidents would help better funnel people into our larger Incident Management workflow.

What hypotheses and/or assumptions do you have?

  • That people not sending alerts to GitLab are primarily sending them to Slack. But is this true? Is there somewhere else they are getting their alerts instead?
  • That our current Slack integration isn't sufficient for people receiving alerts primarily in Slack. To properly triage alerts in Slack, we would need a proper Slack app for the incident management workflow.

What decisions will you make based on the research findings?

Whether or not we should further invest in/improve the Slack integration for incident management.

Plan & End Goal

As part of this issue, we'll tackle the following:

  • Document the current Slack integration functionality for the incident management workflow.
  • Have in-person sessions with both internal and external people to understand what they need from a Slack integration. With external people, we can have them show us their existing Slack workflow for alerts, to better understand what information they need to see in those alerts, and what features/functionalities should be supported as part of that workflow. With the internal people, we could talk to someone from the SRE team, and have them show us exactly what information they receive in Slack from Woodhouse, and what they think a Slack workflow for alerts/incidents should include.
  • End result - a (researched) list of features that a proper Slack integration for incident management would include, and a better idea of what we'd need to build to support that workflow.

Reach

Impact

Confidence

Effort

Links

Dovetail project

Screener doc

Screener preview

Discussion guide

Mural (outlining current Slack integration workflow, as of 2021)

Notes

Edited by Amelia Bauerly