Incident template for Issues

Problem to solve

A prescriptive, generic incident template gives users a starting point when they begin using issues as incidents within GitLab. It demonstrates that templates can be used to customize incidents, and it provides an easy way to have chosen labels applied to incidents automatically rather than adding them manually after the fact.

Intended users

Sasha the Software Developer
Devon the DevOps Engineer
Sidney the Systems Administrator

Further details

This work contributes to the Incident Management Vision.

Initial proposals

  1. Create a basic Incident issue template that contains the /label incident quick action somewhere in it.
  2. Add this template to all customer instances.
  3. Make it the default template in the drop-down in Incident Settings.
  4. Users can modify the template, or even delete it, as with any other template.

Basic template could include:

  • Time/Date
  • Service
  • Threshold exceeded
  • Description
  • /label incident
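
As a rough illustration of the basic template above, the following Python sketch assembles the listed fields into a Markdown issue body. The function name `render_basic_incident_template`, the heading layout, and the placeholder comments are assumptions made for illustration and are not part of the proposal. The quick action is written as `/label ~incident`, since GitLab quick actions reference labels with a tilde.

```python
# Hypothetical sketch: the field names come from the list above,
# while the heading layout and placeholder wording are assumed.
BASIC_FIELDS = ["Time/Date", "Service", "Threshold exceeded", "Description"]


def render_basic_incident_template(label: str = "incident") -> str:
    """Return a Markdown issue body with one heading per field and a label quick action."""
    lines = []
    for field in BASIC_FIELDS:
        lines.append(f"## {field}")
        lines.append("")
        lines.append(f"<!-- Add {field.lower()} here -->")
        lines.append("")
    # Quick actions are written one per line, so the label command gets its own line.
    lines.append(f"/label ~{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_basic_incident_template())
```

In a GitLab project, text like this would typically be saved as a Markdown file under `.gitlab/issue_templates/` (for example `Incident.md`, a hypothetical file name) so that it appears in the template drop-down.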

Design (still TBC)

Template will include the following:

[Design mockup: ce_63584-design-incident-template-for-issues-v3]

  • Summary: includes all auto-populated content from https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/14614, such as title, start time, description, query, and any annotations. Users with no auto-populated content are given guidance on what to add manually.
  • Timeline: added manually by the user.
  • Relevant links: space for users to add any other content that helps in troubleshooting or resolving the issue, such as links to dashboards, embedded metrics charts, or the Slack channel where the incident is being discussed.
  • Similar incidents: space for users to highlight similar previous alerts, if and as necessary.
  • Recommended actions: space for users to add an incident protocol, if they have one.
  • Finally, space is left for a suggested label and the option to cc team members.
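
The section list above, including the fallback to manual guidance when nothing is auto-populated, could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions: the function names, the `alert` mapping and its keys, the guidance wording, and the Markdown layout are all hypothetical and are not taken from the referenced merge request.

```python
from typing import Mapping, Optional, Sequence

# Summary fields named in the list above; the key spellings are assumed.
SUMMARY_FIELDS = ("title", "start time", "description", "query", "annotations")

# Manually filled sections from the list above; the hint wording is assumed.
MANUAL_SECTIONS = {
    "Timeline": "Add key events and their times",
    "Relevant links": "Dashboards, metrics charts, or chat channels",
    "Similar incidents": "Link similar previous alerts, if any",
    "Recommended actions": "Add your incident protocol, if you have one",
}


def render_summary(alert: Optional[Mapping[str, str]]) -> str:
    """Render auto-populated alert fields, or guidance when none are present."""
    rows = [
        f"- **{name.capitalize()}:** {alert[name]}"
        for name in SUMMARY_FIELDS
        if alert and alert.get(name)
    ]
    if not rows:
        # Fallback for users without auto-populated content.
        rows = ["<!-- No auto-populated data: add a title, start time, and description -->"]
    return "## Summary\n\n" + "\n".join(rows) + "\n"


def render_incident_template(
    alert: Optional[Mapping[str, str]] = None,
    label: str = "incident",
    cc: Sequence[str] = (),
) -> str:
    """Assemble the summary, the manual sections, and the label/cc footer."""
    parts = [render_summary(alert)]
    for heading, hint in MANUAL_SECTIONS.items():
        parts.append(f"## {heading}\n\n<!-- {hint} -->\n")
    footer = []
    if cc:
        # A plain mention line; the usernames are supplied by the caller.
        footer.append("cc " + " ".join(f"@{user}" for user in cc))
    footer.append(f"/label ~{label}")
    parts.append("\n".join(footer) + "\n")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_incident_template({"title": "High CPU", "query": "cpu > 0.9"}, cc=["sasha"]))
```

Keeping the summary in its own function isolates the one piece that depends on alert data, so the manual sections and footer stay identical whether or not anything was auto-populated.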

Final formatted issue will look as follows:

[Screenshot: Screen_Shot_2019-07-26_at_2.09.19_PM]

Permissions and Security

Anyone who can interact with issues and add labels to them.

Documentation

Testing

What does success look like, and how can we measure that?

Links / references

Edited by Amelia Bauerly