Create a low-friction, fast method to post an initial status to the Status Page for high-severity incidents

Expected Outcome

Give the EOC, Incident Manager, or CMOC a quick and easy way to update the Status Page to acknowledge that we are aware of a problem and investigating it. This is intended primarily for use in significant S1 incidents.

Background

Many significant high severity incidents have these things in common:

  1. they occur suddenly
  2. the impact is severe and widespread
  3. the source is hard to determine quickly because so many things are broken at once
  4. the signal for the roles on call and responding is overwhelming (from automated alerting to human-generated reporting)

During the initial response, the incident response team is rightly focused on identifying the root issue. At the same time, users simply want to see an acknowledgment that we're aware of the problem and working on it.

We need to be able to quickly and easily acknowledge that something is wrong and that we're looking at it. Further details, such as what we're investigating and links to the incident issue, can all be added in the first CMOC update.

Tools

status.io offers an API that allows automated creation of an incident. It also provides the ability to create a template ahead of time. We should use both and integrate them into woodhouse and/or other appropriate tooling, so that posting the initial status takes only about 5 seconds in the initial chaos of an incident.

Suggested Template

  • Title: Undetermined Problem - details to follow
  • Current State: Investigating
  • Details: We have detected a problem related to GitLab.com services and are actively investigating. More details to follow.
  • Incident Status: Degraded Performance
  • Broadcast: (all)
  • Message Subject: Status Notification from GitLab System Status
  • Components: API, Git Operations, Container Registry, GitLab Pages, CI/CD*, SAML SSO, Background Processing, Canary
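
As a rough illustration of how tooling might turn the suggested template into a status.io API request, here is a minimal Python sketch. It is not the woodhouse implementation. The endpoint URL, the field names (`incident_name`, `current_state`, `infrastructure_affected`, and so on), the numeric state/status codes, the `"1"` notification flags, and the `x-api-id`/`x-api-key` headers are assumptions based on the status.io v2 API and should be checked against its documentation. The helper names `build_payload` and `build_request`, and the component ID mapping, are hypothetical.

```python
import json
import urllib.request

# ASSUMPTION: endpoint and numeric codes follow the status.io v2 API;
# verify them against the status.io documentation before use.
CREATE_INCIDENT_URL = "https://api.status.io/v2/incident/create"
STATE_INVESTIGATING = 100          # assumed code for "Investigating"
STATUS_DEGRADED_PERFORMANCE = 300  # assumed code for "Degraded Performance"

# Fixed text taken from the suggested template.
TEMPLATE = {
    "incident_name": "Undetermined Problem - details to follow",
    "incident_details": (
        "We have detected a problem related to GitLab.com services and are "
        "actively investigating. More details to follow."
    ),
    "message_subject": "Status Notification from GitLab System Status",
    "current_state": STATE_INVESTIGATING,
    "current_status": STATUS_DEGRADED_PERFORMANCE,
}

# Component names copied verbatim from the suggested template.
COMPONENTS = [
    "API", "Git Operations", "Container Registry", "GitLab Pages",
    "CI/CD*", "SAML SSO", "Background Processing", "Canary",
]

# ASSUMPTION: these flag names select the broadcast channels ("Broadcast (all)").
BROADCAST_FLAGS = ("notify_email", "notify_sms", "notify_webhook", "social")


def build_payload(statuspage_id, component_ids):
    """Return the request body for the initial "we are investigating" post.

    component_ids is a hypothetical mapping from component name to the
    identifier status.io expects for that component.
    """
    payload = dict(TEMPLATE)
    payload["statuspage_id"] = statuspage_id
    payload["infrastructure_affected"] = [component_ids[n] for n in COMPONENTS]
    for flag in BROADCAST_FLAGS:
        payload[flag] = "1"
    return payload


def build_request(payload, api_id, api_key):
    """Build (but do not send) the POST request that creates the incident."""
    return urllib.request.Request(
        CREATE_INCIDENT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-id": api_id,    # assumed auth header name
            "x-api-key": api_key,  # assumed auth header name
        },
        method="POST",
    )
```

A chat command could call `build_payload` with stored component IDs, pass the result to `build_request`, and send it with `urllib.request.urlopen`. Keeping the template text in one constant means the responder supplies nothing during the incident, which is what keeps the action to a few seconds.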