Async communication during S1 incidents

Summary

During severity 1 (S1) incidents, if gitlab.com is unavailable, we track async communication in a Google Doc.

During production#6253 (closed), the document was flooded with editors. This might have been because the link was posted in team-updates, but it might have happened anyway.

Related Incident(s)

Originating issue(s): production#6253 (closed)

Desired Outcome/Acceptance criteria

  • Establish a process for async communication during S1 incidents

Some of the proposals to accomplish that from the Incident Review Meeting (Feb 15, 2022):

  • Woodhouse to open an incident issue on ops when gitlab.com is down (promising proposal).
  • Restrict edit access on docs.
    • Does this fix the issue?
    • Fun test: post a link to a doc with restricted edit access and ask a large group of people to try to view it at once.
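
To make the first proposal more concrete, here is a minimal sketch of how a bot such as Woodhouse might prepare an incident issue on the ops instance through the GitLab REST API (`POST /projects/:id/issues`). This is not Woodhouse's actual implementation: the host `ops.example.com`, the project path `example-group/production`, the `incident` label, and the function name are all hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values for illustration only; substitute the real ops host and project.
OPS_API = "https://ops.example.com/api/v4"
PROJECT_PATH = "example-group/production"


def build_incident_issue_request(title, description, token, severity_label="severity::1"):
    """Build (but do not send) a request that would create an incident issue."""
    # The issues endpoint accepts a URL-encoded project path in place of a numeric ID.
    project_id = urllib.parse.quote(PROJECT_PATH, safe="")
    url = f"{OPS_API}/projects/{project_id}/issues"
    payload = {
        "title": title,
        "description": description,
        # Labels are passed as a comma-separated string; "incident" is an assumed label.
        "labels": f"incident,{severity_label}",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "PRIVATE-TOKEN": token},
        method="POST",
    )
```

The request is only constructed here, so the sketch can be inspected without network access. Passing the result to `urllib.request.urlopen` would send it, assuming the token has permission to create issues in the target project.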

Associated Services

Corrective Action Issue Checklist

  • link the incident(s) this corrective action arose out of
  • give context for what problem this corrective action is trying to prevent from re-occurring
  • assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
  • assign a priority (this will default to 'priority::4')