Async communication during S1 incidents
Summary
During severity 1 (S1) incidents, if gitlab.com is unavailable, we track async communication in a Google Doc.
During production#6253 (closed), the document was flooded with editors. This may have been because the link was posted in team-updates, but it might have happened anyway.
Related Incident(s)
Originating issue(s): production#6253 (closed)
Desired Outcome/Acceptance criteria
- Establish a process for async communication during S1 incidents
Some of the proposals to accomplish that from the Incident Review Meeting (Feb 15, 2022):
- Woodhouse to open an incident issue on ops when gitlab.com is down (⭐)
- Restrict edit access on docs.
  - Does this fix the issue?
  - Fun test: post a link to a doc with restricted edit access and ask a large group of people to try to view it at once.
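The first proposal above amounts to a fallback rule: open the incident issue on gitlab.com when it is reachable, otherwise on the ops instance. The sketch below only illustrates that decision and is not how Woodhouse is implemented. The names `is_reachable` and `incident_issue_host` are hypothetical, `https://ops.example.com` is a placeholder for the ops instance URL, and treating any non-5xx response as "up" is an assumption.

```python
import urllib.error
import urllib.request
from typing import Callable

PRIMARY_HOST = "https://gitlab.com"
# Placeholder only: substitute the real ops instance URL.
FALLBACK_HOST = "https://ops.example.com"


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-5xx status within the timeout.

    Assumption: a 5xx response, a timeout, or a connection error all count
    as "unavailable" for the purpose of choosing where to open the issue.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return False


def incident_issue_host(probe: Callable[[str], bool] = is_reachable) -> str:
    """Pick the host on which the incident issue should be opened."""
    return PRIMARY_HOST if probe(PRIMARY_HOST) else FALLBACK_HOST
```

Passing the probe as a parameter keeps the decision testable without network access, for example `incident_issue_host(lambda url: False)` simulates gitlab.com being down.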
Associated Services
Corrective Action Issue Checklist
- link the incident(s) this corrective action arose out of
- give context for what problem this corrective action is trying to prevent from re-occurring
- assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
- assign a priority (this will default to 'priority::4')