Consider communication collection methods during incidents
Problem Statement
In the most recent RCA, gitlab-com/gl-infra/production#2203 (closed), it was discussed that some participating members from dev escalation had trouble ensuring a clean handoff of communications between members. This problem is not specific to dev-escalation members; it also occurs during incidents for Infrastructure teams. Specifically:
- There were multiple places to follow: many Slack threads, comments in issues, and Zoom calls
- Some conversation threads were started in other channels outside of #incident-management and #dev-escalation
- Some communication took place in an identified MR or associated issue rather than in the incident issue
All of the above created a lot of work for the next member, who had to:
- Read all threads that are known
- Somehow learn what was discussed during a Zoom call
- Discover what troubleshooting methods are known or have been attempted
- Compare notes where information may have been missing or made obsolete as new information was discovered
Each of these issues added work on top of an ongoing situation that already required assistance from team members across various teams and functions.
Use this issue to discuss how we can update our communication standards and training so that communication is no longer a bottleneck when we work together on a high-profile issue. Consider adopting similar policies for other teams where this is a common problem.