Incident Manager Expansion
Summary
We are expanding the Incident Manager (previously IMOC) responsibilities to improve incident response and to distribute the work more equitably among our engineering leaders.
We must have an Incident Manager available to engage in GitLab.com incidents at any time of day, every day of the year. To ensure an Incident Manager can engage quickly, we have established six 4-hour shifts each day. As an Incident Manager, you will be assigned one of these 4-hour shifts and must be available to perform Incident Manager duties during that shift for three consecutive days. Each Incident Manager covers these three consecutive days once in every 12-day cycle, so that no one person is overburdened.
The following visualization shows how the scheduling works (only two of the shifts are shown):
Not every eligible person will serve as an Incident Manager at the same time. Current and new Incident Managers will rotate, with the goal that each Team Member serves in this role for no longer than 6 months at a time. This may be harder to achieve within APAC, but we will do so as best we can.
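To make the shift arithmetic above concrete, here is a minimal Python sketch. It uses only the figures stated in this issue (six 4-hour shifts per day, three consecutive days per person, a 12-day cycle) and additionally assumes, purely for illustration, that the rotation epoch is the announced start of Oct 12, 2021 at 1100 UTC, that shift boundaries align to that epoch, and that the people sharing a shift slot cover consecutive 3-day blocks in a fixed order. The names `EPOCH`, `on_call_slot`, and the index scheme are hypothetical and do not describe the real paging configuration.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Figures stated in the issue.
SHIFTS_PER_DAY = 6
SHIFT_HOURS = 24 // SHIFTS_PER_DAY  # 4-hour shifts
BLOCK_DAYS = 3    # consecutive days a manager covers their shift
CYCLE_DAYS = 12   # length of the rotation cycle

# Derived staffing: managers needed per shift slot, and in total.
MANAGERS_PER_SHIFT = CYCLE_DAYS // BLOCK_DAYS          # 4
TOTAL_MANAGERS = MANAGERS_PER_SHIFT * SHIFTS_PER_DAY   # 24

# Hypothetical assumption: the rotation epoch is the announced start
# time, and shift boundaries are aligned to it.
EPOCH = datetime(2021, 10, 12, 11, 0, tzinfo=timezone.utc)


def on_call_slot(when: datetime) -> tuple[int, int]:
    """Return (shift_index, manager_index) for a timezone-aware time.

    shift_index is 0..5 (which 4-hour shift of the rotation day), and
    manager_index is 0..3 (which of the four managers assigned to that
    shift slot is covering the current 3-day block).
    """
    elapsed = when - EPOCH
    if elapsed < timedelta(0):
        raise ValueError("time precedes the rotation start")
    hours = elapsed // timedelta(hours=1)   # whole hours since the epoch
    shift_index = (hours % 24) // SHIFT_HOURS
    day = hours // 24                       # rotation day number
    manager_index = (day % CYCLE_DAYS) // BLOCK_DAYS
    return shift_index, manager_index


if __name__ == "__main__":
    print(TOTAL_MANAGERS)                                    # prints 24
    print(on_call_slot(EPOCH + timedelta(days=3, hours=5)))  # prints (1, 1)
```

Under these assumptions each shift slot needs 12 / 3 = 4 Incident Managers, so a full rotation needs 4 × 6 = 24 people, and each person is on call for 3 × 4 = 12 hours per 12-day cycle. The actual mapping of shifts to wall-clock hours is set by the published schedule, not by this sketch.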
Resources
- https://about.gitlab.com/handbook/engineering/infrastructure/incident-manager-onboarding/
- https://about.gitlab.com/handbook/engineering/infrastructure/incident-management/#incident-manager-responsibilities
- https://about.gitlab.com/handbook/engineering/infrastructure/incident-management
When will this start?
The new schedule and set of Incident Managers will start on Tuesday, Oct 12th, 2021 at 1100 UTC.
How will I know what to do?
During the week of Oct 4, there will be four AMA sessions (two at EMEA/AMER-friendly times and two at AMER/APAC-friendly times). There will also be updated training and videos, plus discussion of the change in #imoc_general.
For the first two months of this change, Infrastructure leadership will staff an additional layer of Incident Managers who will also join any incident that requires an Incident Manager. In any incident you are paged into, you will therefore be joined by three other experienced people (the Engineer on call, the Infra Incident Manager, and the Communications Manager) who also understand your role.
Call to Action
While this isn't a volunteer effort, we want to give first choice of shifts to those who are enthusiastic about helping. If that's you, do the following:
- Review the available shifts on this spreadsheet and add your name to the one shift you would like to staff.
- Open an Incident Manager Onboarding Issue for yourself.
- Add a comment on this issue noting the shift you are selecting and linking to your onboarding issue.
As we fill the shifts, we'll send follow-up information and invites, and otherwise help get everyone ready.
The order in which Incident Managers rotate out of this duty, and any other ordering or prioritization decisions, will be based on the creation date and time of each person's onboarding issue.
Thanks in advance for your contributions to this important initiative!
Acknowledgement
Team, please review this issue and familiarize yourself with the change, then check off your name once you have done so.
Secure
- @twoodham
- @sethgitlab
- @gonzoyumo
- @mark.art
- @nmccorrison
- @thiagocsf
- @theoretick
- @cam_swords
- @fcatteau
- @mikeeddington
- @idawson
- @minac
- @mparuszewski
Enablement
- @changzhengliu
- @craig-gomes
- @nhxnguyen
- @mendeni
- @dbalexandre
- @mkozono
- @WarheadsSE
- @twk3
- @DylanGriffith
- @ayufan
- @tkuah
- @yorickpeterse
- @abrandl
- @alexives
Growth and Fulfillment
Ops
- @cheryl.li
- @erushton
- @fabiopitino
- @grzesiek
- @marknuzzo
- @samdbeckham
- @shampton
- @crystalpoole
- @splattael
- @dcroft
- @michelletorres
- @nicholasklick
- @nicolewilliams