Update the Incident Manager role definition and schedule

Steve Loyd requested to merge IM-schedule-pool into master

Why is this change being made?

The last major change to the Incident Manager role was in Nov 2021. At that point we expanded the membership and refined the responsibilities.

Based on feedback from gitlab-com/gl-infra/mstaff#110 (closed) these additional changes are proposed by this MR:

  • Expanding the schedule so that it is no longer limited to 4 IMs per shift, but instead enlists all eligible team members at once. This is intended to significantly reduce the frequency of shifts for each team member.
  • Refining which roles participate. The primary impact here is removing Directors+ and Dist Eng+ from the pool of IMs. These roles can still participate at their option.
  • Reducing the shifts per day from six to five. This is accomplished by extending two shifts to six hours during the time of day with the least frequent activity.
  • Increasing the number of days in each shift "block" from three to four. This is also intended to reduce frequency for each team member.
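
The arithmetic behind these changes can be sketched roughly as follows. This is only an illustration, not how the schedule is actually built: it assumes a simple per-shift round-robin, the function name `days_between_block_starts` is made up for this example, and the new pool size of 10 per shift is a hypothetical figure (the real number depends on how many team members are eligible).

```python
# Rough sketch of the schedule arithmetic described above.
# Shift lengths come from the MR text; the pool size of 10 is a
# hypothetical value chosen only for illustration.

OLD_SHIFTS_HOURS = [4, 4, 4, 4, 4, 4]   # six 4-hour shifts per day
NEW_SHIFTS_HOURS = [4, 4, 4, 6, 6]      # five shifts, two extended to 6 hours


def days_between_block_starts(pool_size: int, block_days: int) -> int:
    """Days from the start of one block to the start of the next block
    for a single person, assuming everyone in one shift's pool rotates
    through in order (simple round-robin)."""
    return pool_size * block_days


# Both layouts still cover the full day.
assert sum(OLD_SHIFTS_HOURS) == 24
assert sum(NEW_SHIFTS_HOURS) == 24

old_cycle = days_between_block_starts(pool_size=4, block_days=3)
new_cycle = days_between_block_starts(pool_size=10, block_days=4)  # hypothetical pool

print(f"old: a 3-day block every {old_cycle} days")
print(f"new: a 4-day block every {new_cycle} days")
```

Under these assumptions, a larger pool and a longer block both lengthen the gap between blocks for each person, at the cost of each block lasting one more day.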

Questions asked and answered on Slack:

  • I was curious when this change is going to take effect.
    • the change of two of the shifts from 4 hours to 6 hours takes effect on Jun 15
    • the change to begin adding more IMs to the existing pool (vs. the current per-shift limit of 4) starts on Jun 27, 2022
    • more detail in this issue comment
  • I’ve read the MR and the linked handbook page and I was not able to locate any information about compensation/time in lieu for weekend shifts. Could you please link those as well?
    • Thanks for the question on this. There was some discussion of this in the original IM issue back in Oct; happy to augment the onboarding info if you have a suggestion. To summarize: yes, you are expected to be able to take time in lieu, simply by adjusting the adjacent work week. Shifts during the week are meant to be at least close to your normal work time anyhow (which is why we’re asking you to choose a preferred shift), but for a weekend shift, you are expected to shift your work week when this occurs.
  • How is the schedule generated? Does it respect local laws? E.g. my contract says 40h from Monday to Friday; if I (or any German) had a shift on a Saturday, we would need to adjust the contract (Sundays are even trickier employment-law-wise).
    • The schedule is generated automatically out of PagerDuty and just follows the simple pattern of rotating everyone through on a 4-day rotation. This is similar to what is described here, but with 4 days instead of 3. The increase in days per rotation is based on feedback asking for more days at once, which makes each person's block of shifts come around much less often. Since PD doesn’t take specific individual team member details into account, the way we work around unavailable days is just to do swaps. For as long as we’ve been doing this already (with far fewer people), I’m not aware of anyone being unable to get coverage. I’d expect it to be even easier as we’re multiplying the number of people involved.
  • Those of us without a PD account, is there an onboarding process for this? I skimmed the docs linked, but couldn't spot anything

Author Checklist

  • Provided a concise title for this [Merge Request (MR)][mr]
  • Added a description to this MR explaining the reasons for the proposed change, per [say why, not just what][say-why-not-just-what]
    • Copy/paste the Slack conversation to document it for later, or upload screenshots. Verify that no confidential data is added.
  • Assign reviewers for this MR to the correct [Directly Responsible Individual/s (DRI)][dri]
Edited by Steve Loyd