
[EMEA CEOC] Trial some kind of Pagerduty team/group/pair effort for on-call

Request for comments

Need

This is mostly a resurrection of @rvzon's issue Change Pagerduty on-call to team/group effort (#4620 - closed). That issue was being discussed at the same time as the one that ultimately led to the one-day shifts that are now in effect.

I think it was a bit tricky to discuss two sweeping changes to how we handle On-Call at the same time, and as such the issue didn't get as much traction as it deserved. At its core, it's about a very different change than the duration of shifts, and it remains fully relevant and doable with the new one-day shifts.

Because the core issue remains: We're seeing more days where one person is not able to handle everything that's coming in. Relying on enough people being available to jump in spontaneously is not sustainable if this trend continues.

Approach

For that reason, I'd like to reignite the discussion that Ronald started initially: What could be a good way to have more than just one person On-Call at the same time? I don't have a super specific suggestion yet, so I'll just list a few possible implementations:

  • Two people are On-Call, one being the primary (similar to who is On-Call in the current system), the other the secondary. In case another emergency comes in while one is ongoing already, the secondary would take it. The secondary might also help with note-taking and sending out post-call summaries, which can be really challenging to handle alone on busy days as well.
  • Two people are On-Call, both are "a primary". They take turns handling the incoming emergencies on that day, so the fundamental difference to the first setup is as follows: If one emergency comes in in the morning, is resolved within one hour, and then later a second one comes in in the afternoon, both On-Call people would handle one emergency on that day. In the first setup, the primary On-Call person would handle both.
  • Copying over the alternatives that Ronald had listed as well:
    • 1️⃣ Buddy System (Always minimum of 2 SEs are on-call at all times)
      1. Rotate Daily
      2. Rotate weekly.
    • 2️⃣ Primary SE and Secondary SE (Both are on-call but if only 1 emergency comes in Primary takes it)
    • 3️⃣ Round-robin allocation from a group of SEs
    • 4️⃣ Current System (No changes)
    • 5️⃣ Primary SE and a Secondary that will be pinged by the Primary if required.
  • Other ideas? Please comment!
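
To make the difference between the first two setups concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `Emergency` type, the function names, the placeholder people `A` and `B`, and in particular the fallback rules (in the second setup, if the person whose turn it is happens to be busy, the other one takes the page; if both are busy, the result is `None`, meaning someone else has to jump in). It is not how PagerDuty is configured.

```python
from dataclasses import dataclass


@dataclass
class Emergency:
    start: float  # hour of day the page comes in
    end: float    # hour of day it is resolved


def assign_primary_secondary(emergencies):
    """Setup 1: the primary takes every page unless still busy."""
    busy_until = {"primary": 0.0, "secondary": 0.0}
    assignments = []
    for e in sorted(emergencies, key=lambda e: e.start):
        if busy_until["primary"] <= e.start:
            who = "primary"
        elif busy_until["secondary"] <= e.start:
            who = "secondary"
        else:
            who = None  # assumption: both busy, ad-hoc help is needed
        if who is not None:
            busy_until[who] = e.end
        assignments.append(who)
    return assignments


def assign_taking_turns(emergencies, people=("A", "B")):
    """Setup 2: both are 'a primary' and alternate between pages."""
    busy_until = {p: 0.0 for p in people}
    turn = 0  # index of the person who is up next
    assignments = []
    for e in sorted(emergencies, key=lambda e: e.start):
        # Start with whoever's turn it is, then fall back to the others.
        order = [people[(turn + i) % len(people)] for i in range(len(people))]
        who = next((p for p in order if busy_until[p] <= e.start), None)
        if who is not None:
            busy_until[who] = e.end
            turn = (people.index(who) + 1) % len(people)
        assignments.append(who)
    return assignments


# The example from the list: one emergency in the morning, resolved
# within an hour, and a second one in the afternoon.
day = [Emergency(9, 10), Emergency(14, 15)]
print(assign_primary_secondary(day))  # ['primary', 'primary']
print(assign_taking_turns(day))       # ['A', 'B']
```

With overlapping emergencies, both setups behave the same in this sketch: the second person picks up the second page. They only differ on days where the emergencies do not overlap.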

Besides discussing ideas, I'm looking for volunteers. Similar to how two groups trialed the shift length change, maybe we can trial some of these setups as well to get a better idea of their impact. If you're willing to try any of them with a smaller group of people, add a comment!

Benefit

Having to rely on enough people being around, and knowing how severely it can mess up their plans and time management when a situation like today's occurs and they have to drop everything, is not a great feeling. We should strive for maximum psychological safety, especially for the people we are tasking with handling customers in stressful situations. By designating people to be around to help with emergencies on any given day, we allow those people to better plan and prepare for it, and we acknowledge the increasing demand on the On-Call role.

Competition / Alternatives

I think the only reasonable alternative is to make sure everyone plans with a bit more leeway in their days for emergencies. I'm not convinced that's a great solution, as I think we all have a tendency to maximize our efficiency, which is at odds with leaving too much room for flexible availability. But that might just be me extrapolating from myself.

Edited by Manuel Grabowski