Proposal: Assess a 6h/4-shift SRE OnCall schedule

Background

SRE OnCall shifts are currently 8 hours, with 3 possible windows each day.

Recently, a poll proposed moving the current 8-hour schedule windows to different timings. The feedback from everyone led to a discussion of different schedule settings, mainly 4-hour and 6-hour shifts.

@dawsmith has created a nice visual of these two proposals, which can be found here.

Problem

While the current schedule has been working well and most people have adapted to it, it falls short when people need to shift their working hours slightly or suddenly need to find coverage.

Also, with 8-hour shifts, when an incident spills over into the next shift or urgent follow-up incident work needs to be done, the EOC can end up working a bit more than 8h a day, which, over a week, is exhausting.

Proposal

Create shorter and more flexible shifts to accommodate different schedules and make OnCall less stressful, while keeping handover frequency to a reasonable number.

Blocker

Without enough engineers in the rotation, we won't be able to move to shorter shifts. In the current setup, we have 3 engineers booked for OnCall every week across all regions, with the aim of having at least 6 engineers in each region (6x3 = 18 engineers), meaning that an engineer would be expected to be OnCall every 6 weeks on average.

We should aim for having a minimum of 24 engineers (8 per region) before experimenting with this proposal, to keep the same project/incident load.
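The arithmetic behind these numbers can be sketched as below. This is a rough illustration only: it assumes one engineer covers each daily shift window for a full week, and the function name `weeks_between_oncall` is made up for this example.

```python
def weeks_between_oncall(engineers_in_rotation: int, shifts_per_day: int) -> float:
    """Average number of weeks between OnCall weeks for one engineer.

    Assumption (not stated in the proposal): each daily shift window is
    covered by exactly one engineer for the whole week, so the number of
    engineers booked per week equals the number of shifts per day.
    """
    if shifts_per_day <= 0:
        raise ValueError("shifts_per_day must be positive")
    return engineers_in_rotation / shifts_per_day


# Current target: 18 engineers, 3 shifts of 8h per day.
current = weeks_between_oncall(18, 3)

# Proposed minimum: 24 engineers, 4 shifts of 6h per day.
proposed = weeks_between_oncall(24, 4)

# Moving to 4 shifts without growing the pool beyond 18 engineers.
unchanged_pool = weeks_between_oncall(18, 4)

print(current, proposed, unchanged_pool)
```

Under that assumption, 24 engineers on 4 shifts gives the same 6-week cadence as 18 engineers on 3 shifts, while 4 shifts with only 18 engineers would bring each engineer back on call every 4.5 weeks.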

Next steps

Experiment with creating dummy OnCall schedules in PD, and see if we can reach schedule windows that work for everyone.
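Before building the dummy schedules, the candidate windows and the number of handovers per day can be listed with a small script. This is only a sketch: it does not talk to PD, the helper `shift_windows` is hypothetical, and the 00:00 start of the first window is a placeholder, not a proposed timing.

```python
from datetime import datetime, timedelta


def shift_windows(shift_hours: int, first_start_hour: int = 0) -> list[tuple[str, str]]:
    """Split a 24h day into equal windows of `shift_hours` hours.

    `first_start_hour` is an assumed placeholder for where the first
    window starts; the real start times are still to be agreed.
    """
    if shift_hours <= 0 or 24 % shift_hours != 0:
        raise ValueError("shift_hours must divide 24 evenly")
    # The date is arbitrary; only the time of day is used.
    start = datetime(2000, 1, 1, first_start_hour)
    windows = []
    for i in range(24 // shift_hours):
        begin = start + timedelta(hours=i * shift_hours)
        end = begin + timedelta(hours=shift_hours)
        windows.append((begin.strftime("%H:%M"), end.strftime("%H:%M")))
    return windows


for hours in (8, 6, 4):
    windows = shift_windows(hours)
    # One handover happens at the end of every window.
    print(f"{hours}h shifts: {len(windows)} handovers/day, windows: {windows}")
```

With equal windows, 8h shifts mean 3 handovers a day, 6h shifts mean 4, and 4h shifts mean 6, which is the trade-off between flexibility and handover frequency mentioned in the proposal.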



Edited by Rehab