[EMEA CEOC] Implement PagerDuty CEOC Secondary Shift

Implement a Primary and Secondary CEOC engineer model in PagerDuty to deal with multiple concurrent emergencies.

This issue covers the Secondary part of that model.

Problem Statement

According to the well-summarized points in this issue, we constantly have multiple concurrent emergencies, which one CEOC engineer cannot handle alone.

The current setup is not sustainable for a single CEOC engineer. The Manager-on-call has to ping the entire EMEA team to ask for backup, and it may take a while before someone is able to join the emergency.

What is the problem?

  • Details can be found here.

Why is this a problem?

  • Points mentioned here.

Proposal

1.) This Request for Change proposes adding a Secondary EMEA CEOC engineer to help the Primary if more than one emergency comes in. Paging the Secondary can either be automated in PagerDuty with a condition such as "When Primary does not acknowledge in x time", or be done manually by the Primary (or the Manager-on-call) if the Primary is in another emergency.
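
To make the two paging conditions concrete, here is a minimal Python sketch of the decision logic. It is only an illustration, not PagerDuty configuration or API code: the names `Emergency`, `should_page_secondary`, `ack_timeout` and `manual_escalation` are hypothetical, times are expressed in minutes, and the 10-minute timeout in the usage lines is a placeholder for the "x time" that this proposal leaves open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Emergency:
    triggered_at: int                # minute at which the emergency page fired
    primary_acked_at: Optional[int]  # minute of the Primary's ack, None if not acked


def should_page_secondary(
    emergency: Emergency,
    now: int,
    ack_timeout: int,
    manual_escalation: bool = False,
) -> bool:
    """Return True when the Secondary should be paged (hypothetical helper)."""
    # Manual path: the Primary or the Manager-on-call escalates because
    # the Primary is already busy with another emergency.
    if manual_escalation:
        return True

    # Automated path: "When Primary does not acknowledge in x time".
    acked_in_time = (
        emergency.primary_acked_at is not None
        and emergency.primary_acked_at - emergency.triggered_at <= ack_timeout
    )
    if acked_in_time:
        return False
    return now - emergency.triggered_at >= ack_timeout


# Illustrative usage with an assumed 10-minute acknowledgement window.
unacked = Emergency(triggered_at=0, primary_acked_at=None)
print(should_page_secondary(unacked, now=5, ack_timeout=10))   # still inside the window
print(should_page_secondary(unacked, now=10, ack_timeout=10))  # window has elapsed
```

Keeping the manual flag separate from the timeout check mirrors the proposal: either condition alone is enough to bring in the Secondary.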

NB: Other implementations were suggested here and can be considered in future iterations.

DRI

PERSON will act as the DRI for this issue.

Required Resources

  • This will require support-ops to accomplish (cc @gitlab-com/support/support-ops)
  • This will require approval from @vparsons

Potential Roadblocks/Things to consider

We cannot predict how many emergencies will come in at any given time, but one Secondary engineer seems like a good place to start.

Desired Outcome

What does success look like?

How do we measure success?

Where would future feedback go?

Related Issues/MRs/Epics/Tickets

  • gitlab-com/support/support-team-meta#6240 - "Iterate on PagerDuty CEOC on-call to better deal with multiple concurrent emergencies (EMEA)"
  • #4877 (closed) - "[EMEA CEOC] Trial some kind of Pagerduty team/group/pair effort for on-call"
  • #6052 (closed) - "Split EMEA CEOC Shift from 8 to 4-hour work hours"
Edited by Ilia Kosenko