Pilot Weekday Emergency Handling in US Gov

Problem Statement

What is the problem?

Currently we have only a single DRI on-call during daytime shift hours in US government, despite having a pool of capable engineers available during weekday working hours.

Why is this a problem?

  • Simultaneous emergencies cannot necessarily be handled by a single on-call DRI
  • Historically, US Gov team members have been on-call very frequently, leading to work/life balance issues

Proposal

  • On weekdays when multiple engineers are available, use a bot to notify the available engineers when an emergency has come in and needs attention from at least one of the engineers on shift.
  • On weekends, no change: we still need a single DRI.
  • On holidays or low-staffed weekdays, we should define a process to determine a DRI on-call and override the ops bot.
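
The three cases above amount to a small routing decision. A minimal sketch of that decision follows, for discussion only. It is not the ops bot's actual implementation; `Route`, `choose_route`, and `MIN_ENGINEERS_FOR_GROUP` are hypothetical names, and the threshold of 2 engineers is an assumption, since the proposal does not define what counts as "low staffed".

```python
from datetime import date
from enum import Enum


class Route(Enum):
    NOTIFY_GROUP = "notify_group"  # bot notifies every engineer on shift
    SINGLE_DRI = "single_dri"      # page one designated on-call DRI


# Assumption for illustration: the proposal does not say how many
# engineers must be available before the group notification is used.
MIN_ENGINEERS_FOR_GROUP = 2


def choose_route(day: date, available: int, holidays: frozenset = frozenset()) -> Route:
    """Pick how an incoming emergency is routed on the given day."""
    is_weekend = day.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
    if is_weekend or day in holidays:
        # Weekends keep the current model; holidays fall back to a DRI.
        return Route.SINGLE_DRI
    if available < MIN_ENGINEERS_FOR_GROUP:
        # Low-staffed weekday: override the bot with a named DRI.
        return Route.SINGLE_DRI
    return Route.NOTIFY_GROUP
```

How the holiday list and the number of available engineers are obtained is left open here; that is the process the third bullet says still needs to be defined.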

See https://gitlab.com/gitlab-com/support/us-federal/us-government-support-meta/-/issues/2#note_1919163850 for more details

DRI

@JamesRLopes will act as the DRI for this issue.

Required Resources

  • Support operations to implement

Potential Roadblocks/Things to consider

  • The bystander effect could come into play, with each engineer assuming someone else will respond, so the emergency is not handled in a timely manner
  • Holidays and other "weekdays" that are low on staff will need to be planned out in advance
  • We should have pre-developed workflows for what to do if/when an emergency reaches the second level of the escalation path in PagerDuty and pings the manager on-call, who may not be US Gov aligned or a US citizen eligible to assist

Desired Outcome

What does success look like?

During weekday business hours, the team of available engineers will be notified in the event of an emergency, and one or more engineers will address the emergency case in a timely manner.

How do we measure success?

  • Customer emergencies are handled without the need to escalate to the second layer of the escalation policy in PD (manager on-call)

Where would future feedback go?

  • Future feedback belongs in a new issue

Related Issues/MRs/Epics/Tickets