
Duo / AI Powered Tier 2 SME Escalations

Tier 2 SME Group Onboarding for Incident Response

Summary

  • Group Name: Duo / AI
  • Group Manager / DRI: @wortschi
  • Slack Channel: #g_ai_framework

Tier-2 SME on-call (Level 1)

Tier 2 rotations are for subject matter experts. On average, they should know more about their subject matter than engineers outside of the group.

Tier-2 SME on-call (Level 1) Members

  • Vitali Tatarintev
  • Omar Qunsul
  • Igor Drozdov
  • Bruno Cardoso
  • Manoj Memana Jayakumar
  • Mohamed Hamda
  • Patrick Cyiza
  • Eva Kadlecová
  • Dillon Wheeler
  • Tetiana Chupryna
  • Mikolaj Wawrzyniak
  • Halil Coban
  • Joey Khabie
  • Andras Herczeg
  • Alper Akgun
  • Fred de Gier
  • Erran Carey
  • Thomas Vik
  • Alexander Chueshev
  • Dmitry Gruzd
  • Denys Mishunov
  • Shekhar Patnaik
  • Allen Cook
  • Cindy Halim
  • Jeff Park
  • John Slaughter
  • Alejandro Rodríguez Pineda
  • Nathan Weinshenker
  • Michael Thomas
  • Jessie Young
  • James Fargher
  • Pam Claudine Artiaga
  • Shinya Maeda
  • Mark Chao
  • Surabhi Suman
  • Tan Le
  • Mark Lapierre

Tier-2 SME on-call (Level 1) Details

All times are in UTC

Name: EMEA

  • Times: M T W T F : 07:00 - 15:00
  • Handover Time: 07:00
  • Change Shifts: Weekly, Monday 07:00
  • Members:
    • Vitali Tatarintev
    • Omar Qunsul
    • Igor Drozdov
    • Bruno Cardoso
    • Manoj Memana Jayakumar
    • Mohamed Hamda
    • Patrick Cyiza
    • Eva Kadlecová
    • Dillon Wheeler
    • Tetiana Chupryna
    • Mikolaj Wawrzyniak
    • Halil Coban
    • Joey Khabie
    • Andras Herczeg
    • Alper Akgun
    • Fred de Gier
    • Erran Carey
    • Thomas Vik
    • Alexander Chueshev
    • Dmitry Gruzd
    • Denys Mishunov
    • Shekhar Patnaik
    • Surabhi Suman

Name: AMER

  • Times: M T W T F : 15:00 - 23:00
  • Handover Time: 15:00
  • Change Shifts: Weekly, Monday 07:00
  • Members:
    • Allen Cook
    • Cindy Halim
    • Jeff Park
    • John Slaughter
    • Alejandro Rodríguez Pineda
    • Nathan Weinshenker
    • Michael Thomas
    • Jessie Young

Name: APAC

  • Times: M T W T F : 23:00 - 07:00
  • Handover Time: 23:00
  • Change Shifts: Weekly, Monday 23:00
  • Members:
    • James Fargher
    • Pam Claudine Artiaga
    • Shinya Maeda
    • Mark Chao
    • Tan Le
    • Mark Lapierre
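
The Level 1 shift windows above can be sketched as a small lookup. This is only an illustration and not part of the incident.io configuration: the function name `level1_region` is hypothetical, and it assumes an overnight APAC shift belongs to the weekday on which it starts (so Friday's shift runs until Saturday 07:00 UTC, and nothing covers Monday 00:00 - 07:00 UTC).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# (region, start hour, end hour) in UTC, copied from the Level 1 details above.
SHIFTS = [("EMEA", 7, 15), ("AMER", 15, 23), ("APAC", 23, 7)]


def level1_region(now: datetime) -> Optional[str]:
    """Return the Level 1 region covering `now`, or None outside coverage.

    Pass a timezone-aware datetime; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    for region, start, end in SHIFTS:
        if start < end:
            # Daytime shift: starts and ends on the same calendar day.
            in_window = start <= now.hour < end
            shift_day = now
        else:
            # Overnight shift: the early-morning part belongs to the
            # shift that started on the previous calendar day (assumption).
            in_window = now.hour >= start or now.hour < end
            shift_day = now - timedelta(days=1) if now.hour < end else now
        # weekday() is 0 for Monday through 4 for Friday.
        if in_window and shift_day.weekday() < 5:
            return region
    return None


if __name__ == "__main__":
    print(level1_region(datetime.now(timezone.utc)))
```

If the real schedule treats the overnight boundary differently, only the `shift_day` rule needs to change.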

Tier-2 SME Escalation on-call (Level 2)

People at this level are paged when the initial page to Level 1 is not acknowledged within 15 minutes.

  • Choose one:
    • Round-robin all team members after 15 minutes (no need to fill in the Level 2 template below)
    • A new, small group of people
    • A shared escalation schedule consisting of rotation leaders
      • If the escalation on-call schedule is not defined yet, fill it in on the current onboarding issue and remove the schedule template below
      • Link to the schedule in incident.io where this escalation schedule is defined: schedule here

Tier-2 SME Escalation on-call (Level 2) Members

  • Member 1
  • Member 2

Tier-2 SME Escalation on-call (Level 2) Schedule Details

Name: EMEA

  • Times: M T W T F S S: 07:00 - 15:00
  • Handover Time: 07:00
  • Change Shifts: Weekly, Monday 07:00
  • Concurrent shifts: 1
  • Members:
    • Member 1
    • Member 2

Name: AMER

  • Times: M T W T F S S: 15:00 - 23:00
  • Handover Time: 15:00
  • Change Shifts: Weekly, Monday 07:00
  • Concurrent shifts: 1
  • Members:
    • Member 3

Name: APAC

  • Times: M T W T F S S: 23:00 - 07:00
  • Handover Time: 23:00
  • Change Shifts: Weekly, Monday 23:00
  • Concurrent shifts: 1
  • Members:
    • Member 4

Escalation Path

flowchart TD
%% Nodes
    priority("Priority")
    channel-low("Notify Slack Channel")
    channel-high("Notify Slack Channel")
    notify-high("Notify User<br>All at once")
    notify-low("Notify User<br>Round Robin (2m)")
    retry-high("Retry if not ack'd")
    retry-low("Retry if not ack'd")

%% Edge connections between nodes
    priority -->|high| channel-high
    priority -->|low| channel-low
    channel-high -->|5m| notify-high
    channel-low -->|15m| notify-low
    notify-high -->|5m| retry-high
    notify-low -->|10m| retry-low

The default escalation path can be changed. Time intervals can be adjusted, and notification options are not fixed. If unsure, the defaults should be a reasonable starting point.
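
To make the default timings concrete, the flowchart can be modeled as data and turned into a cumulative timeline. This is a minimal sketch, not how incident.io represents escalation paths: `ESCALATION_PATHS` and `timeline` are hypothetical names, and it assumes each edge label is the wait before the next step, with the Slack notification sent at minute 0.

```python
from typing import Dict, List, Tuple

# (minutes to wait before this step, action), read off the flowchart above.
ESCALATION_PATHS: Dict[str, List[Tuple[int, str]]] = {
    "high": [
        (0, "Notify Slack Channel"),
        (5, "Notify User (all at once)"),
        (5, "Retry if not ack'd"),
    ],
    "low": [
        (0, "Notify Slack Channel"),
        (15, "Notify User (round robin, 2m)"),
        (10, "Retry if not ack'd"),
    ],
}


def timeline(priority: str) -> List[Tuple[int, str]]:
    """Return (minutes since the page was raised, action) for each step."""
    elapsed = 0
    steps = []
    for delay, action in ESCALATION_PATHS[priority]:
        elapsed += delay
        steps.append((elapsed, action))
    return steps


if __name__ == "__main__":
    for priority in ESCALATION_PATHS:
        for minute, action in timeline(priority):
            print(f"{priority:>4} +{minute:>2}m  {action}")
```

Under these assumptions, an unacknowledged high-priority page reaches the retry step after 10 minutes, and a low-priority one after 25 minutes. Adjusting an interval only requires changing the corresponding delay value.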

DRI Checklist

  • Finalize On-call team members for each level

    • Fill in the schedule section above in this issue.

    • If any of the members are part of the Incident Manager on-call rotation, please create an issue like this example here to have them removed (where possible) from the IM rotation.

    • Decide which escalation option to use for Level 2 of the escalation chain.

  • On-call license setup and access

    • Use the Slack command /request to raise a request in Lumos for Admin access for yourself, so that you can set up the rotation on incident.io.
    • Ensure each team member has Full access in the "on-call seat" column on the incident.io users page (verify here). If anyone does not, ask the Ops team to provide it by pinging a member of the Ops team on the issue. DO NOT USE THE ACCESS REQUEST TEMPLATE process for this. This does not grant a permission; it grants a full access license (for billing purposes) so that the user can use the on-call features.
  • Set up schedules and escalation path

    • Once the schedule section above in this issue is filled in, create a schedule for your team in incident.io. To do so, duplicate the SAMPLE tier2 - TEAMNAME schedule, edit it to fit your requirements, and add the members accordingly. For the schedule name, use the format tier2 - <team name>. This is your SME on-call schedule (Level 1).

    • Set up the SME escalation on-call (Level 2) and the escalation path based on the option chosen

Level 2 option chosen: Round-robin
  • Navigate to Escalation paths in the incident.io UI, duplicate the tier2 - team_name - Round_robin escalation path for reference, and edit it to fit your requirements. Make sure to add your schedules to the escalation path. For the escalation path name, use the format tier2 - <team name>.
Level 2 option chosen: New small group of folks / Shared escalation schedule consisting of rotation leaders
  • Create a new schedule for the SME escalation on-call (Level 2) based on the schedule information
  • Navigate to Escalation paths in the incident.io UI, duplicate the tier2 team_name - Small_group / Shared rotation leader escalation path for reference, and edit it to fit your requirements. Make sure to add your schedules to the escalation path: the SME on-call schedule in Level 1 and the escalation SME on-call schedule in Level 2. For the escalation path name, use the format tier2 - <team name>.

Note: In the Escalation Path flowchart, "Notify User" means paging the people on the schedule, while "Notify Slack Channel" only posts a notification in Slack.

  • Prepare team for On-call

  • Go live!

    • On the due date, update the On-call teams catalog with the name (tier2 - <team name>) and escalation path of your team. Each row in the catalog populates the drop-down menu that EOCs use to select which team to page.

    • Announce in #eoc-general that your team is ready to be paged. Include a high-level description of the areas this SME group covers and this handbook link.

Congratulations, you are now ready to be on-call!

Edited by Shreya Shah