Duo / AI Powered Tier 2 SME Escalations
Tier 2 SME Group Onboarding for Incident Response
Summary
- Group Name: Duo / AI
- Group Manager / DRI: @wortschi
- Slack Channel: #g_ai_framework
Tier-2 SME on-call (Level 1)
Tier 2 rotations are for subject matter experts. On average, they should know more about their subject matter than engineers outside of the group.
Tier-2 SME on-call (Level 1) Members
- Vitali Tatarintev
- Omar Qunsul
- Igor Drozdov
- Bruno Cardoso
- Manoj Memana Jayakumar
- Mohamed Hamda
- Patrick Cyiza
- Eva Kadlecová
- Dillon Wheeler
- Tetiana Chupryna
- Mikolaj Wawrzyniak
- Halil Coban
- Joey Khabie
- Andras Herczeg
- Alper Akgun
- Fred de Gier
- Erran Carey
- Thomas Vik
- Alexander Chueshev
- Dmitry Gruzd
- Denys Mishunov
- Shekhar Patnaik
- Allen Cook
- Cindy Halim
- Jeff Park
- John Slaughter
- Alejandro Rodríguez Pineda
- Nathan Weinshenker
- Michael Thomas
- Jessie Young
- James Fargher
- Pam Claudine Artiaga
- Shinya Maeda
- Mark Chao
- Surabhi Suman
- Tan Le
- Mark Lapierre
Tier-2 SME on-call (Level 1) Details
All times are in UTC
Name: EMEA
- Times: M T W T F : 07:00 - 15:00
- Handover Time: 07:00
- Change Shifts: Weekly, Monday 07:00
- Members:
- Vitali Tatarintev
- Omar Qunsul
- Igor Drozdov
- Bruno Cardoso
- Manoj Memana Jayakumar
- Mohamed Hamda
- Patrick Cyiza
- Eva Kadlecová
- Dillon Wheeler
- Tetiana Chupryna
- Mikolaj Wawrzyniak
- Halil Coban
- Joey Khabie
- Andras Herczeg
- Alper Akgun
- Fred de Gier
- Erran Carey
- Thomas Vik
- Alexander Chueshev
- Dmitry Gruzd
- Denys Mishunov
- Shekhar Patnaik
- Surabhi Suman
Name: AMER
- Times: M T W T F : 15:00 - 23:00
- Handover Time: 15:00
- Change Shifts: Weekly, Monday 07:00
- Members:
- Allen Cook
- Cindy Halim
- Jeff Park
- John Slaughter
- Alejandro Rodríguez Pineda
- Nathan Weinshenker
- Michael Thomas
- Jessie Young
Name: APAC
- Times: M T W T F : 23:00 - 07:00
- Handover Time: 23:00
- Change Shifts: Weekly, Monday 23:00
- Members:
- James Fargher
- Pam Claudine Artiaga
- Shinya Maeda
- Mark Chao
- Tan Le
- Mark Lapierre
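The three Level 1 windows above cover each weekday in UTC without gaps. As a rough illustration only (this is not how incident.io evaluates schedules), the sketch below maps a timezone-aware datetime to the region whose shift covers it. The function name `region_on_call` is hypothetical, and the sketch assumes that a shift belongs to the weekday on which it starts, so the Friday APAC shift is taken to run until Saturday 07:00 UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Shift windows copied from the schedule above: (name, start hour, end hour), all UTC.
SHIFTS = [
    ("EMEA", 7, 15),
    ("AMER", 15, 23),
    ("APAC", 23, 7),  # wraps past midnight
]


def region_on_call(moment: datetime) -> Optional[str]:
    """Return the region whose Level 1 shift covers `moment`, or None.

    `moment` should be timezone-aware; it is converted to UTC first.
    """
    moment = moment.astimezone(timezone.utc)
    for name, start, end in SHIFTS:
        if start < end:
            in_window = start <= moment.hour < end
            shift_day = moment
        else:
            in_window = moment.hour >= start or moment.hour < end
            # Before `end`, the shift began on the previous calendar day.
            shift_day = moment - timedelta(days=1) if moment.hour < end else moment
        # Assumption: a shift counts for the weekday (Mon-Fri) on which it starts.
        if in_window and shift_day.weekday() < 5:
            return name
    return None
```

For example, a Monday 08:00 UTC timestamp falls in the EMEA window, while a Saturday 08:00 UTC timestamp is outside every Level 1 window under the assumption above.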
Tier-2 SME Escalation on-call (Level 2)
Folks in this level will be paged when the initial page to Level 1 is not acknowledged within 15 minutes.
- Choose one:
  - Round-robin all team members after 15 minutes (no need to fill the Level 2 template below)
  - New group of a small number of folks
  - Shared escalation schedule consisting of rotation leaders
- Fill the current onboarding issue with the Escalation on-call schedule if it is not defined yet, and remove the schedule template below
- Link to the schedule in incident.io where this Escalation Schedule is defined: schedule here
Tier-2 SME Escalation on-call (Level 2) Members
- Member 1
- Member 2
Tier-2 SME Escalation on-call (Level 2) Schedule Details
Name: EMEA
- Times: M T W T F S S: 07:00 - 15:00
- Handover Time: 07:00
- Change Shifts: Weekly, Monday 07:00
- Concurrent shifts: 1
- Members:
- Member 1
- Member 2
Name: AMER
- Times: M T W T F S S: 15:00 - 23:00
- Handover Time: 15:00
- Change Shifts: Weekly, Monday 07:00
- Concurrent shifts: 1
- Members:
- Member 3
Name: APAC
- Times: M T W T F S S: 23:00 - 07:00
- Handover Time: 23:00
- Change Shifts: Weekly, Monday 23:00
- Concurrent shifts: 1
- Members:
- Member 4
Escalation Path
flowchart TD
%% Nodes
priority("Priority")
channel-low("Notify Slack Channel")
channel-high("Notify Slack Channel")
notify-high("Notify User<br>All at once")
notify-low("Notify User<br>Round Robin (2m)")
retry-high("Retry if not ack'd")
retry-low("Retry if not ack'd")
%% Edge connections between nodes
priority -->|high| channel-high
priority -->|low| channel-low
channel-high -->|5m| notify-high
channel-low -->|15m| notify-low
notify-high -->|5m| retry-high
notify-low -->|10m| retry-low
The default escalation path can be changed. Time intervals can be adjusted, and notification options are not fixed. If unsure, the defaults should be a reasonable starting point.
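To make the default timings concrete, the sketch below accumulates the edge delays from the flowchart into minutes elapsed since the escalation starts. It is only an illustration: the names `Step`, `DEFAULT_DELAYS`, and `timeline` are hypothetical, and it assumes that each edge label is the wait before the next step and that the Slack channel notification fires immediately.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    minute: int  # minutes since the escalation was triggered
    action: str


# (action, delay in minutes before it runs), read off the default flowchart edges.
# Assumption: the Slack channel notification happens at minute 0.
DEFAULT_DELAYS = {
    "high": [
        ("Notify Slack Channel", 0),
        ("Notify User, all at once", 5),
        ("Retry if not ack'd", 5),
    ],
    "low": [
        ("Notify Slack Channel", 0),
        ("Notify User, round robin (2m)", 15),
        ("Retry if not ack'd", 10),
    ],
}


def timeline(priority: str) -> List[Step]:
    """Turn per-edge delays into absolute offsets for one priority."""
    elapsed = 0
    steps = []
    for action, delay in DEFAULT_DELAYS[priority]:
        elapsed += delay
        steps.append(Step(elapsed, action))
    return steps


if __name__ == "__main__":
    for priority in ("high", "low"):
        for step in timeline(priority):
            print(f"{priority:>4}  +{step.minute:>2}m  {step.action}")
```

Under these assumptions, a high-priority page reaches users 5 minutes in and is retried at 10 minutes, while a low-priority page reaches users at 15 minutes and is retried at 25 minutes. Adjusting an interval in the escalation path corresponds to changing a single delay in the table.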
DRI Checklist
- Finalize on-call team members for each level
  - Fill the schedule section above in this issue.
  - If any of the members are part of the Incident Manager on-call rotation, please create an issue like this example here to have them removed (where possible) from the IM rotation.
  - Decide on an escalation option to move forward with the Level 2 in the escalation chain
- On-call license setup and access
  - Use the Slack command /request to raise a request in Lumos for yourself to get Admin access, so that you can set the rotation on incident.io
  - Ensure each team member has Full access in the "on-call seat" column on the incident.io users page, verify here. If not, ask the Ops team to provide it for any team members who need it by pinging a member of the Ops team on the issue. DO NOT USE THE ACCESS REQUEST TEMPLATE process for this. This does not grant permissions; it grants a full access license (for billing purposes) so that the user can use the on-call features.
- Set up schedules and escalation path
  - Once the schedule section above in this issue is filled, create a Schedule for your team using incident.io. To do so, duplicate the SAMPLE tier2 - TEAMNAME schedule, edit it as per your requirements, and add the members accordingly. For the schedule name, use the format tier2 - <team name>. This is your SME on-call schedule (Level 1).
  - Set up the SME escalation on-call (Level 2) and escalation path based on the option chosen
    - Level 2 option chosen: Round-robin
      - Navigate to Escalation paths in the incident.io UI, duplicate the tier2 - team_name - Round_robin escalation path for reference, and edit it as per your requirements. Make sure to add your Schedules to the Escalation path. For the Escalation Path name, use the format tier2 - <team name>.
    - Level 2 option chosen: New small group of folks / Shared escalation schedule consisting of rotation leaders
      - Create a new Schedule for the SME escalation on-call (Level 2) based on the schedule information
      - Navigate to Escalation paths in the incident.io UI, duplicate the tier2 team_name - Small_group / Shared rotation leader escalation path for reference, and edit it as per your requirements. Make sure to add your Schedules to the Escalation path: add the SME on-call schedule in Level 1 and the Escalation SME on-call schedule in Level 2. For the Escalation Path name, use the format tier2 - <team name>.
Note: In the Escalation Path flowchart, "Notify User" means paging the folks on the schedule, while "Notify Slack Channel" only notifies them on Slack.
- Prepare team for on-call
  - Instruct your team members to set their notification preferences in the incident.io UI. These preferences control how they are informed when they are paged.
  - While it is not mandatory, it is recommended that each member install the incident.io app on their mobile device
  - Review the On-Call Readiness dashboard
  - <PLACEHOLDER FOR LEVELUP COURSE>
- Go live!
  - On the due date, update the On-call teams catalog with the name (tier2 - <team name>) and escalation path of your team. Each row in the catalog helps populate the drop-down menu that EOCs use to select which team to page.
  - Announce in #eoc-general that your team is ready to be paged. Give a high-level description of the areas this SME group covers, and include this handbook link.
Congratulations, you are now ready to be on-call!