Tier 2 SME Fulfillment onboarding

Tier 2 SME Group Onboarding for Incident Response

Summary

  • Group Name: Fulfillment
  • Group Manager / DRI: @jameslopez
  • Slack Channel: #s_fulfillment_engineering

Tier-2 SME on-call (Level 1)

Tier 2 rotations are for subject matter experts. On average, they should know more about their subject matter than engineers outside of the group.

Tier-2 SME on-call (Level 1) Members

https://docs.google.com/spreadsheets/d/1h9LK8Q_TDJ0pTkg7ZbTEKY_Vy_WPHhcq0dHlBkXEqZY/edit?gid=136556602#gid=136556602

Tier-2 SME on-call (Level 1) Schedule

All times are in UTC

Name: EMEA

  • Times: M T W T F : 07:00 - 15:00
  • Handover Time: 07:00
  • Change Shifts: Weekly, Monday 07:00
  • Members:
    • Ammar Alakkad
    • Angelo Gulina
    • Bishwa Hang Rai
    • Corinna Gogolok
    • Divya Mahadevan
    • Kos Palchyk
    • Lukas Wanko
    • Michael Lunoe
    • Paulo Barros
    • Roy Zwambag
    • Sharmad Nachnolkar
    • Sheldon Led
    • Shreyas Agarwal
    • Vijay Hawoldar
    • Vitaly Slobodin

Name: AMER

  • Times: M T W T F : 15:00 - 23:00
  • Handover Time: 15:00
  • Change Shifts: Weekly, Monday 07:00
  • Members:
    • Aishwarya Subramanian
    • Etienne Baque
    • Jason Goodman
    • Jorge Cook
    • Katherine Richards
    • Minahil Nichols
    • Ryan Cobb
    • Tyler Amos
    • Valerie Burton
    • Vladlena Shumilo

Name: APAC

  • Times: M T W T F : 23:00 - 07:00
  • Handover Time: 23:00
  • Change Shifts: Weekly, Monday 23:00
  • Members:
    • Abhay V Ashokan
    • Aman Luthra
    • Josianne Hyson
    • Matt Sroufe
    • Qingyu Zhao
    • Suraj Tripathi
    • Tarun Vellishetty
    • Vamsi Vempati
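The three regional windows above cover every weekday hour in UTC. As a minimal sketch, the lookup below maps a timestamp to the region on shift. The function name `region_on_call` and the `SHIFTS` table are hypothetical helpers, not part of incident.io. One assumption is not stated in the schedule: a shift is attributed to the weekday on which it starts, so the APAC shift that begins Friday 23:00 runs until Saturday 07:00.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional

# Shift windows copied from the schedule above (all times UTC): (start, end).
SHIFTS = {
    "EMEA": (time(7, 0), time(15, 0)),
    "AMER": (time(15, 0), time(23, 0)),
    "APAC": (time(23, 0), time(7, 0)),  # crosses midnight
}


def region_on_call(moment: datetime) -> Optional[str]:
    """Return the region whose Level 1 shift covers `moment`, or None.

    Assumption (not stated in the schedule): a shift belongs to the
    weekday on which it starts.
    """
    # Treat naive datetimes as UTC; convert aware ones to UTC.
    if moment.tzinfo is None:
        utc = moment.replace(tzinfo=timezone.utc)
    else:
        utc = moment.astimezone(timezone.utc)
    now = utc.time()
    for region, (start, end) in SHIFTS.items():
        if start < end:
            covered = start <= now < end
            start_day = utc.date()
        else:
            covered = now >= start or now < end
            # After midnight, the shift started on the previous day.
            start_day = utc.date() - timedelta(days=1) if now < end else utc.date()
        if covered and start_day.weekday() < 5:  # Monday=0 ... Friday=4
            return region
    return None
```

For example, `region_on_call(datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc))` (a Monday) falls in the EMEA window, while any time on a Saturday afternoon returns `None` because no Level 1 shift is scheduled.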

Tier-2 SME Escalation on-call (Level 2)

Folks in this level will be paged when the initial page to Level 1 is not acknowledged within 15 minutes. Rotation owners must be in the escalation path for their rotations.

  • Choose one:
    • Round-robin all team members after 15 minutes (No need to fill the Level 2 template below)
    • New small group of folks
    • Shared escalation schedule consisting of rotation leaders
      • Fill in the current onboarding issue with the Escalation on-call schedule if it is not defined yet, and remove the schedule template below
      • Link to the schedule in incident.io where this Escalation Schedule is defined: schedule here

Tier-2 SME Escalation on-call (Level 2) Members

  • Member 1
  • Member 2

Tier-2 SME Escalation on-call (Level 2) Schedule

Escalation Path

flowchart TD
    %% Starting point
    incident[Incident Escalated to Team]

    %% Level 1 - Tier 2 SME On-call
    level1[Level 1: Page Current SME On-call<br/>Based on schedule rotation]

    %% Acknowledgment check
    ack_level1{Acknowledged<br/>within 15min?}

    %% Level 2 escalation
    level2[Level 2: SME Escalation On-call<br/>Paged after no Level 1 response]

    %% Schedule note
    schedule_note[Note: Each level requires<br/>its own incident.io schedule]

    %% Final escalation
    level2_ack{Level 2<br/>Acknowledged?}
    final_escalation[Further Escalation<br/>DRI/Manager involvement]

    %% Success path
    resolved[Incident Handled]

    %% Flow connections
    incident --> level1
    level1 --> ack_level1
    ack_level1 -->|Yes| resolved
    ack_level1 -->|No| level2

    %% Level 2 handling
    level2 --> level2_ack
    level2_ack -->|Yes| resolved
    level2_ack -->|No| final_escalation

    %% Schedule note positioning
    level2 -.-> schedule_note

    %% Styling
    classDef level1 fill:#fff3e0
    classDef level2 fill:#fce4ec
    classDef decision fill:#f3e5f5
    classDef success fill:#e8f5e8
    classDef escalation fill:#ffebee
    classDef note fill:#f5f5f5,stroke:#999,stroke-dasharray: 5 5

    class level1 level1
    class level2 level2
    class ack_level1,level2_ack decision
    class resolved success
    class final_escalation escalation
    class schedule_note note

The default escalation path can be changed. Time intervals can be adjusted, and notification options are not fixed. If unsure, the defaults should be a reasonable starting point.
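The decision logic in the flowchart can be sketched as a small function. This is an illustration of the flow only, not how incident.io evaluates escalation paths. The function name `resolve_escalation` and the return labels are hypothetical. The 15-minute Level 1 window comes from this issue; the Level 2 window is not specified here, so it is exposed as a parameter with an assumed default of 15 minutes.

```python
from typing import Optional

# From this issue: Level 2 is paged if Level 1 does not acknowledge within 15 minutes.
LEVEL1_ACK_TIMEOUT_MINUTES = 15


def resolve_escalation(
    level1_ack_minutes: Optional[float],
    level2_ack_minutes: Optional[float],
    level2_timeout_minutes: float = 15,  # assumption: not specified in this issue
) -> str:
    """Return which step of the escalation path ends up handling the page.

    Each *_ack_minutes value is the time between that level being paged
    and its acknowledgment, or None if the page was never acknowledged.
    """
    if (
        level1_ack_minutes is not None
        and level1_ack_minutes <= LEVEL1_ACK_TIMEOUT_MINUTES
    ):
        return "level1"
    if (
        level2_ack_minutes is not None
        and level2_ack_minutes <= level2_timeout_minutes
    ):
        return "level2"
    # Neither level acknowledged in time: DRI/Manager involvement.
    return "further_escalation"
```

Calling `resolve_escalation(None, 5)` models a page that Level 1 never acknowledged and Level 2 picked up after five minutes; `resolve_escalation(None, None)` ends in further escalation to the DRI or manager.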

DRI Checklist

  • Go through the Rotation Leader LevelUp channel for detailed instructions on how to onboard your team (optional)
  • Finalize On-call team members for each level
    • Fill the schedule section for Level 1 and Level 2 above in this issue.
    • If any of the members are part of the Incident Manager on-call rotation, please create an issue like this example here to have them removed (where possible) from the IM rotation.
    • If any of the team members are part of the Dev on-call rotation, please add their emails to the Excluded Team Member Emails tab with the name of the rotation under reason in the eligibility spreadsheet to exclude them from the rotation.
    • Decide on an escalation option to move forward with the Level 2 in the escalation chain

Note: As the rotation leader/owner, you must be in the escalation chain. You can either include yourself in Level 2 or add an extra step that pages you if a page to Level 2 also goes unacknowledged.

  • On-call license setup and access
    • Use the Slack command /request to raise a request in Lumos for yourself to get On Call Scheduler access, which you need to set up the rotation on incident.io
    • Ensure each team member has Full access in the "on-call seat" column on the incident.io users page, verify here. If not, ask the Networking & Incident Management team to provide it by pinging a member of that team on the issue. DO NOT USE THE ACCESS REQUEST TEMPLATE process for this. This does not grant a permission; it grants a full access license (for billing purposes) so that the user can use the on-call features.
  • Set up schedules and escalation path
    • Once the schedule section above in this issue is filled in, create a schedule for your team using incident.io. To do so, duplicate the SAMPLE tier2 - TEAMNAME schedule, edit it to fit your requirements, and add the members accordingly. For the schedule name, use the format tier2 - <team name>. This is your SME on-call schedule (Level 1).
    • Set up the SME escalation on-call (Level 2) and the escalation path based on the option chosen. Rotation owners must be in the escalation path for their rotations: you can either be a member of the Level 2 rotation or create a third level in the escalation path that pages you when Level 2 does not respond.
Level 2 option chosen: Round-robin
  • Navigate to Escalation paths in the incident.io UI, duplicate the tier2 - team_name - Round_robin escalation path for reference, and edit it to fit your requirements. Make sure to add your schedules to the escalation path. For the escalation path name, use the format tier2 - <team name>
  • Refer to the Round-robin incident.io doc to figure out the best way to implement this for your team. A good starting point is to cycle through the responders every 10 minutes and, after 60 minutes without an acknowledgment, time out and move to the next step in the escalation path
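To make the suggested round-robin timing concrete, the sketch below computes which responder would be paged a given number of minutes into the step. It only illustrates the arithmetic of "cycle every 10 minutes, time out after 60" and does not describe how incident.io implements round-robin internally. The function name `round_robin_responder` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def round_robin_responder(
    responders: Sequence[str],
    minutes_since_step_started: float,
    cycle_minutes: float = 10,    # suggested starting point from this issue
    timeout_minutes: float = 60,  # suggested starting point from this issue
) -> Optional[str]:
    """Return the responder being paged, or None once the step has timed out."""
    if not responders:
        raise ValueError("responders must not be empty")
    if minutes_since_step_started < 0:
        raise ValueError("minutes_since_step_started must be non-negative")
    if minutes_since_step_started >= timeout_minutes:
        # Timed out: the escalation path moves on to its next step.
        return None
    # Each responder holds the page for one cycle, wrapping around the list.
    index = int(minutes_since_step_started // cycle_minutes) % len(responders)
    return responders[index]
```

With three responders and the default values, each person is paged twice before the 60-minute timeout. A smaller team is cycled through more often in the same window, which is worth considering when choosing the interval.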
Level 2 option chosen: New small group of folks / Shared escalation schedule consisting of rotation leaders
  • Create a new Schedule for the SME escalation on-call (Level 2) based on the schedule information
  • Navigate to Escalation paths in the incident.io UI, duplicate the tier2 team_name - Small_group / Shared rotation leader escalation path for reference, and edit it to fit your requirements. Make sure to add your schedules to the escalation path: add the SME on-call schedule in Level 1 and the Escalation SME on-call schedule in Level 2. For the escalation path name, use the format tier2 - <team name>

Note: In the escalation path on incident.io, Notify pages the folks on the schedule, while Notify on Slack Channel only notifies them on Slack.

  • Prepare team for On-call

    • Inform rotation members to ignore notifications about upcoming on-call shifts, with a message like the one below:
    Hi, you'll be getting a notification about upcoming on-call shifts. Do not worry, you will not be paged yet. We will only activate the rotation on date X. Any shifts scheduled before that are just for us to test the setup and prepare for the go-live.
  • Go live!

    • On the due date, update the On-call teams catalog with the name (tier2 - <team name>) and escalation path of your team. Each row in the catalog populates the drop-down menu that EOCs use to select the team to page.
    • Announce in #eoc-general that your team is ready to be paged, and include a high-level description of the areas this SME group covers and this handbook link

Congratulations, you are now ready to be on-call!

Edited by James Lopez