Response Crew Discussion and Proposals for APAC
Problem Statement
APAC is following a different way to manage SLAs when compared to EMEA and AMER.
- While implementing ticket management workflows and efficiency improvements, this inconsistency can lead to confusion and challenges.
- We often discuss whether, and how, a crew model would work in APAC with our current numbers and groups. Time to find out 😄
Current SLA management methods
APAC uses the Customer Emergency and SLAH rotations for FRT and NRT SLA management, while EMEA and AMER use the Support Response Crew. TL;DR of the response crew: instead of one Support Engineer SLAHawking continuously for 5 days, the crew makes it a team effort, with every engineer SLAHawking with their peers for 1 day a week (the handbook has details on the workflow).
Proposals
1. Implement Response Crew in APAC
Idea
- We utilize the concept of the response crew as is, and work out the logistics (SaaS, SM, Group 1, Group 2) to implement it.
- Visualizing how this will look over an 8-week period with the current number of folks we have in each of the primary focus areas: Sample APAC Crew - note: this includes newer team members who are happy to be in the rotations from Q2.
- This visual doesn’t take into account PHs, time off and other OOOs.
- Instead of 1 week every 6 weeks for 4 hours a day as SLAHawk, this will be 1 day every week in the crew, during normal working hours.
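To make the time comparison in the last bullet concrete, here is a rough sketch of the hours one engineer spends per 6-week cycle under each model. The 8-hour working day and 5-day working week are assumptions, not figures from this issue, and the function names are purely illustrative.

```python
# Assumptions (not stated in the proposal): an 8-hour working day and a
# 5-day working week. The 4 hours/day and 6-week cadence come from the text.
HOURS_PER_WORKDAY = 8
WORKDAYS_PER_WEEK = 5
CYCLE_WEEKS = 6


def slahawk_hours_per_cycle(hours_per_day=4):
    """Hours one engineer spends as SLAHawk in a 6-week cycle today:
    a single on-shift week at 4 hours a day."""
    return hours_per_day * WORKDAYS_PER_WEEK


def crew_hours_per_cycle(hours_per_day=HOURS_PER_WORKDAY):
    """Hours one engineer would spend in the crew over the same 6 weeks:
    one full working day in each week."""
    return hours_per_day * CYCLE_WEEKS


print(slahawk_hours_per_cycle(), crew_hours_per_cycle())
```

The two figures are not directly comparable as workload: SLAHawk hours are carried by one person, while crew hours are shared with peers during normal working hours.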
Pros
- Collaborative effort instead of one person SLAHawking
- Shorter “on-call” durations as it would be one day per week
- Focus on SaaS queues as well for NRT as SLAHs do not look at SaaS queues currently
- On-call rotations would be for just CMOC and Customer Emergency.
- Customer Emergency will not be burdened with FRTH responsibilities as that becomes part of crew responsibilities
Cons
- Schedule management will not be in PD anymore
- We might end up with drive-by comments to simply avoid a breach - not a good experience for customers
- We’ll need folks with a primary focus in SM and in SaaS available in both East and West APAC, so that all focuses and all timezones within APAC are covered (L&R folks currently do not work crew). With the number of folks we have, ensuring enough people are available to form a “crew” could become a big scheduling challenge once public holidays in different countries and planned and unplanned time off are factored in.
- Reinforces the impression that one group of people is responsible for meeting SLAs - this is not how things should be
2. Change the name of the “SLAH” rotation to “Response Crew” rotation
Idea
- Rename SLAHawk in PD to Response Crew.
- Things remain status quo.
Pros
- Status quo
- Aligned with other regions
- Most boring solution
Cons
- Status quo
- May not be scalable long term
- Reinforces the impression that one person is responsible for meeting SLAs - this is not how things should be
- Not in line with our iteration value
3. Everyone is in the crew, except when you’re not
Idea
- We remove the “SLAHawk” rotation, and we remove the “FRTH” responsibilities from the “Customer Emergency” rotation.
- On any particular weekday, everyone is essentially part of the crew, except when:
- They are on a customer emergency call.
- They are on planned or unplanned leave.
- They are on a Non Crew day - see below.
- We introduce a new schedule of “Non Crew” days every week.
- Every team member, during their non crew day, doesn't have to focus on the queue or help with SLAs.
- This time can instead be used for self development, CDP, shadowing, mentoring, coaching, training and cross training, building tools, bug fixing, documentation etc.
- Visualizing how this will look over an 8-week period: Sample APAC Non Crew - note: this includes newer team members who are happy to be in the rotations from Q2.
- This visual doesn’t take into account PHs, time off and other OOOs.
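The "everyone is in the crew, except when you’re not" rule above can be sketched as a simple check. This is only an illustration: the `Engineer` type, its field names, and the assumption that each person has one fixed Non Crew weekday are hypothetical and not part of the proposal.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Engineer:
    # All fields are hypothetical, for illustration only.
    name: str
    on_emergency_call: bool = False
    on_leave: bool = False  # planned or unplanned
    non_crew_weekday: Optional[int] = None  # 0 = Monday ... 4 = Friday


def in_crew(engineer: Engineer, day: date) -> bool:
    """Return True if the engineer counts as part of the crew on `day`."""
    if day.weekday() >= 5:
        return False  # the proposal only talks about weekdays
    if engineer.on_emergency_call or engineer.on_leave:
        return False
    # Assumption: one fixed Non Crew weekday per engineer.
    return day.weekday() != engineer.non_crew_weekday
```

For example, an engineer whose Non Crew day is Wednesday would be out of the crew on 2021-03-31 and back in it the following day.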
Pros
- Dedicated time for our support engineers for self learning and other activities
- Focus on customer experience and ticket management, by avoiding and discouraging drive-by comments to manage SLAs
- Reduction in the number of on-call rotations
- Reinforces the impression that SLA management is a team activity that is part of day to day for every Support Engineer
- SaaS engineers will be encouraged to pair and crush with SM focused engineers and vice versa, leading to cross training
Cons
- Possibility of bystander effect, with no SLAHawk and no official crew
- Non Crew schedule needs to be maintained in a sheet
Note: Some responses are expected to require more time and investigation than others, and the crew needs to be kept informed when someone has to step away to spend more in-depth time on a particular ticket.
4. Introduce a triggerable Response Crew, leading to a Crush
Idea
- We set up "Support Response Crew APAC" as a formalised, ongoing crush session that is called when the ticket queue reaches a certain size: .com with SLA + SM with SLA > 120
- The crew can be disbanded once the ticket queue is reduced to a certain size or when the day ends.
- Considering current APAC team size (group 1/group 2), everyone available should attend.
- An automation can be used to monitor the queues and ping the team to assemble the crew in a crush session when needed.
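As a minimal sketch of the trigger logic such an automation might apply, assuming the two queue counts are already available from the ticketing system. How the counts are fetched and how the team is pinged is left out, and all names here, including `disband_below`, are hypothetical. The proposal does not specify the disband size, so it is a required parameter rather than a default.

```python
# Threshold taken from the proposal: .com with SLA + SM with SLA > 120
CREW_TRIGGER_THRESHOLD = 120


def should_assemble_crew(saas_with_sla, sm_with_sla,
                         threshold=CREW_TRIGGER_THRESHOLD):
    """True when the combined SLA queue is strictly above the threshold."""
    return saas_with_sla + sm_with_sla > threshold


def should_disband_crew(saas_with_sla, sm_with_sla, disband_below, day_ended):
    """True when the day has ended or the combined queue has dropped to
    `disband_below` or fewer tickets (value not defined in the proposal)."""
    return day_ended or (saas_with_sla + sm_with_sla <= disband_below)


if __name__ == "__main__":
    # Example counts are made up for illustration.
    print(should_assemble_crew(saas_with_sla=80, sm_with_sla=45))
```

Using a disband size lower than the trigger threshold would keep the crew from being assembled and disbanded repeatedly when the queue hovers around 120.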
Pros
- Response Crew is only activated when needed - prevents fatigue from working in a "heightened vigilance" mode for long periods
- No need for another rotation -- the idea is to get currently available folks' attention on the queues when things are "hot".
Cons
- Meaningless if the “Crew” is activated more often than it is not
- No guaranteed availability of SEs per day at any given time.
- No specific DRIs for FRT/NRT/Triage
- The automation will likely live in Slack, so Slack might end up becoming a dependency of working on tickets.
DRI
The DRI(s) for this issue would be Support Engineers in APAC interested in driving this - see Ask below.
Ask
- Any decision taken here affects the day-to-day work of our Support Engineers, so we are asking for SE volunteers to be the drivers on this issue.
- Review this issue by 2021-03-31 (Wednesday) and if you are interested, please assign this issue to yourself.
- We are looking for a maximum of 4 Assignees to drive this.
- You will then be responsible for collaborating with the other assignees to evaluate the proposals here (rejecting them and/or coming up with new proposals) and settling on one to proceed with.
- Once a proposal is chosen, one of the APAC managers will partner with the assignees to work on implementing this.
- We need to keep in mind that maintaining SLA KPIs is the primary goal, apart from ensuring efficient ticket management.
Potential Roadblocks/Things to consider
- If proposal (1), (3) or (4) is chosen, we need to trial it for x weeks to make sure SLAs do not dip below our threshold.
Desired Outcome
- SLA KPIs are met
- Quality of life improvement for SEs
- By monitoring SLA KPIs
- By surveying SEs after a certain period of time via 1:1s, team sync and maybe Polly.