Integrate Customer Support as Tier 1 customers for Tier 2 escalations

Overview

We need to establish Customer Support as Tier 1 customers for Tier 2 escalations. This requires clear escalation criteria that weigh customer needs against engineering work-life balance.

Business Context

  • A business decision has been made to make Support eligible as a Tier 1 customer
  • A similar decision is assumed for the Dedicated team, which has expressed interest
  • Escalation through the current #dev-escalation process produces unpredictable results
  • Weekend emergencies are often upgrade or migration related (Gitaly, database hotspots)

Key Considerations

Weekend Coverage Impact

  • Tier 2 will eventually cover weekends, requiring engineers to pause personal plans
  • Need to minimize weekend work while maintaining customer service quality
  • Risk of creating always-on weekend culture as more Tier 1 teams are added
  • Most support calls requiring dev help occur on weekends

Current Team Commitments

Engineering teams are already managing:

  • Engineering allocations
  • Customer interlock commitments that cannot be missed
  • Long-term feature work customers are counting on

All of these would be impacted by Tier 2 on-call pings.

Scaling Challenges

  • Significant jump from 1 to 3 Tier 1 customers
  • Need to prevent escalation fatigue while ensuring critical issues get addressed

Proposed Escalation Criteria

Immediate Tier 2 Escalation Required:

  • Emergency situation where Support has exhausted their knowledge
  • Customer cannot achieve a stable production environment within a reasonable timescale
  • Customer has a deadline to meet for production system recovery
  • Rollback is not an option due to rigid change windows
  • S1 incidents that cannot be resolved within X minutes (TBD)

Can Go Through the Request for Help (RFH) Process:

  • Non-emergency situations where the customer agrees to a delay
  • Issues that can wait for normal business hours
  • Situations where a temporary fix can be applied, with an RFH opened for the long-term solution
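
The two lists above can be read as a routing decision. The following is a minimal sketch of that reading, not an agreed implementation. All names (Emergency, route, the field names, the "unclassified" outcome) are hypothetical. It assumes that any single immediate-escalation criterion is sufficient, and that an RFH condition takes precedence when both apply; neither assumption is stated in the criteria. Because the S1 time limit is still TBD, it is a parameter with no default value, and the S1 rule is skipped until one is supplied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Emergency:
    # Hypothetical fields; each mirrors one bullet in the proposed criteria.
    support_exhausted: bool = False
    cannot_stabilize_production: bool = False
    has_recovery_deadline: bool = False
    rollback_blocked: bool = False
    severity: str = "S3"
    minutes_unresolved: int = 0
    customer_agrees_to_delay: bool = False
    can_wait_for_business_hours: bool = False
    temporary_fix_available: bool = False


def route(e: Emergency, s1_threshold_minutes: Optional[int] = None) -> str:
    """Return "rfh", "tier2", or "unclassified" for a Support emergency."""
    # Assumption: if the issue can safely wait, RFH wins over Tier 2.
    if (e.customer_agrees_to_delay
            or e.can_wait_for_business_hours
            or e.temporary_fix_available):
        return "rfh"

    # Assumption: any one of the immediate-escalation criteria is enough.
    if (e.support_exhausted
            or e.cannot_stabilize_production
            or e.has_recovery_deadline
            or e.rollback_blocked):
        return "tier2"

    # The S1 limit ("X minutes") is TBD, so it is only checked when provided.
    if (s1_threshold_minutes is not None
            and e.severity == "S1"
            and e.minutes_unresolved >= s1_threshold_minutes):
        return "tier2"

    # Matches neither list; left for a human (for example, the Support
    # Manager on Call) to decide.
    return "unclassified"
```

Keeping the S1 limit as an explicit argument means the sketch does not need to change once the value of X is agreed with the Support team.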

Implementation Requirements

Process Documentation

  • Update handbook with clear escalation criteria
  • Document current Support emergency process for Tier 2 visibility
  • Include how customer events calendar information is shared
  • Add training materials for customer call engagement

Tooling & Access

  • Ensure Support team has proper incident.io access and roles
  • Set up proper PagerDuty integration for Tier 2 notifications
  • Include Tier 2 in weekend customer event notifications

Monitoring & Improvement

  • Open a retro issue for every emergency that requires Dev assistance
  • Track escalation patterns to identify improvement opportunities
  • Monitor weekend vs weekday escalation ratios
  • Create feedback loop for better documentation/error messages
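
As an illustration of the weekend vs weekday tracking, here is a minimal sketch that counts escalations by day type. The function name and the input (a list of escalation timestamps) are assumptions; where the timestamps would come from (for example, an export from the incident tooling) is not specified here. It also assumes the timestamps are already in the timezone used to define "weekend" for the on-call rotation.

```python
from collections import Counter
from datetime import datetime


def weekend_weekday_counts(timestamps):
    """Return (weekend_count, weekday_count) for escalation timestamps."""
    counts = Counter()
    for ts in timestamps:
        # datetime.weekday() returns 5 for Saturday and 6 for Sunday.
        counts["weekend" if ts.weekday() >= 5 else "weekday"] += 1
    return counts["weekend"], counts["weekday"]


# Hypothetical sample data: a Saturday, a Sunday, and a Wednesday.
sample = [datetime(2024, 6, 1), datetime(2024, 6, 2), datetime(2024, 6, 5)]
weekend, weekday = weekend_weekday_counts(sample)
print(f"weekend={weekend} weekday={weekday}")
```

Reporting both counts rather than a single ratio avoids a division by zero in periods with no weekday escalations.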

Guardrails

  • Maintain Support Manager on Call approval for escalations where possible
  • Define clear handoff procedures between Support and Engineering
  • Establish customer communication protocols during escalations

Success Metrics

  • Fewer unpredictable escalation experiences
  • Maintained customer satisfaction during emergencies
  • Sustainable weekend coverage without burnout
  • Clear documentation preventing repeat escalations for same issues

Next Steps

  1. Finalize specific escalation criteria with Support team
  2. Update handbook documentation
  3. Set up technical integrations (incident.io, PagerDuty)
  4. Establish retro process for continuous improvement
  5. Monitor initial rollout and adjust criteria as needed

Edited by Darva Satcher