
Disable auto assignment of incident lead in incident.io

Problem Statement

We received feedback that the current implementation of incident lead auto assignment doesn't work well. We currently have two workflows: one sets the incident lead to the IMOC if the severity is severity 1 or severity 2, and another sets the incident lead to the EOC for severity 3 or severity 4. Each workflow is set to run only once, but because they are two separate workflows, the incident lead can be unexpectedly reassigned when the severity crosses from one workflow's range into the other's. For example, if a severity 3 incident is raised to severity 2, the incident lead is reassigned to the IMOC. This caused confusion when an incident was reassigned without anyone noticing and someone else started working the incident unnecessarily.
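
The reassignment described above can be reproduced with a small model. This is only an illustrative sketch, not how incident.io implements workflows: the `Incident` class, the `WORKFLOWS` table, and the workflow names are hypothetical, and the only behavior taken from the text is that each workflow runs at most once per incident.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class Incident:
    severity: int
    lead: Optional[str] = None
    # Names of workflows that have already run for this incident.
    fired: Set[str] = field(default_factory=set)

# Hypothetical model of the two workflows: (name, matching severities, lead role).
WORKFLOWS = [
    ("sev1_sev2_lead", {1, 2}, "IMOC"),
    ("sev3_sev4_lead", {3, 4}, "EOC"),
]

def apply_workflows(incident: Incident) -> None:
    """Run every matching workflow that has not yet run for this incident."""
    for name, severities, role in WORKFLOWS:
        if incident.severity in severities and name not in incident.fired:
            incident.fired.add(name)
            incident.lead = role  # overwrites whoever was lead before

def set_severity(incident: Incident, severity: int) -> None:
    incident.severity = severity
    apply_workflows(incident)

incident = Incident(severity=3)
apply_workflows(incident)
lead_at_sev3 = incident.lead   # the Sev3/Sev4 workflow ran

set_severity(incident, 2)
lead_at_sev2 = incident.lead   # the Sev1/Sev2 workflow ran for the first time

set_severity(incident, 3)
lead_back_at_sev3 = incident.lead  # both workflows already ran, so no change
```

"Run once" is enforced per workflow rather than per incident, so the first severity change across the boundary still triggers a second assignment.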

This issue was compounded when we discovered that automatic assignment may also issue "Responder" licenses to individuals who may not actually need them, potentially resulting in unnecessary license usage.

Finally, auto assignment fails when there is more than one person in the PagerDuty schedule for EOC or IM, which happens when we have folks in a shadow rotation. As a result, even with auto assignment turned on, it frequently does not work as intended.
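
One way to picture this failure mode is as an ambiguous lookup. The sketch below is an assumption about the shape of the problem, not a description of incident.io or PagerDuty internals: `resolve_lead` is a hypothetical helper that only returns a lead when exactly one person is on call.

```python
from typing import Optional, Sequence

def resolve_lead(on_call: Sequence[str]) -> Optional[str]:
    """Return the on-call person only when the schedule names exactly one.

    Assumed behavior: with zero or several people on call there is no
    unambiguous choice, so no lead is assigned.
    """
    if len(on_call) == 1:
        return on_call[0]
    return None

single = resolve_lead(["primary"])                 # one person on call
with_shadow = resolve_lead(["primary", "shadow"])  # shadow rotation active
```

With a shadow rotation active, the schedule lists two people and the lookup has no single answer.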

Proposed Solution

We propose to disable all automatic assignment and reassignment of incident leads, and instead make lead assignment an intentional, explicit process:

Core Changes

  1. Disable Auto-Assignment: Turn off all automatic assignment for incident lead
  2. Initial State: Incidents start with no assigned lead, requiring intentional assignment
  3. Clear Guidance: Provide clear in-tool messaging about who should typically be assigned as lead based on incident type, severity, and complexity

Implementation Details

  1. Disable Auto-Assignment:

    • In incident.io, under the dotcom workflows folder, disable the Sev1/Sev2 and Sev3/Sev4 workflows.
  2. Automatic Notifications (no change):

    • For Sev1/Sev2: Continue to auto-page IMOC, EOC, and IM
    • For Sev3/Sev4: Continue to NOT auto-page EOC. Add a message to the incident channel so that participants know no one will be added to the channel or paged unless they request help.
  3. Lead Selection Guidance (to be included in tool and handbook):

    • Low Complexity Incidents: The team member most familiar with the affected system should lead (often the reporter)
    • High Complexity Incidents: IMOC (for Sev1/2) or EOC (for Sev3/4) should typically lead due to coordination requirements, but product engineers and engineering managers are also capable of fulfilling this role.
    • Delivery-Related Incidents: Release managers are often well-positioned to lead
    • Security Incidents: Security team members should typically lead
  4. Handoff Process (ensure this is documented):

    • /inc handover @username for explicit handoffs
    • Require acknowledgment from the receiving party
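
The paging rules and lead guidance above can be summarized as a small decision helper. This is a sketch for discussion rather than tool configuration: the function names, the flag parameters, and the order in which the incident types are checked (security, then delivery, then complexity) are assumptions, and the result is a suggestion that a person still has to act on.

```python
from typing import List

def roles_to_page(severity: int) -> List[str]:
    """Roles paged automatically: IMOC, EOC, and IM for Sev1/Sev2, nobody for Sev3/Sev4."""
    if severity not in (1, 2, 3, 4):
        raise ValueError(f"unknown severity: {severity}")
    return ["IMOC", "EOC", "IM"] if severity in (1, 2) else []

def suggest_lead(severity: int,
                 high_complexity: bool = False,
                 delivery_related: bool = False,
                 security: bool = False) -> str:
    """Suggest who would typically lead. Nothing is assigned automatically."""
    if severity not in (1, 2, 3, 4):
        raise ValueError(f"unknown severity: {severity}")
    # Assumed precedence: the guidance does not rank these categories.
    if security:
        return "Security team member"
    if delivery_related:
        return "Release manager"
    if high_complexity:
        return "IMOC" if severity in (1, 2) else "EOC"
    return "Team member most familiar with the affected system"

print(roles_to_page(3))                       # nobody is paged for Sev3
print(suggest_lead(2, high_complexity=True))  # suggestion for a complex Sev2
```

Returning a suggestion instead of performing an assignment keeps lead selection an explicit human step, in line with the proposal.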

Handbook Updates

The handbook should be updated to include:

  1. Definition of Incident Lead: "The Incident Lead is responsible for overseeing an incident from declaration to resolution. They coordinate the response effort and ensure all necessary actions are being taken."

  2. Selection Criteria: Clear guidance on who should serve as lead based on:

    • Incident complexity (single team vs. multi-team involvement)
    • Technical domain (who has the most relevant expertise)
    • Availability (who can commit to seeing the incident through)
  3. Lead Responsibilities:

    • Coordinate the overall incident response
    • Ensure appropriate resources are engaged
    • Facilitate communication between teams
    • Drive the incident to resolution
    • Ensure proper documentation
  4. Handoff Best Practices:

    • When to consider transferring lead role
    • How to properly communicate the transfer
    • Ensuring continuity during handoffs
Edited by Kam Kyrala