Add Contributing Factors to incident.io

Overview

This issue tracks the implementation of standardized Contributing Factors custom fields in incident.io to enable comprehensive root cause analysis and improve our incident management maturity. By establishing a structured taxonomy of contributing factors, we'll enhance our ability to identify patterns, prevent future incidents, and maintain consistency across incident reviews.

DRI

TBD

Participants

TBD

Why Contributing Factors Standardization?

Discovery & Analysis: A structured taxonomy makes it easier to identify recurring patterns and systemic issues across incidents
Consistency: Standardized options ensure all incident responders categorize issues using the same language and framework
Reporting: Enables automated reporting on common failure modes to inform infrastructure investments and process improvements
Blameless Culture: Comprehensive categories covering technical, human, and process factors reinforce our commitment to blameless incident reviews
Integration Readiness: Well-structured data enables future automation and integration with other tools in our incident management ecosystem

List of Contributing Factors

We are implementing Contributing Factors with the following categories:

Technical Issues

Bug in application code
Configuration error
Feature Flag enabled
Feature Flag missing
Infrastructure or hardware failure
Database/Data store failure
Data/Schema Changes
Network or connectivity issue
Capacity overload / Performance issue
Release or deployment problem
Architectural/design limitation

Human Factors

Information or feedback gaps
Miscommunication or coordination gap
System design knowledge gaps
Necessary workarounds

Process / Policy Shortcomings

Inadequate testing or QA
Change management gap
Lack of documentation/runbooks
Ownership or escalation gap

External Factors

Third-party service/API outage
Cloud/Infrastructure provider issue
Upstream dependency change
Security attack or breach

Monitoring / Alerting Gaps

Delayed detection (missing alert)
Incomplete observability
Automation/Tooling issue
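
For reference, the taxonomy above can be modeled as a simple mapping from category to options. This is only a sketch to make the grouping and multi-select behavior concrete; `CONTRIBUTING_FACTORS`, `flatten_options`, and `unknown_factors` are hypothetical names and are not part of the incident.io API.

```python
# Illustrative model of the taxonomy in this issue. All names here are
# hypothetical helpers, not incident.io API objects.
CONTRIBUTING_FACTORS: dict[str, list[str]] = {
    "Technical Issues": [
        "Bug in application code",
        "Configuration error",
        "Feature Flag enabled",
        "Feature Flag missing",
        "Infrastructure or hardware failure",
        "Database/Data store failure",
        "Data/Schema Changes",
        "Network or connectivity issue",
        "Capacity overload / Performance issue",
        "Release or deployment problem",
        "Architectural/design limitation",
    ],
    "Human Factors": [
        "Information or feedback gaps",
        "Miscommunication or coordination gap",
        "System design knowledge gaps",
        "Necessary workarounds",
    ],
    "Process / Policy Shortcomings": [
        "Inadequate testing or QA",
        "Change management gap",
        "Lack of documentation/runbooks",
        "Ownership or escalation gap",
    ],
    "External Factors": [
        "Third-party service/API outage",
        "Cloud/Infrastructure provider issue",
        "Upstream dependency change",
        "Security attack or breach",
    ],
    "Monitoring / Alerting Gaps": [
        "Delayed detection (missing alert)",
        "Incomplete observability",
        "Automation/Tooling issue",
    ],
}


def flatten_options(taxonomy: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (category, option) pairs in the order they are listed."""
    return [(category, option)
            for category, options in taxonomy.items()
            for option in options]


def unknown_factors(selected: list[str],
                    taxonomy: dict[str, list[str]] = CONTRIBUTING_FACTORS) -> list[str]:
    """Return any selected values that are not options in the taxonomy."""
    known = {option for options in taxonomy.values() for option in options}
    return [value for value in selected if value not in known]


# Several factors may be selected for one incident; free-text values are flagged.
print(len(flatten_options(CONTRIBUTING_FACTORS)))
print(unknown_factors(["Configuration error", "Solar flare"]))
```

Keeping the category alongside each option is what would let reports roll up by group as well as by individual factor.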

Benefits

Root Cause Identification: Comprehensive categorization reduces the chance that a contributing factor is overlooked during incident analysis
Trend Analysis: Enables quarterly/annual reporting on most common incident causes to drive preventive measures
Resource Allocation: Data-driven insights on failure patterns help prioritize engineering efforts and infrastructure investments
Knowledge Sharing: Consistent tagging improves searchability and learning from past incidents
Compliance & Audit: Structured data supports regulatory reporting and demonstrates mature incident management practices

Implementation Steps

Custom Field Configuration
- Create custom field in incident.io
- Ensure that more than one contributing factor can be selected
- Configure field options with proper grouping/categorization
- Set field as required for incident closure
- Test field functionality in staging environment
Documentation & Training
- Update training materials for incident responders
- Update incident response procedures to include factor selection
Rollout & Adoption
- Gather feedback and refine options if needed
- Monitor adoption rates and field usage
Reporting & Analytics
- Integrate data with existing SRE metrics and reports
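
For the reporting step, a quarterly tally of contributing factors might look roughly like the sketch below. The record shape (`closed_on`, `factors`) is an assumption made for illustration; a real export from incident.io will have a different structure and would need to be mapped into it.

```python
from collections import Counter
from datetime import date

# Hypothetical export shape: one dict per incident, holding the closure date
# and the contributing factors selected for it. Sample data only.
incidents = [
    {"closed_on": date(2025, 1, 14),
     "factors": ["Configuration error", "Inadequate testing or QA"]},
    {"closed_on": date(2025, 2, 3),
     "factors": ["Configuration error"]},
    {"closed_on": date(2025, 5, 20),
     "factors": ["Third-party service/API outage"]},
]


def quarter_of(day: date) -> str:
    """Label a date with its calendar quarter, e.g. 2025-Q2."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def factors_by_quarter(records: list[dict]) -> dict[str, Counter]:
    """Count how often each factor was selected, grouped by quarter."""
    tallies: dict[str, Counter] = {}
    for record in records:
        quarter = quarter_of(record["closed_on"])
        tallies.setdefault(quarter, Counter()).update(record["factors"])
    return tallies


report = factors_by_quarter(incidents)
for quarter in sorted(report):
    # Show the three most frequently selected factors per quarter.
    print(quarter, report[quarter].most_common(3))
```

Because the field is multi-select, one incident can add to several tallies, so the counts are selections per factor rather than incidents per factor.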

Success Metrics

Within 30 days of rollout, 100% of incidents have at least one contributing factor selected
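
One way this metric could be checked is sketched below. The incident records and the `factor_coverage` helper are hypothetical, and treating an empty reporting window as 100% is an assumption rather than a decision made in this issue.

```python
def factor_coverage(records: list[dict]) -> float:
    """Percentage of incidents with at least one contributing factor selected."""
    if not records:
        # Assumption: a window with no incidents counts as meeting the target.
        return 100.0
    tagged = sum(1 for record in records if record.get("factors"))
    return 100.0 * tagged / len(records)


# Sample data only; IDs and selections are made up for illustration.
sample = [
    {"id": "INC-1", "factors": ["Configuration error"]},
    {"id": "INC-2", "factors": []},
    {"id": "INC-3", "factors": ["Incomplete observability",
                                "Delayed detection (missing alert)"]},
    {"id": "INC-4", "factors": ["Bug in application code"]},
]

# List the incidents that still need a factor so they can be followed up.
missing = [record["id"] for record in sample if not record.get("factors")]
print(f"{factor_coverage(sample):.0f}% tagged; missing: {missing}")
```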

Edited Jun 24, 2025 by Alex Hanselka
