Problem to solve
If a team has a culture of post-incident reviews and documentation, they will perform and document a root cause analysis for every incident, including symptoms, diagnostics, metrics, related logs, runbooks, and notes on what happened and how it was solved. This information can be very helpful during future incidents in which the team experiences similar symptoms or problems. They can reference related or similar incidents to quickly understand what is going on and how to fix it, which reduces MTTR overall.
Incidents may be related by a number of attributes:
... the list goes on. Which attributes matter depends entirely on the services the team maintains, the architecture of those services, the tools they use, etc.
User experience goal
While working on a current incident, users can reference historical incidents directly from it for instructions or hints on how to resolve it.
The system identifies "related" or "similar" incidents, potentially based on the alert attributes that the monitoring tool sends with the alert.
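One possible way to identify similar incidents from alert attributes is sketched below. This is only an illustration, not a proposed implementation: the `Incident` record, the attribute names (`service`, `alertname`, `environment`), the Jaccard-overlap scoring, and the `0.5` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    # Hypothetical record; field names are illustrative, not a real schema.
    title: str
    alert_attributes: dict[str, str] = field(default_factory=dict)


def attribute_similarity(a: dict[str, str], b: dict[str, str]) -> float:
    """Jaccard similarity over the (key, value) pairs of two attribute maps."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    if not pairs_a and not pairs_b:
        return 0.0  # nothing to compare, so treat as unrelated
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)


def related_incidents(
    current: Incident, history: list[Incident], threshold: float = 0.5
) -> list[tuple[float, Incident]]:
    """Return (score, incident) pairs at or above the threshold, best match first."""
    scored = [
        (attribute_similarity(current.alert_attributes, past.alert_attributes), past)
        for past in history
    ]
    matches = [(score, past) for score, past in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


# Example data (made up for illustration).
current = Incident(
    "Checkout errors spiking",
    {"service": "checkout", "alertname": "HighErrorRate", "environment": "production"},
)
history = [
    Incident(
        "Checkout errors in staging",
        {"service": "checkout", "alertname": "HighErrorRate", "environment": "staging"},
    ),
    Incident(
        "Search latency",
        {"service": "search", "alertname": "HighLatency", "environment": "production"},
    ),
    Incident(
        "Checkout errors last month",
        {"service": "checkout", "alertname": "HighErrorRate", "environment": "production"},
    ),
]

for score, incident in related_incidents(current, history):
    print(f"{score:.2f}  {incident.title}")
```

Because the attributes that matter differ from team to team, a real implementation would likely let teams choose or weight the attributes used for matching rather than treating every key equally as this sketch does.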
This work supports the Alert Management direction.