Improve alert details experience
🤔 Problem
As part of usability testing in ux-research#1555 (closed), we discovered that the alert details page doesn't contain enough information by default to triage an alert. We also don't clearly communicate which events an alert contains: when there are multiple events, people can't see what those events are. We currently display information about the first event only, which can be problematic because the payloads may not be identical.
Nearly all of the things that people wanted to see would likely need to be provided to us in the alert payload (for example, better titles/descriptions, information about thresholds/current values, links to relevant metrics/logs, ways to investigate the health status of associated clusters, environment information). The problem is therefore twofold: getting more robust information sent to us in the payloads, and ensuring that the additional information is visible and actionable on the alert details page.
We can currently receive and display content from an `annotations` field in Prometheus alerts. However, we don't clearly surface those annotations within the alert in a readable way (they are currently displayed in a continuous string within the "Details" section of the alert details table). Also, we don't accept or display `annotations` from non-Prometheus alerts.
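To illustrate the readability gap, here is a minimal Python sketch that renders annotations one per line instead of as a continuous string. The `labels` and `annotations` keys follow the Prometheus Alertmanager webhook format; the sample values and the `render_annotations` helper are hypothetical and do not reflect the current implementation.

```python
# Hypothetical sample alert. The "labels" and "annotations" keys follow the
# Prometheus Alertmanager webhook format; the values are made up.
alert = {
    "status": "firing",
    "labels": {"alertname": "HighErrorRate", "severity": "critical"},
    "annotations": {
        "summary": "Error rate above threshold",
        "runbook_url": "https://example.com/runbooks/high-error-rate",
    },
}


def render_annotations(alert):
    """Return one 'key: value' line per annotation, sorted by key.

    Illustrative helper only; alerts without annotations yield an empty list.
    """
    annotations = alert.get("annotations") or {}
    return [f"{key}: {value}" for key, value in sorted(annotations.items())]


for line in render_annotations(alert):
    print(line)
```

The same helper would work for a non-Prometheus payload, provided it carried an `annotations` object of the same shape, which is an assumption about a format that has not been defined yet.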
📝 Proposal
- Update the alert overview to allow for a denser display of information.
- Surface payloads for all events that have been grouped together within the alert. The events will now be the table rows, rather than the current key/value pairs.
- Allow people to send annotations in payloads for all alerts, not just for Prometheus alerts. This will allow people to supply us all the content that's necessary for properly triaging the alerts, including links to runbooks, dashboards, and additional relevant details about the alert itself.
- Prettify the alert payload so that each key/value pair will be broken into separate lines.
- Ensure that any links within the payload are clickable.
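The last two items could be sketched roughly as follows, in Python for illustration only. The sketch flattens each event payload into one key/value pair per line and wraps `http(s)` URLs in anchor tags, escaping everything else so that untrusted payload content is not rendered as markup. The helper names (`flatten`, `linkify`, `prettify`), the dotted-key convention, and the sample events are assumptions, not part of the proposal.

```python
import html
import re

# Only http(s) URLs are linked; other schemes stay as plain text.
# Note: this simple pattern also captures trailing punctuation such as ".".
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def flatten(payload, path=()):
    """Yield (dotted_key, value) pairs for every leaf in a nested payload."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            yield from flatten(value, path + (str(key),))
    elif isinstance(payload, list):
        for index, value in enumerate(payload):
            yield from flatten(value, path + (str(index),))
    else:
        yield ".".join(path), payload


def linkify(text):
    """HTML-escape text and wrap each http(s) URL in an anchor tag."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0))
        parts.append(f'<a href="{url}" rel="noopener noreferrer">{url}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


def prettify(event):
    """Return one escaped, linkified 'key: value' line per payload leaf."""
    return [
        f"{html.escape(key)}: {linkify(str(value))}"
        for key, value in flatten(event)
    ]


# Hypothetical events grouped under one alert; their payloads differ slightly.
events = [
    {"title": "CPU high", "annotations": {"runbook": "https://example.com/runbook"}},
    {"title": "CPU high", "annotations": {"runbook": "https://example.com/runbook", "value": 97}},
]

for number, event in enumerate(events, start=1):
    print(f"Event {number}")
    for line in prettify(event):
        print("  " + line)
```

Escaping before rendering is one way to approach the "safely collect event data" concern; the real implementation would need its own sanitization review.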
Design | Walkthrough of changes |
---|---|
Note: Before implementing this issue, we'll need to explore how to safely collect event data, so some technical planning work will likely be needed first.