Health 13.2 Planning Issue

Health Planning Board

Goals for the milestone:

  • Improve the triage workflow with:
    • MVC OpsGenie Integration
    • Linking to other resources (runbooks, metrics, logs)
    • Improvements to set-up and configuration of Alerts
    • Ability to create GitLab issues from PagerDuty Incidents

SRE Shadow Program

One week of dedicated time for the SRE Shadow Program.

Scope of Work for Engineering

OpsGenie Integrations

| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | MVC GitLab and Ops Genie Integration | | | | |

Metrics

| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | Add time counter to instrumented metric: incident_labeled_issues | S | | | |
| | Single multi-metric embeds should expand to full-width | | | | |

Triaging Alerts - How users view and interact with Alerts

| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | Surface linked incident issues in alert list | S | | | |
| | Link to alert details from incident issue | S | | | |
| | Allow users to see which alerts are new in the alert list view | S | | | |
| | System notes on alert detail page | S | | | |
| | System note on alert indicating resolution by issue closure | S | | | |
| | Automatically mark To-do as done when an alert is resolved | S | | | |
| | Search plain text in alert list | | | | |
| | Filter alert status counts by search | S | | | |
| | Feature spec for alert list page | | | | |
| | Alert status dropdown alignment is off on some screen sizes | | | | |
| | Add alert list empty state for those with reporter access | S | | | |

Linked Resources - Linking runbooks, metrics, and logs to alerts

| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | Surface metrics chart on the alert detail page for alerts from GitLab-managed Prometheus instances | M | | | |
| | Surface metrics chart on the alert detail page for alerts from externally-managed Prometheus instances | S | This size assumes we can reuse most of the above | | |
| | Surface link to logs in alerts from GitLab-managed Prometheus instances | S | The initial iteration depends on metric embeds within alerts 👆 | | |
| | Surface link to alert runbook on metric chart | S | | | |

Alert Configuration - Set-up, endpoints, and data management

| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | Trigger test alert in GitLab | S | | | |
| | Moving alerts endpoint configuration to Settings > Operations | M | | | |
| | Add monitoring tool dropdown to alerts endpoint page | S | | | |
| | Spike: Create a simple repeatable process by which customers can integrate any tool with GitLab | | | | |

PagerDuty Integration

| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | Create GitLab Issues from PagerDuty Incidents | S | | | |

Technical Debt

| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | Merge or add relation between Alert models | M | | | |
| | REFACTORING: Follow-up from "Update status on alert management detail view" | | | | |

Scope of Work for UX

| Issue | When it should be ready |
|---|---|
| Dedicated incidents | %13.3 |
| Alert integration builder | %13.3 |
| Reworking incidents section on settings > operations | %13.3 |
| Incident triage list | %13.3 |
| Make it easy to create a runbook when creating an alert | %13.4 |
| Automatically assign alerts to user when they acknowledge it* | %13.3 |

  • *May be a candidate for engineering refinement, if needed. Likely won't need a substantial amount of design work.

Scope of Work for Testing

| Issue | Investigates/Tests | Due on |
|---|---|---|
Edited by Clement Ho