Health 13.2 Planning Issue
Health Planning Board
Goals for the milestone:
- Improve Triage workflow by...
- MVC OpsGenie Integration
- Linking to other resources (runbooks, metrics, logs)
- Improvements to set-up and configuration of Alerts
- Ability to create GitLab issues from PagerDuty Incidents
SRE Shadow Program
One week of dedicated time for the SRE Shadow Program
Scope of Work for Engineering
OpsGenie Integrations
| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | MVC GitLab and Ops Genie Integration | | | | |
Metrics
| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | Add time counter to instrumented metric: incident_labeled_issues | S | | | |
| | Single multi-metric embeds should expand to full-width | | | | |
Triaging Alerts - How users view and interact with Alerts
Linked Resources - Linking runbooks, metrics, and logs to alerts
| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | Surface metrics chart on the alert detail page for alerts from GitLab-managed Prometheus instances | M | | | |
| | Surface metrics chart on the alert detail page for alerts from externally-managed Prometheus instances | S | This size assumes we can reuse most of the above | | |
| | Surface link to logs in alerts from GitLab-managed Prometheus instances | S | The initial iteration depends on metric embeds within alerts | | |
| | Surface link to alert runbook on metric chart | S | | | |
Alert Configuration - Set-up, endpoints, and data management
| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | Trigger test alert in GitLab | S | | | |
| | Moving alerts endpoint configuration to Settings > Operations | M | | | |
| | Add monitoring tool dropdown to alerts endpoint page | S | | | |
| | Spike: Create a simple repeatable process by which customers can integrate any tool with GitLab | | | | |
PagerDuty Integration
| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | Create GitLab Issues from PagerDuty Incidents | S | | | |
Technical Debt
| Priority | Issue | Size | Notes | Frontend | Backend |
|---|---|---|---|---|---|
| | Merge or add relation between Alert models | M | | | |
| | REFACTORING: Follow-up from "Update status on alert management detail view" | | | | |
Scope of Work for UX
- May be a candidate for engineering refinement, if needed. Likely won't need a substantial amount of design work.
Scope of Work for Testing
| Issue | Investigates/Tests | Due on |
|---|---|---|
Edited by Clement Ho