Support alerting for custom dashboard (area-charts)
Problem to solve
Users should be able to set alerts on metrics defined in dashboard yml files in the project. This ability is available for GitLab-defined "common" metrics and for custom metrics created in the UI, but not yet for yml-defined metrics. As a first step, we should bring metrics defined for area-charts up to parity with existing metrics. Metrics for other panel types should come in a later iteration.
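For context, a minimal dashboard yml with one area-chart panel might look like the sketch below. The field names follow the dashboard yml schema as I understand it; treat the exact keys and values as assumptions rather than a definitive schema:

```yaml
dashboard: 'Example dashboard'
panel_groups:
  - group: 'Business metrics'
    panels:
      - type: area-chart
        title: 'Throughput'
        y_label: 'Requests / sec'
        metrics:
          # `id` is the identifier that would let us persist
          # and alert on this yml-defined metric
          - id: throughput_requests_per_sec
            query_range: 'rate(http_requests_total[60m])'
            unit: req / sec
            label: Total
```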
- Open questions:
  - When do we want to persist metrics? Pipeline?
  - Should this step include validation & fail the build for invalid dashboards?
  - Do we want to clean up project metrics which are no longer present in a dashboard? Or do they live forever?
  - Do we want to perform reconciliation between custom metrics created in the UI & dashboard yml-defined metrics?
Intended users
Further details
Proposal
Technical implementation proposal:
- Refactor `CommonMetricsImporter` to support project-defined metrics. Include a follow-up step that removes project-defined metrics no longer in the dashboard (probably distinguishing them from custom metrics by the presence of the identifier from the yml).
- Call the updated importer from the pipeline? (This is the part I'm fuzzier on, and it has potential for scope creep.)
- Update the `Metrics::Dashboard::Processor` stages to account for project-defined dashboard metrics.
- Test alerting (should work out of the box, but we'll need to test).
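The importer refactor above can be sketched as a reconcile step: upsert yml-defined metrics (recognised by their identifier) and drop stale ones, while leaving UI-created custom metrics (which have no identifier) untouched. This is a minimal, framework-free illustration; `ProjectMetricsImporter` and the hash-based metric shape are assumptions for the sketch, not GitLab's actual API:

```ruby
# Hypothetical reconcile step between persisted project metrics and
# metrics parsed from a dashboard yml. Not the real importer API.
class ProjectMetricsImporter
  # existing: metrics currently persisted for the project
  # dashboard_metrics: metrics parsed from the dashboard yml
  def initialize(existing, dashboard_metrics)
    @existing = existing
    @dashboard_metrics = dashboard_metrics
  end

  # Returns the reconciled metric list.
  def execute
    yml_ids = @dashboard_metrics.map { |m| m[:identifier] }

    # Keep UI-created custom metrics (no identifier) untouched;
    # drop yml-defined metrics no longer present in the dashboard.
    kept = @existing.reject do |m|
      m[:identifier] && !yml_ids.include?(m[:identifier])
    end

    # Upsert: replace kept metrics by identifier, append new ones.
    kept_ids = kept.map { |m| m[:identifier] }.compact
    updates, inserts =
      @dashboard_metrics.partition { |m| kept_ids.include?(m[:identifier]) }

    kept.map do |m|
      updates.find { |u| u[:identifier] == m[:identifier] } || m
    end + inserts
  end
end
```

In a real implementation the same distinction (identifier present vs. absent) would also answer the "live forever?" question: only identifier-bearing rows are subject to cleanup.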
First iteration:
- Should the persistence step include validation & fail the build for invalid dashboards? No.
- Do we want to clean up project metrics which are no longer present in a dashboard, or do they live forever? They do not live forever; they are removed.
- Do we want to perform reconciliation between custom metrics created in the UI & dashboard yml-defined metrics? No. If a metric has been defined in both spots, the user can do any cleanup they want.
Permissions and Security
Documentation
Testing
- Integration tests as usual
- End-to-end test
- Consider triggering `package-and-qa` on the MR to ensure existing end-to-end tests are not breaking (an alert test is defined, but only for common metrics)
What does success look like, and how can we measure that?
Links / references