Support alerting for custom dashboard (area-charts)

Problem to solve

Users should be able to set alerts on metrics defined in dashboard yml files in the project. This ability is already available for GitLab-defined "common" metrics & custom metrics created in the UI, but not yet for yml-defined metrics. As a first step, we should bring metrics defined for area-charts up to parity with existing metrics. Metrics for other panel types should come in another iteration.
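For reference, a yml-defined area-chart metric might look like the following. This is a minimal sketch: the field names (`panel_groups`, `type`, `metrics`, `id`, `query_range`) follow the custom-dashboard format but are illustrative here, not authoritative, and the metric identifier is what the persistence step would key on.

```ruby
require 'yaml'

# Hypothetical .gitlab/dashboards/example.yml defining one area-chart
# panel with a single metric. Field names are assumptions for
# illustration, not the definitive schema.
dashboard_yml = <<~YAML
  dashboard: 'Example dashboard'
  panel_groups:
    - group: 'System metrics'
      panels:
        - type: area-chart
          title: 'Throughput'
          y_label: 'Requests / Sec'
          metrics:
            - id: system_metrics_throughput
              query_range: 'rate(http_requests_total[5m])'
              unit: req / sec
YAML

dashboard = YAML.safe_load(dashboard_yml)

# Collect the metric ids defined for area-chart panels only, since this
# iteration is scoped to area-charts.
area_chart_metric_ids =
  dashboard['panel_groups']
    .flat_map { |group| group['panels'] }
    .select { |panel| panel['type'] == 'area-chart' }
    .flat_map { |panel| panel['metrics'] }
    .map { |metric| metric['id'] }

puts area_chart_metric_ids.inspect  # => ["system_metrics_throughput"]
```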

  • Open questions:
    • When do we want to persist metrics? Pipeline?
      • Should this step include validation & fail the build for invalid dashboards?
    • Do we want to clean up project metrics which are no longer present in a dashboard, or do they live forever?
    • Do we want to perform reconciliation between custom metrics created in the UI & dashboard yml-defined metrics?

Intended users

Further details

Proposal

Technical Implementation Proposal:

  • Refactor CommonMetricsImporter to support project-defined metrics; include a follow-up step that removes stale project-defined metrics (probably distinguishing them from custom metrics by the presence of the identifier from the yml)
  • Call the updated importer from the pipeline? (This is the part I'm fuzzier on, and it has potential for scope creep)
  • Update Metrics::Dashboard::Processor stages to account for project-defined dashboard metrics
  • Test alerting (Should work out of the box, but we'll need to test)

First iteration:

  • Should the persistence step include validation & fail the build for invalid dashboards? No.
  • Do we want to clean up project metrics which are no longer present in a dashboard, or do they live forever? They do not live forever; stale metrics are removed.
  • Do we want to perform reconciliation between custom metrics created in the UI & dashboard yml-defined metrics? No. If a metric has been defined in both places, the user can perform whatever cleanup they want.
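Skipping reconciliation is workable because the two kinds of metric stay distinguishable. The sketch below shows the assumed distinction (same assumption as in the proposal: yml-defined metrics carry the identifier from the dashboard file, UI-created custom metrics have none), so duplicates simply coexist as two records.

```ruby
# Illustrative records only; attribute names are assumptions.
metrics = [
  { identifier: 'system_metrics_throughput', query: 'rate(a[5m])' }, # yml-defined
  { identifier: nil,                         query: 'rate(a[5m])' }  # UI-created
]

# No reconciliation: the same query defined in both places yields two
# independent records, split cleanly by identifier presence. Cleanup of
# duplicates is left to the user.
yml_defined, ui_created = metrics.partition { |m| m[:identifier] }

puts yml_defined.length  # => 1
puts ui_created.length   # => 1
```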

Permissions and Security

Documentation

Testing

  • integration tests as usual
  • end-to-end test
  • consider triggering package-and-qa on the MR to ensure existing end-to-end tests do not break (an alert test is defined, but only for common metrics)

What does success look like, and how can we measure that?

Links / references
