Truncate alerts for large Prometheus payloads when FF is enabled

What does this MR do and why?

This MR adds a new ops feature flag :prometheus_notify_max_alerts which limits the number of alerts processed from a single Prometheus payload.

When the flag is enabled and a Prometheus payload contains more than 100 alerts, the incoming alerts are truncated to 100. The request still returns 200 OK so that Prometheus does not retry sending these alerts over and over again - see https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6086.
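
The truncation behavior described above could be sketched roughly as follows. This is a minimal illustration and not the code from this MR: the method name `truncate_alerts`, the constant `MAX_ALERTS`, the `flag_enabled:` argument (standing in for the feature flag check), and the log message wording are all assumptions.

```ruby
require "logger"

# Hypothetical constant; the description only states that truncation
# happens when a payload contains more than 100 alerts.
MAX_ALERTS = 100

# Returns the alerts to process from a Prometheus payload (a Hash with an
# "alerts" Array). When the flag is on and the payload is too large, only
# the first MAX_ALERTS alerts are kept and a warning is logged.
def truncate_alerts(payload, flag_enabled:, logger: Logger.new($stderr))
  alerts = Array(payload["alerts"])
  return alerts unless flag_enabled && alerts.size > MAX_ALERTS

  logger.warn(
    "Prometheus payload truncated: received #{alerts.size} alerts, " \
    "processing #{MAX_ALERTS}"
  )
  alerts.first(MAX_ALERTS)
end
```

In this sketch the caller would respond with 200 OK regardless of whether truncation happened, which matches the intent of avoiding retries.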

This is to mitigate https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6086 until we have found the root cause and implemented a proper fix.

This MR also fixes a bug discovered during testing. Refs !73697 (comment 791377268).

To future reverter 🔙

It's very likely you want to revert !77168 (9e436e12). The other bugfix should be sane 😅

How to set up and validate locally

  1. Enable HTTP Integration Prometheus (GitLab Premium)
  2. Configure Prometheus to trigger more than 100 alerts
  3. In Rails console, enable the feature flag for the project:
       project = Project.last
       Feature.enable(:prometheus_notify_max_alerts, project)
  4. Let Prometheus emit more than 100 alerts
  5. Verify that Prometheus sends all alerts, but GitLab processes only 100 and emits a log warning
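
If a live Prometheus setup with more than 100 firing alerts is not at hand, a synthetic payload might be useful for the steps above. The following is only a sketch: `build_payload` is a hypothetical helper, the field layout follows the Alertmanager webhook format as an assumption about what the integration expects, and the integration URL and authorization key still need to be taken from the project's integration settings.

```ruby
require "json"
require "time"

# Hypothetical helper: builds a webhook-style payload with `count`
# firing alerts, each with a unique alertname label.
def build_payload(count)
  now = Time.now.utc.iso8601
  alerts = Array.new(count) do |i|
    {
      "status" => "firing",
      "labels" => { "alertname" => "TestAlert#{i}" },
      "annotations" => { "summary" => "Synthetic alert #{i}" },
      "startsAt" => now
    }
  end
  { "version" => "4", "status" => "firing", "alerts" => alerts }
end
```

Calling `JSON.generate(build_payload(150))` yields a JSON body with 150 alerts that can be sent to the integration endpoint with any HTTP client.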

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Peter Leitzen
