Alert values in the GitLab database can go out of sync with Prometheus

Summary

When alerts are modified in our UI, the modifications need to be synced to Prometheus, which handles alert detection, firing, and notifications. If the alert sync fails, we should also revert the change in our database.
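To make the failure mode concrete, here is a minimal sketch in Python. The names, file path, reload URL, and helper are hypothetical illustrations, not GitLab's actual implementation: because the database row is written first, any failure while regenerating the rule file or reloading Prometheus leaves the two stores disagreeing.

```python
import requests

# Hypothetical sketch of the current (buggy) ordering; the URL, path, and
# helpers are illustrative, not GitLab's actual code.
PROMETHEUS_RELOAD_URL = "http://prometheus.example.com/-/reload"  # assumed URL
RULE_FILE = "/etc/prometheus/rules/gitlab.rules.yml"              # assumed path

def write_rule_file(alert_values):
    """Stand-in for rendering the alert into a Prometheus rule file."""
    with open(RULE_FILE, "w") as f:
        f.write(f"# rendered rules for {alert_values}\n")

def update_alert(db, alert_id, new_values):
    db[alert_id] = new_values               # 1. database is updated first
    write_rule_file(new_values)             # 2. rule file is regenerated
    # 3. Prometheus is asked to reload its configuration (POST /-/reload is
    #    only available when Prometheus runs with --web.enable-lifecycle).
    requests.post(PROMETHEUS_RELOAD_URL, timeout=5).raise_for_status()
    # If step 2 or 3 raises, nothing reverts step 1, so the database and
    # Prometheus now hold different alert values.
```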

Steps to reproduce

1. Follow the steps in #220304 (closed) to cause alert syncing to throw an error.
2. Observe that the alert values in our database are not reverted when the sync fails.

Example Project

What is the current bug behavior?

The alert configuration in our database goes out of sync with Prometheus when the Prometheus configuration update fails.

What is the expected correct behavior?

The alert values in our database should be in sync with the values in Prometheus.

Relevant logs and/or screenshots

Output of checks

This bug happens on GitLab.com

Results of GitLab environment info

Expand for output related to GitLab environment info

(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:env:info`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)

Results of GitLab application Check

Expand for output related to the GitLab application check

(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:check SANITIZE=true`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`)

(we will only investigate if the tests are passing)

Possible fixes

We can create a worker that first sends the new alerts to Prometheus and calls the reload API, and only updates our database if the reload succeeds.
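As a rough illustration of that ordering (again with hypothetical names, paths, and URL, not GitLab's actual code), the worker pushes the new rules and reloads Prometheus first, and only persists the change in our database once the reload succeeds:

```python
import requests

PROMETHEUS_RELOAD_URL = "http://prometheus.example.com/-/reload"  # assumed URL
RULE_FILE = "/etc/prometheus/rules/gitlab.rules.yml"              # assumed path

def sync_alert_worker(db, alert_id, new_values):
    # 1. Render the new alert into the Prometheus rule file (stand-in rendering).
    with open(RULE_FILE, "w") as f:
        f.write(f"# rendered rules for {new_values}\n")

    # 2. Ask Prometheus to reload; a failed reload raises here and the
    #    database is never touched, so the two stores stay consistent.
    requests.post(PROMETHEUS_RELOAD_URL, timeout=5).raise_for_status()

    # 3. Only after a successful reload do we update our database.
    db[alert_id] = new_values
```

Because the database write happens last, a failed sync can simply be retried by the worker, and the database never records an alert change that Prometheus did not accept.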
