Handle race condition in creating alerts

Sarah Yasonik requested to merge sy-handle-non-unique-alert-errors into master

What does this MR do and why?

When multiple requests are POSTed to an alert integration at around the same time, a race condition can occur when writing the alerts to the database.

This MR adds handling for the two possible errors from that race condition:

  1. Writing to the database fails due to the uniqueness constraint in PostgreSQL
  2. Writing to the database fails due to the uniqueness validation on the model
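
The two failure modes can be illustrated with a minimal plain-Ruby sketch. It is not the code in this MR: `AlertTable`, `refind_and_increment`, and the local `RecordNotUnique` class are hypothetical stand-ins (the last one for `ActiveRecord::RecordNotUnique`) so that the example runs without Rails, and it assumes the desired fallback in both cases is to re-find the existing alert and count the payload as another event.

```ruby
# Stand-in for ActiveRecord::RecordNotUnique so the sketch runs without Rails.
class RecordNotUnique < StandardError; end

Alert = Struct.new(:fingerprint, :events)

# Hypothetical in-memory "table" with a unique index on fingerprint.
class AlertTable
  def initialize
    @rows = {}
  end

  def find(fingerprint)
    @rows[fingerprint]
  end

  # Database-level uniqueness: raises on a duplicate key.
  def insert!(alert)
    raise RecordNotUnique, alert.fingerprint if @rows.key?(alert.fingerprint)

    @rows[alert.fingerprint] = alert
  end
end

# Re-find the alert the competing request created and record one more event on it.
def refind_and_increment(table, fingerprint)
  existing = table.find(fingerprint)
  existing.events += 1
  existing
end

# The optional block runs between validation and insert to simulate a competing request.
def process_new_alert(table, fingerprint)
  # Error 2: model-level uniqueness validation fails without raising.
  return refind_and_increment(table, fingerprint) if table.find(fingerprint)

  yield if block_given?
  table.insert!(Alert.new(fingerprint, 1))
rescue RecordNotUnique
  # Error 1: validation passed, but the unique index rejected the write.
  refind_and_increment(table, fingerprint)
end
```

Calling `process_new_alert(table, 'abc')` twice exercises the validation path, while `process_new_alert(table, 'xyz') { table.insert!(Alert.new('xyz', 1)) }` exercises the constraint path. In both cases a single alert with two events remains.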

How to set up and validate locally

Unfortunately, neither of these errors can be easily triggered by modifying inputs, so the simplest way to see the behavior locally is a hack using pry.

  1. Add a debugger in app/services/concerns/alert_management/alert_processing.rb

     def process_new_alert
       return if resolving_alert?
    +  binding.pry
       if alert.save

  2. Trigger alert processing in the Rails console

    payload = { 'annotations' => { 'title' => 'TITLE' }, 'startsAt' => '2022-08-04T11:22:40Z' }
    project = Project.first
    AlertManagement::ProcessPrometheusAlertService.new(project, payload).execute

  3. When the debugger pops up, save a duplicate of the alert so that the pending save fails validation.

    > alert.dup.save
    > continue

  4. Navigate to Monitor > Alerts in the UI to see the most recent alert with 2 events

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Loosely related to https://gitlab.com/gitlab-org/gitlab/-/issues/348676

Edited by Sarah Yasonik