Backfill escalation policies for on-call schedules [RUN ALL RSPEC] [RUN AS-IF-FOSS]

Sarah Yasonik requested to merge sy-backfill-escalation-policies into master

What does this MR do?

Related issue: #268066 (closed)

This MR backfills a single EscalationPolicy for each project which has OncallSchedules.

Context

This is part of the Escalation Policies MVC [technical plan], which extends on-call schedules so that alerts can escalate between schedules (e.g. if the primary on-call ignores the alert, notify the secondary on-call). An escalation policy will have many escalation rules, and each rule describes the conditions under which we should notify a user of an alert. A rule would dictate something like "if the alert has not been acknowledged after 5 minutes, notify the Primary On-call Schedule."

The purpose of this MR is to ensure that users with existing on-call schedules will be notified of alerts in the same way after escalation policies are rolled out, without needing to manually configure anything.

Existing on-call schedules feature demo: https://gitlab.com/gitlab-examples/ops/incident-setup/everyone/tanuki-inc/-/oncall_schedules

Existing alert notification logic:

  • Users can create one on-call schedule per project through the UI, but can create multiple schedules via the API
  • When an alert is received for a project, we notify the on-call user in every schedule
  • If the alert is already acknowledged, we do not send an additional notification
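
The rules above can be sketched in plain Ruby. This is a minimal illustration, not the actual GitLab implementation: `Schedule`, `Alert`, and `recipients_for` are hypothetical names, and skipping empty schedules and de-duplicating users are assumptions.

```ruby
# Hypothetical stand-ins for the real models; names are illustrative only.
Schedule = Struct.new(:name, :oncall_user)
Alert = Struct.new(:title, :acknowledged)

# Returns the users who would be notified for an alert under the existing logic.
def recipients_for(alert, schedules)
  # An acknowledged alert triggers no additional notification.
  return [] if alert.acknowledged

  # Notify the on-call user in every schedule.
  # Assumption: schedules with nobody on call are skipped, and a user who is
  # on call in several schedules is notified once.
  schedules.map(&:oncall_user).compact.uniq
end

schedules = [Schedule.new("Primary", "alice"), Schedule.new("Secondary", "bob")]
recipients_for(Alert.new("CPU high", false), schedules) # => ["alice", "bob"]
recipients_for(Alert.new("CPU high", true), schedules)  # => []
```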

Two main changes:

  • Adds a post-deploy migration to create escalation policies
  • Adds an after_create callback to backfill policies for new on-call schedules as they're created
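
As a rough sketch of the intended outcome (one policy per project that has schedules), the backfill can be modeled in memory as below. The structs, the `backfill_policies` helper, and the policy name are hypothetical, and skipping projects that already have a policy is an assumption; the real migration works against database tables.

```ruby
# Hypothetical in-memory stand-ins; the real migration operates on database tables.
OncallSchedule = Struct.new(:id, :project_id)
EscalationPolicy = Struct.new(:project_id, :name)

# Builds a single policy for each project that has schedules and no policy yet.
def backfill_policies(schedules, existing_policies)
  covered = existing_policies.map(&:project_id)
  # Several schedules in one project still yield exactly one policy.
  project_ids = schedules.map(&:project_id).uniq - covered

  # The policy name here is a placeholder, not taken from the MR.
  project_ids.map { |project_id| EscalationPolicy.new(project_id, "On-call Escalation Policy") }
end

schedules = [OncallSchedule.new(1, 10), OncallSchedule.new(2, 10), OncallSchedule.new(3, 20)]
created = backfill_policies(schedules, [EscalationPolicy.new(20, "Existing")])
created.map(&:project_id) # => [10]
```

Running the same helper for a newly created schedule mirrors what the after_create callback is meant to achieve: a project gains a policy the first time it gets a schedule, and nothing changes afterwards.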

Feature flags

  • escalation_policies_mvc -> existing flag which controls the escalation policies feature as a whole. Once it is enabled, users will be required to manually configure their escalation policies in order to use GitLab's on-call schedule management. Users who already have on-call schedules configured should experience no disruption to their alert notifications.
  • escalation_policies_backfill -> added in this MR & enabled by default; controls just the backfill logic. This is a fail-safe so we have the ability to turn off everything escalation-policies related. This flag will be removed after the escalation_policies_mvc flag is enabled and removed.
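
The relationship between the two flags can be sketched as follows. This is only an illustration with a hypothetical flag lookup: it does not model GitLab's real feature-flag API, and the default for escalation_policies_mvc is an assumption.

```ruby
# Hypothetical flag lookup; this does not model GitLab's real feature-flag API.
DEFAULT_FLAGS = {
  escalation_policies_mvc: false,     # feature as a whole (default is assumed)
  escalation_policies_backfill: true  # enabled by default, per the description
}.freeze

def flag_enabled?(name, overrides = {})
  DEFAULT_FLAGS.merge(overrides).fetch(name)
end

# Assumption: the backfill is gated only by its own flag, so it can be
# switched off independently of the feature as a whole.
def run_backfill?(overrides = {})
  flag_enabled?(:escalation_policies_backfill, overrides)
end

run_backfill?                                      # => true
run_backfill?(escalation_policies_backfill: false) # => false
```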

Migration tidbits:

  • GitLab.com has ~100 on-call schedules, and usage on self-managed instances is negligible thus far, so we expect the data migration to be small and quick.

Query Plan: https://explain.depesz.com/s/41uS

Up output:

$ bin/rails db:migrate
== 20210519220019 BackfillEscalationPoliciesForOncallSchedules: migrating =====
== 20210519220019 BackfillEscalationPoliciesForOncallSchedules: migrated (0.0101s) 

Down output:

$ bin/rails db:migrate:down VERSION=20210519220019
== 20210519220019 BackfillEscalationPoliciesForOncallSchedules: reverting =====
== 20210519220019 BackfillEscalationPoliciesForOncallSchedules: reverted (0.0000s) 

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Sarah Yasonik