Backfill schedules with an escalation policy
Why are we doing this work
For an MVC of on-call management in GitLab, we want to route alerts to a specific Schedule. In a follow-up iteration, we want to route alerts to an Escalation Policy (which can be composed of many Schedules).
We'll have a feature flag for the iteration that adds Escalation Policies. At the point of enablement, we want to backfill Escalation Policies for all Schedules created while the first version of on-call management was live.
Relevant links
- On-call iteration breakdown: #259828 (closed)
- Feasibility assessment: #263713 (closed)
Implementation plan
High-level: merge a backfill migration along with code that keeps the backfill up to date whenever a schedule is created. After the escalation_policies_mvc feature flag is removed, a second MR removes the auto-backfilling code and, if needed, re-runs the backfill.
- Backfill a policy for all projects with an oncall schedule.
- Auto-create a policy if a new schedule is created (if escalation policies are not enabled in the UI)
- Enable the escalation policies feature flag.
- If the flag was enabled, then disabled, then re-enabled on GitLab.com, backfill a policy for projects with on-call schedules.
- Remove auto-policy-creation logic at our leisure.
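The backfill and auto-creation steps above can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the actual migration: the `Project`, `Schedule`, and `EscalationPolicy` shapes, the one-policy-per-project model, the `DEFAULT_POLICY_NAME` value, and the functions `ensure_policy`, `backfill`, and `create_schedule` are all hypothetical. The key property shown is that the backfill only creates a policy for projects that have a schedule but no policy yet, so it is safe to re-run after the flag has been disabled and re-enabled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Schedule:
    name: str


@dataclass
class EscalationPolicy:
    # Hypothetical shape: the policy simply lists the schedules it escalates to.
    name: str
    schedule_names: List[str] = field(default_factory=list)


@dataclass
class Project:
    name: str
    schedules: List[Schedule] = field(default_factory=list)
    policy: Optional[EscalationPolicy] = None  # assumption: at most one policy per project


DEFAULT_POLICY_NAME = "On-call Escalation Policy"  # hypothetical default name


def ensure_policy(project: Project) -> bool:
    """Create a policy covering all of the project's schedules if it has none.

    Returns True only when a policy was created, so calling it again is a no-op.
    """
    if not project.schedules or project.policy is not None:
        return False
    project.policy = EscalationPolicy(
        name=DEFAULT_POLICY_NAME,
        schedule_names=[s.name for s in project.schedules],
    )
    return True


def backfill(projects: List[Project]) -> int:
    """Backfill a policy for every project with an on-call schedule; returns how many were created."""
    return sum(ensure_policy(p) for p in projects)


def create_schedule(project: Project, schedule: Schedule, auto_backfill: bool) -> None:
    """Save a schedule and, while auto-backfill is active, keep the project's policy in sync."""
    project.schedules.append(schedule)
    if not auto_backfill:
        return
    if not ensure_policy(project):
        # The project already had a policy (schedules is non-empty here),
        # so route the new schedule through the existing policy as well.
        if schedule.name not in project.policy.schedule_names:
            project.policy.schedule_names.append(schedule.name)
```

For example, running `backfill` twice over the same projects creates policies only on the first pass; a schedule created while `auto_backfill` is off gets no policy until `backfill` is run again, which mirrors the re-backfill step for a flag that was toggled on GitLab.com.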
Feature flag plan:
- Add a new flag, :escalation_policies_backfill, which is enabled by default.
- Enable and remove the :escalation_policies_mvc flag as part of the feature rollout (which effectively disables :escalation_policies_backfill).
- Remove the :escalation_policies_backfill flag.
If something does go wrong, we can instantly disable the :escalation_policies_backfill flag in production, then quickly follow up with an MR that disables the flag by default in case the root issue isn't resolved until the following release. If nothing goes wrong (as expected), the additional overhead is nearly zero: just one more file to delete in the MR that removes this code.
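The interplay between the two flags can be sketched as a single gate. This is a hypothetical illustration: a plain dictionary stands in for GitLab's real feature-flag lookup, `auto_backfill_active` is an invented name, and the defaults (backfill flag on, MVC flag off when unset) are assumptions taken from the plan above. Auto-creation runs only while :escalation_policies_backfill is on and :escalation_policies_mvc is off, so enabling the MVC flag turns it off, and disabling the backfill flag acts as the production kill switch.

```python
from typing import Mapping


def auto_backfill_active(flags: Mapping[str, bool]) -> bool:
    """Return True while policies should still be auto-created for new schedules.

    `flags` is a hypothetical stand-in for the feature-flag lookup; unset flags
    fall back to the defaults assumed in the plan.
    """
    # Assumed default: the backfill flag ships enabled by default.
    backfill_on = flags.get("escalation_policies_backfill", True)
    # Assumed default: the MVC flag stays off until the rollout enables it.
    mvc_on = flags.get("escalation_policies_mvc", False)
    return backfill_on and not mvc_on
```

Once :escalation_policies_mvc is removed, the gate is permanently false, which is why the final step can delete both the check and the :escalation_policies_backfill flag together.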