Replace TTL-based approval sync with event-driven coordinator and Redis set tracking

Summary

Gated behind the deterministic_approval_rules_sync feature flag. When disabled, all code paths behave exactly as before.

Problem

When two rapid pushes happen on the same branch, Push B's SyncCodeOwnerApprovalRules deletes and recreates approval rules. If Push A's MergeRequestResetApprovalsWorker runs during this sync, it sees a partial rule state and misses approval resets. The existing mitigations were probabilistic:

  • temporarily_unapprove! used a 15-second TTL Redis key — unreliable under load
  • perform_in(10.seconds) delay on the reset worker — a blind guess that the sync finishes in time

Solution

Three changes that work together:

1. Deterministic sync tracking (replaces TTL-based temporarily_unapprove!)

  • New Redis set (mr_{id}_syncing_approvals) with start_approval_rules_sync! / finish_approval_rules_sync!
  • Each push uniquely tracked as code_owner:{oldrev}:{newrev} or reset_approvals:{oldrev}:{newrev}
  • Concurrent pushes tracked independently — Push A's completion doesn't clear Push B's flag
  • 5-minute safety TTL prevents orphaned keys

2. Event-driven ApprovalRulesSyncCoordinator

Centralizes all sync tracking and scheduling logic:

  • sync_code_owners! — synchronous wrapper for push flow (start → execute → finish)
  • start_code_owner_sync! / finish_code_owner_sync! — async pair for MR creation flow
  • schedule_reset_approvals! — stores worker params in Redis, enqueues immediately if no code owner sync is in progress
  • finish_code_owner_sync! automatically checks for pending resets and enqueues the worker when the sync completes
  • Uses Redis GETDEL for atomic read-and-clear of pending params (prevents double-enqueue)

3. Simplified worker and caller

  • MergeRequestResetApprovalsWorker — no longer has retry/error logic; by the time it runs, the coordinator guarantees the sync is complete
  • approval_service.rb — delegates scheduling entirely to the coordinator; no direct perform_async when FF is on
  • SyncCodeOwnerApprovalRules — pure sync service with no tracking params
  • Uses perform_async instead of perform_in(10.seconds) — no artificial delay

Split from !228050 (merged) per reviewer feedback.

Relates to #373846

Feature flag

deterministic_approval_rules_sync (gitlab_com_derisk, default off)

When disabled: all code paths are identical to the previous behavior (TTL-based temporarily_unapprove!, perform_in(10.seconds), no sync checks).

When enabled: uses Redis set tracking, event-driven coordinator, and perform_async with no artificial delay.

  • !228050 (merged) — Fixes a separate race condition where previous_diff shifts due to concurrent diff creation (passes oldrev for stable SHA comparison)

Test plan

  • Specs for ApprovalState: start_approval_rules_sync!, finish_approval_rules_sync!, approval_rules_syncing?, any_approval_rules_syncing?, expire_unapproved_key!, and temporarily_unapproved? with FF on/off
  • Specs for CreateService, ApprovalService, ResetApprovalsService, and SyncCodeOwnerApprovalRules
  • Specs for SyncCodeOwnerApprovalRulesWorker with finish_sync and expire_unapproved_key params
  • Specs for MergeRequestResetApprovalsWorker verifying it executes regardless of sync state (coordinator handles ordering)
  • Manually verified on GDK: FF off reproduces race condition (worker ignores concurrent sync), FF on blocks worker until sync completes via event-driven coordinator

EE: true

Edited by Marc Shaw

Merge request reports

Loading