Replace TTL-based approval sync with event-driven coordinator and Redis set tracking
Summary
Gated behind the deterministic_approval_rules_sync feature flag. When disabled, all code paths behave exactly as before.
Problem
When two rapid pushes happen on the same branch, Push B's SyncCodeOwnerApprovalRules deletes and recreates approval rules. If Push A's MergeRequestResetApprovalsWorker runs during this sync, it sees a partial rule state and misses approval resets. The existing mitigations were probabilistic:
temporarily_unapprove!used a 15-second TTL Redis key — unreliable under loadperform_in(10.seconds)delay on the reset worker — a blind guess that the sync finishes in time
Solution
Three changes that work together:
1. Deterministic sync tracking (replaces TTL-based temporarily_unapprove!)
- New Redis set (
mr_{id}_syncing_approvals) withstart_approval_rules_sync!/finish_approval_rules_sync! - Each push uniquely tracked as
code_owner:{oldrev}:{newrev}orreset_approvals:{oldrev}:{newrev} - Concurrent pushes tracked independently — Push A's completion doesn't clear Push B's flag
- 5-minute safety TTL prevents orphaned keys
2. Event-driven ApprovalRulesSyncCoordinator
Centralizes all sync tracking and scheduling logic:
sync_code_owners!— synchronous wrapper for push flow (start → execute → finish)start_code_owner_sync!/finish_code_owner_sync!— async pair for MR creation flowschedule_reset_approvals!— stores worker params in Redis, enqueues immediately if no code owner sync is in progressfinish_code_owner_sync!automatically checks for pending resets and enqueues the worker when the sync completes- Uses Redis
GETDELfor atomic read-and-clear of pending params (prevents double-enqueue)
3. Simplified worker and caller
MergeRequestResetApprovalsWorker— no longer has retry/error logic; by the time it runs, the coordinator guarantees the sync is completeapproval_service.rb— delegates scheduling entirely to the coordinator; no directperform_asyncwhen FF is onSyncCodeOwnerApprovalRules— pure sync service with no tracking params- Uses
perform_asyncinstead ofperform_in(10.seconds)— no artificial delay
Split from !228050 (merged) per reviewer feedback.
Relates to #373846
Feature flag
deterministic_approval_rules_sync (gitlab_com_derisk, default off)
When disabled: all code paths are identical to the previous behavior (TTL-based temporarily_unapprove!, perform_in(10.seconds), no sync checks).
When enabled: uses Redis set tracking, event-driven coordinator, and perform_async with no artificial delay.
Related MRs
- !228050 (merged) — Fixes a separate race condition where
previous_diffshifts due to concurrent diff creation (passesoldrevfor stable SHA comparison)
Test plan
- Specs for
ApprovalState:start_approval_rules_sync!,finish_approval_rules_sync!,approval_rules_syncing?,any_approval_rules_syncing?,expire_unapproved_key!, andtemporarily_unapproved?with FF on/off - Specs for
CreateService,ApprovalService,ResetApprovalsService, andSyncCodeOwnerApprovalRules - Specs for
SyncCodeOwnerApprovalRulesWorkerwithfinish_syncandexpire_unapproved_keyparams - Specs for
MergeRequestResetApprovalsWorkerverifying it executes regardless of sync state (coordinator handles ordering) - Manually verified on GDK: FF off reproduces race condition (worker ignores concurrent sync), FF on blocks worker until sync completes via event-driven coordinator
EE: true