Draft: Wire ApprovalRulesSyncCoordinator into create/push/reset flows

Wires MergeRequests::ApprovalRulesSyncCoordinator (from !232648 (merged)) into the call sites that need sync tracking: create, push refresh, reset approvals. This is the MR that actually fixes the race condition; !232648 (merged) only added the infrastructure.

MR 2 of 2 split from !229090 (closed) per reviewer feedback. Depends on !232648 (merged).

Gated behind the deterministic_approval_rules_sync feature flag. When disabled, all code paths behave exactly as before (TTL-based temporarily_unapprove!, perform_in(10.seconds), no sync checks).

Problem

When two rapid pushes happen on the same branch, Push B's SyncCodeOwnerApprovalRules deletes and recreates approval rules. If Push A's MergeRequestResetApprovalsWorker runs during this sync, it sees a partial rule state and misses approval resets. The existing mitigations were probabilistic:

  • temporarily_unapprove! used a 15-second TTL Redis key — unreliable under load
  • perform_in(10.seconds) delay on the reset worker — a blind guess that the sync finishes in time

Solution

  • CreateService#after_create: calls start_code_owner_sync! before the worker is enqueued so the MR is marked as temporarily unapproved without a gap between create and the async worker run.
  • Refresh::ApprovalService#update_approvers_for_source_branch_merge_requests: calls coordinator.sync_code_owners! (synchronous start → execute → finish → enqueue any pending resets).
  • Refresh::ApprovalService#reset_approvals_for_merge_requests: calls coordinator.schedule_reset_approvals!, which marks the reset as pending internally and enqueues the worker immediately if no code owner sync is in progress, or defers it to be drained when the sync completes.
  • ResetApprovalsService: re-checks per-MR on entry. If a code-owner sync started between enqueue and execution, defers via coordinator.schedule_reset_approvals! and skips this MR (it'll be drained when the sync finishes). Calls finish_reset_approvals! after normal resets.
  • SyncCodeOwnerApprovalRulesWorker: routes via FF check alone — coordinator path when FF is on, legacy SyncCodeOwnerApprovalRules service when FF is off.

No artificial 10-second delay on the reset worker — the coordinator guarantees ordering.

Entry-check in the reset service

Between when a reset worker is enqueued (after the coordinator confirms no sync is in progress) and when Sidekiq actually executes it, a new code-owner sync can start. Without a check, the worker would run against the MR's half-built rules.

The service now re-checks coordinator.code_owner_sync_in_progress? per-MR before resetting. If a sync is in flight, the worker defers its own reset via coordinator.schedule_reset_approvals! and moves on to the next MR. The re-queued params get drained when the sync completes. Narrows the race window from "Sidekiq pickup delay" to "microseconds inside perform".

Reviewer feedback applied

  • Worker routes via FF check alone (no finish_sync param needed). Addresses !229090 (closed)#note_2823013495.
  • mark_reset_pending! folded into schedule_reset_approvals! as an implementation detail — callers no longer need to call it separately.
  • Legacy temporarily_unapprove! always set alongside coordinator keys for rollback safety (FF disable mid-flight still blocks merges via the legacy key).

Known limitations

The entry-check narrows but does not fully eliminate the race — a sync can still start between the check and the .delete_all that removes approvals. In practice this window is microseconds and the impact is bounded (rules only meaningfully change when CODEOWNERS itself is edited). Full mutual exclusion requires a per-MR serial-queue primitive; tracked as a follow-up.

Split from !229090 (closed). Relates to #373846.

Feature flag

deterministic_approval_rules_sync (introduced in !232648 (merged)).

Test plan

  • Updated specs for CreateService, ApprovalService, ResetApprovalsService, SyncCodeOwnerApprovalRules.
  • New spec case in ResetApprovalsService: when a code-owner sync is in progress, the reset is deferred via the coordinator and approvals are not modified.
  • Updated spec for SyncCodeOwnerApprovalRulesWorker verifying the FF-on path delegates to the coordinator.
  • Updated spec for MergeRequestResetApprovalsWorker verifying it executes regardless of sync state (coordinator handles ordering).
  • Manually verified on GDK: FF off reproduces the race condition (worker ignores concurrent sync); FF on blocks the reset worker until sync completes via the coordinator.
  • Live rails runner e2e tests against local GDK: schedule defers correctly when sync in progress; pending list drains on sync completion; sync_code_owners! ensure block drains automatically.

EE: true

Edited by Marc Shaw

Merge request reports

Loading