[FF] `fix_merge_api_train_bypass` - Fix Accept MR API bypassing merge trains
Summary
This issue tracks the rollout of the fix for issue #593465 on production, currently behind the fix_merge_api_train_bypass feature flag.
The flag gates a correctness fix to PUT /projects/:id/merge_requests/:iid/merge:
- Before: When merge trains are enabled and the head pipeline has already succeeded, calling the endpoint with `auto_merge=true` silently bypasses the merge train and merges directly into the target branch, defeating the train's purpose.
- After (flag on):
  - `auto_merge=true` + train enabled → MR is added to the merge train via the preferred train strategy (`merge_train` or `add_to_merge_train_when_checks_pass`).
  - `auto_merge=false` + train enabled → returns `422` with a message pointing the caller at `auto_merge=true`, the merge trains API, or `skip_merge_train=true`.
  - `skip_merge_train=true` → preserved as the explicit immediate-merge escape hatch.
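The flag-on behavior above can be read as a small decision table. The following is a minimal Python sketch of that table, not the actual Rails implementation: the function name `accept_mr_outcome` and the `Outcome` enum are hypothetical, it only models the case where the head pipeline has already succeeded, and checking `skip_merge_train` before `auto_merge` when both are set is an assumption the summary does not confirm.

```python
from enum import Enum


class Outcome(Enum):
    UNCHANGED = "unchanged"                  # project has no merge train; flag is irrelevant
    IMMEDIATE_MERGE = "immediate_merge"      # merge directly into the target branch
    ADD_TO_MERGE_TRAIN = "add_to_merge_train"
    UNPROCESSABLE = "422"


def accept_mr_outcome(train_enabled: bool,
                      auto_merge: bool = False,
                      skip_merge_train: bool = False) -> Outcome:
    """Illustrative model of the flag-on behavior (hypothetical helper)."""
    if not train_enabled:
        # Projects without merge trains are outside the scope of the fix.
        return Outcome.UNCHANGED
    if skip_merge_train:
        # Assumption: the explicit escape hatch wins over auto_merge.
        return Outcome.IMMEDIATE_MERGE
    if auto_merge:
        # Enqueue via the preferred train strategy instead of merging directly.
        return Outcome.ADD_TO_MERGE_TRAIN
    # Ambiguous immediate merge on a train-enabled project is rejected.
    return Outcome.UNPROCESSABLE
```

For example, `accept_mr_outcome(True, auto_merge=True)` yields `Outcome.ADD_TO_MERGE_TRAIN`, while `accept_mr_outcome(True)` yields `Outcome.UNPROCESSABLE`.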
Owners
- Most appropriate Slack channel to reach out to: #g_code_review
- Best individual to reach out to: @marc_shaw
Expectations
What are we expecting to happen?
API clients (CI bots, glab, custom automation) calling the Accept MR endpoint on merge-train-enabled projects will:
- Correctly enqueue MRs onto the merge train when passing `auto_merge=true`, instead of silently bypassing the train.
- Receive a clear `422` when attempting an immediate merge without explicit `skip_merge_train=true`, preventing the train from being silently bypassed by ambiguous calls.
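For client authors, the sketch below shows one way an automation script might build the Accept MR request with the two parameters discussed above. It uses only the Python standard library and does not send anything. The helper name `build_accept_mr_request`, the host `gitlab.example.com`, and the choice of form-encoded parameters are illustrative assumptions; only the endpoint path and the `auto_merge` / `skip_merge_train` parameters come from this issue.

```python
import urllib.parse
import urllib.request


def build_accept_mr_request(base_url, project_id, mr_iid, token,
                            auto_merge=False, skip_merge_train=False):
    """Build (but do not send) a PUT .../merge request. Hypothetical helper."""
    # Path-style project IDs such as "group/project" must be URL-encoded.
    encoded_project = urllib.parse.quote(str(project_id), safe="")
    path = f"/projects/{encoded_project}/merge_requests/{mr_iid}/merge"

    params = {}
    if auto_merge:
        params["auto_merge"] = "true"        # enqueue on the train (flag on)
    if skip_merge_train:
        params["skip_merge_train"] = "true"  # explicit immediate merge

    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=urllib.parse.urlencode(params).encode(),
        method="PUT",
    )
    request.add_header("PRIVATE-TOKEN", token)
    return request
```

A caller that sends neither parameter to a train-enabled project should expect the `422` described above once the flag is on, and can choose between the two parameters depending on whether it wants a train enqueue or an immediate merge.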
What can go wrong and how would we detect it?
- Behavior change for existing clients on train-enabled projects. Clients that today call `auto_merge=true` and get an immediate merge will now get a train enqueue. Clients that today call without `auto_merge` and get an immediate merge will now get a `422` (unless they pass `skip_merge_train=true`).
  - Detection: increase in 422s on `PUT /merge_requests/:iid/merge` for train-enabled projects (logs / Kibana), customer reports.
- Wrong strategy selected. `preferred_strategy` returns the first available; if `available_strategies` is empty for a train-enabled project we fall back to immediate merge or `not_allowed!`. Should not differ from current behavior in that edge case.
  - Detection: Kibana logs for the endpoint.
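The "first available, else fall back" selection described in the second risk can be modeled in a few lines. This is a Python stand-in for illustration only: the names `preferred_strategy` and `available_strategies` mirror the ones mentioned above, but the signature and the `None` return value are assumptions, not the real method.

```python
from typing import Optional, Sequence


def preferred_strategy(available_strategies: Sequence[str]) -> Optional[str]:
    """Return the first available strategy, or None if there is none."""
    if not available_strategies:
        # Edge case: the caller falls back to the pre-existing path
        # (immediate merge or not_allowed!), as before the fix.
        return None
    return available_strategies[0]
```

With `["merge_train", "add_to_merge_train_when_checks_pass"]` as input this picks `"merge_train"`. With an empty list it returns `None`, which corresponds to the fallback edge case.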
Relevant dashboard: API error-rate panels on https://dashboards.gitlab.net/d/api.
Rollout Steps
Note: chatops commands run in the Slack channel impacted by the command.
Rollout on non-production environments
- Verify the MR with the feature flag is merged to `master` and has been deployed to non-production environments with `/chatops gitlab run auto_deploy status <merge-commit-of-your-feature>`
- Deploy the feature flag at a percentage (recommended percentage: 50%) with `/chatops gitlab run feature set fix_merge_api_train_bypass 50 --actors --dev --pre --staging --staging-ref`
- Monitor that the error rates did not increase (repeat with a different percentage as necessary).
- Enable the feature globally on non-production environments with `/chatops gitlab run feature set fix_merge_api_train_bypass true --dev --pre --staging --staging-ref`
- Verify the feature works as expected against a staging project with merge trains enabled.
Specific rollout on production
- Ensure that the feature MRs have been deployed to both production and canary.
- Project-actor rollout to GitLab's own projects first: `/chatops gitlab run feature set --project=gitlab-org/gitlab,gitlab-org/gitlab-foss,gitlab-com/www-gitlab-com fix_merge_api_train_bypass true`
- Verify behavior on `gitlab-org/gitlab` (which uses merge trains).
Preparation before global rollout
- Set a milestone to this rollout issue.
- Check if the feature flag change needs to be accompanied with a change management issue.
- Ensure that you or a representative in development can be available for at least 2 hours after feature flag updates in production.
- Update the REST API docs noting the new `422` behavior on train-enabled projects.
Global rollout on production
- Incrementally roll out the feature on production.
  - Example: `/chatops gitlab run feature set fix_merge_api_train_bypass <rollout-percentage> --actors`.
  - Between every step wait for at least 15 minutes and monitor the appropriate graphs on https://dashboards.gitlab.net.
- After the feature has been 100% enabled, wait for at least one day before releasing the feature.
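As a convenience for whoever runs the incremental rollout, the sketch below expands a list of percentages into the chatops commands from the example above. The helper `rollout_commands` and the percentages in the usage note are hypothetical. The issue prescribes neither a schedule nor this tooling, only the command shape and the 15-minute wait between steps.

```python
def rollout_commands(flag, percentages):
    """Return one chatops command per rollout step (hypothetical helper)."""
    commands = []
    for percentage in percentages:
        if not 0 < percentage <= 100:
            raise ValueError(f"invalid rollout percentage: {percentage}")
        commands.append(
            f"/chatops gitlab run feature set {flag} {percentage} --actors"
        )
    return commands


if __name__ == "__main__":
    # Illustrative schedule only; wait at least 15 minutes between steps.
    for command in rollout_commands("fix_merge_api_train_bypass", [25, 50, 75, 100]):
        print(command)
```

Each printed line is meant to be pasted into Slack manually, one step at a time, after checking the dashboards.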
Release the feature
- Create an MR to remove the `fix_merge_api_train_bypass` feature flag and the legacy code path.
- Once the cleanup MR has been deployed to production, clean up the feature flag from all environments: `/chatops gitlab run feature delete fix_merge_api_train_bypass --dev --pre --staging --staging-ref --production`
- Close this rollout issue.
Rollback Steps
- Disable on production: `/chatops gitlab run feature set fix_merge_api_train_bypass false`
- Disable on non-production: `/chatops gitlab run feature set fix_merge_api_train_bypass false --dev --pre --staging --staging-ref`
- Delete from all environments: `/chatops gitlab run feature delete fix_merge_api_train_bypass --dev --pre --staging --staging-ref --production`