Fix PG::UniqueViolation and NoMethodError in merge train services
What does this MR do and why?
This merge request fixes two critical vulnerabilities in the GitLab merge train services that cause production issues:
-
PG::UniqueViolation on
index_merge_trains_on_merge_request_id- Race condition causing duplicate key constraint violations inAutoMerge::MergeTrainServicewhen multiple concurrent requests attempt to create merge train cars. -
NoMethodError for undefined
shamethod - Nil reference error when accessingdiff_head_pipeline.shainAutoMerge::AddToMergeTrainWhenChecksPassServicewhen the pipeline is missing or deleted.
See: Fix race condition to ensure concurrent MRs suc... (#584920)
Implementation Details
Fix 1: Atomic Car Creation with Exception Handling
Modified ee/app/services/auto_merge/merge_train_service.rb to use atomic create_merge_train_car with exception handling:
- Changed from non-atomic
build_merge_train_car()to atomiccreate_merge_train_car() - Added
rescue ActiveRecord::RecordNotUniqueto handle race conditions gracefully - Reload merge request to get existing car after race condition
- Return early if car is nil after reload
Fix 2: Safe Navigation Operator for Nil-Safe Access
Modified ee/app/services/auto_merge/add_to_merge_train_when_checks_pass_service.rb to use safe navigation operator:
- Changed from
merge_request.diff_head_pipeline.shatomerge_request.diff_head_pipeline&.sha - Allows
pipeline_shato be nil when pipeline is missing - SystemNoteService handles nil SHA gracefully
Benefits
- Prevents duplicate key violations and NoMethodError exceptions
- Handles concurrent requests and missing pipelines gracefully
- Maintains backward compatibility with no API or database schema changes
- Follows existing patterns in the codebase
- Comprehensive test coverage added for both race condition and nil pipeline scenarios
References
- Merge trains can get stuck on unexpected errors (#496664 - closed)
- Auto cancel of a Child Pipeline in a Merge Trai... (#517148)
- Merge request stuck in locked state when gettin... (#389044)
- Backend: Allow Merge train to merge with Skippe... (#55253)
How to set up and validate locally
- No database migrations needed
- No feature flags needed
- No configuration changes needed
- Can be deployed immediately
- Monitor error logs for any issues post-deployment
- Monitor merge train success rates post-deployment
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Click to expand
Quality
- Self-reviewed this MR per code review guidelines
- Code follows software design guidelines - Uses existing patterns for atomic operations and safe navigation
- Automated tests exist - Added comprehensive test coverage for race condition and nil pipeline scenarios
- Considered technical impacts on GitLab.com, Dedicated and self-managed - No schema changes, backward compatible
- Considered impact on frontend, backend, and database - Backend only, no database schema changes
- Tested in all supported browsers - Not applicable (backend service changes only)
- Confirmed backward compatibility - No API or database schema changes, fully backward compatible
-
Properly separated EE content - Changes are in
ee/app/services/directory - Considered existing data variation - No data model changes, handles nil gracefully
- Fixed flaky tests - No flaky tests introduced
Performance, reliability, and availability
- Confident this MR does not harm performance - Atomic operations improve performance by reducing race conditions
- Added information for database reviewers - No database changes required
- Considered availability and reliability risks - Fixes critical production issues, improves reliability
- Considered scalability risk - No scalability concerns, fixes race condition that impacts scalability
- Considered performance/reliability/availability impacts on large customers - Fixes issues affecting all customers
- Considered impacts on minimum system requirements - No impact on system requirements
Observability instrumentation
- Included enough instrumentation - Existing error handling and logging in services captures issues
Documentation
- Included changelog trailers - Changelog entry should be added for bug fixes
- Added/updated documentation - No documentation changes needed for internal service fixes
Security
- Confirmed credential/token/authorization/authentication handling - Not applicable, no security-sensitive changes
- Reviewed security guidelines - No security review needed for this bug fix
- Addressed security scan results - No security concerns identified
Deployment
- Considered using a feature flag - Not needed for bug fix, can be deployed immediately
- If using feature flag, planned staging/production testing - N/A
- Informed Infrastructure department - No infrastructure changes needed
Compliance
-
Confirmed correct MR type label - Should be labeled as
~bugor~bugfix
Edited by Ket Slaats