Parallel execution strategy for Merge Trains
Problem to solve
#9186 introduces the concept of merge trains, but for the MVC we are only running them sequentially. To really reap the benefits of merge trains, we can optimistically build refs and run the pipelines in parallel, resulting in very fast merge train execution for scenarios where most pipelines are likely to succeed.
Each MR that joins a merge train joins as the last item in the train, just as it works in the current state. However, instead of queuing and waiting, each item takes the completed state of the previous (pending) merge ref, adds its own changes, and starts its pipeline immediately, in parallel, under the assumption that everything is going to pass. This way, if all the pipelines in the train succeed, no pipeline time is wasted on queuing or retrying. If the merge button is subsequently pressed in a different MR, instead of creating a new pipeline for the target branch, it creates a new pipeline targeting the merge result of the previous MR plus the target branch.
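The composition described above can be sketched as a small model. This is illustrative only; the class and method names are hypothetical and do not reflect GitLab's actual internals.

```python
class MergeTrain:
    """Hypothetical model of optimistic merge ref composition."""

    def __init__(self, target_branch):
        self.target_branch = target_branch
        self.items = []  # MRs in the train, in join order

    def add(self, mr):
        """Join as the last item and immediately start an optimistic
        pipeline: target branch + every MR ahead of this one + this MR."""
        self.items.append(mr)
        return [self.target_branch] + list(self.items)


train = MergeTrain("master")
print(train.add("MR1"))  # ['master', 'MR1']
print(train.add("MR2"))  # ['master', 'MR1', 'MR2']
print(train.add("MR3"))  # ['master', 'MR1', 'MR2', 'MR3']
```

Each call returns the composition the new pipeline runs against, without waiting for any earlier pipeline to finish.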
For example, if four MRs are queued together, their pipelines would be composed as follows:
- MR1: target branch + MR1
- MR2: target branch + MR1 + MR2
- MR3: target branch + MR1 + MR2 + MR3
- MR4: target branch + MR1 + MR2 + MR3 + MR4
Given this composition, the MRs must not be allowed to merge out of order. Also, if any item fails, the train will need to be recalculated and restarted with the first non-failing item as the first in the train. For example, if MR2 (pipeline 2) fails (or is removed or canceled), all running pipelines for the merge train will be canceled as invalid, and a new train built containing:
- MR3: target branch + MR3
- MR4: target branch + MR3 + MR4
In this scenario, MR1 has already merged, so it is no longer in play. MR2 is known to be broken, so it is not added back to the merge train. MR2 could potentially resolve its issues and rejoin at the back of the train, making it MR3 + MR4 + MR2.
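The recalculation described above can be sketched as follows. This is a minimal illustration, assuming all running pipelines for the old train have already been cancelled; the function name is hypothetical.

```python
def recalculate(train, failed):
    """Rebuild pipeline compositions after `failed` drops out of the train.
    Assumes every running pipeline for the old train was cancelled first."""
    remaining = [mr for mr in train if mr != failed]
    # Each surviving item gets a fresh optimistic composition:
    # its predecessors in the new train plus itself.
    return [remaining[: i + 1] for i in range(len(remaining))]


print(recalculate(["MR2", "MR3", "MR4"], failed="MR2"))
# [['MR3'], ['MR3', 'MR4']]
```

The failed MR is simply dropped; every item behind it restarts with a composition that no longer includes the broken changes.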
If the target branch is updated directly, bypassing the merge train, all pipelines are recalculated, cancelled, and restarted. People should not commit directly to the target branch while using merge trains, since doing so invalidates the whole train. We don't block this, in case of emergencies, but it is definitely an exception and not a normal use case.
This strategy was selected to be implemented first because it is the most balanced optimization: it assumes that, most of the time, a green feature branch will merge without issue. In situations where failures are common it will be less efficient than other strategies, and should not be used. We will consider adding more operation modes for merge trains in the future to handle other kinds of merge train characteristics.
Permissions and Security
We need to control what the maximum merge train length is, and what the maximum parallelization is. For the MVC we will cap the maximum length of the merge train to 10 and a parallelization factor of 4. In future iterations these can be tuned or made configurable.
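The caps above can be sketched as simple admission logic. The constant and function names here are hypothetical, and the values are the MVC defaults named in the text.

```python
# Hypothetical MVC caps (hard-coded for now; could be made configurable later).
MAX_TRAIN_LENGTH = 10
PARALLELIZATION_FACTOR = 4


def can_join(train_length):
    """Reject new MRs once the train is at its maximum length."""
    return train_length < MAX_TRAIN_LENGTH


def pipelines_to_start(queued, running):
    """Run at most PARALLELIZATION_FACTOR pipelines at once; items
    beyond that wait in the train until a slot frees up."""
    free_slots = max(0, PARALLELIZATION_FACTOR - running)
    return min(queued, free_slots)


print(can_join(9), can_join(10))          # True False
print(pipelines_to_start(queued=6, running=1))  # 3
```

Keeping both limits as constants makes it straightforward to promote them to per-project settings in a later iteration.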
Links / references
Other Potential Strategies
Generate All Outcomes
Another variation of the Hybrid strategy above would be to kick off a pipeline for every possible combination of outcomes in the merge train. If there are 3 pipelines in the train, the 4th MR to join will start the following pipelines:
- 1+2+3+4 (in case all succeed)
- 1+3+4 (in case 2 fails)
- 1+4 (in case 2 and 3 fail)
- 4 (in case 1, 2, and 3 fail)
- ...and so on.
You'd probably need a depth limit at some point, but this would guarantee a timely result for whatever scenario occurs. For scenarios where compute is less expensive than time spent with a broken target branch, this is the optimal approach.
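The enumeration above is just the power set of the newly joined MR's predecessors, each combined with the new MR. A minimal sketch (function name hypothetical):

```python
from itertools import combinations


def outcome_pipelines(predecessors, new_mr):
    """One pipeline per possible set of surviving predecessors,
    each combined with the newly added MR."""
    result = []
    # Largest subset first: the all-succeed case, then each failure case.
    for k in range(len(predecessors), -1, -1):
        for survivors in combinations(predecessors, k):
            result.append(list(survivors) + [new_mr])
    return result


for p in outcome_pipelines(["MR1", "MR2", "MR3"], "MR4"):
    print(p)
# 2**3 = 8 pipelines, from ['MR1', 'MR2', 'MR3', 'MR4'] down to ['MR4']
```

This makes the cost explicit: each new item multiplies the pipeline count, which is why a depth limit would be needed in practice.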