Backend: Always process pipelines using DAG algorithm
Problem
Today we have two ways of processing jobs in a pipeline: stage and DAG. To distinguish between them we use scheduling_type: :stage|:dag, and based on that we consider either the status of the previous stages or the status of the jobs listed in needs: for a given job. As a result, pipeline processing has two strategies and likely two divergent behaviors.
A good example of why this is annoying is that you can't use DAG visualizations on a complex stage-based pipeline, even if it would be interesting to see it that way. Instead we will have two tabs for looking at pipelines, and only one of the tabs will work for any given pipeline, which is a bit sad. Over time there will be more and more of these splits, so let's address this sooner rather than later.
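To make the split concrete, here is a minimal Python sketch of the two strategies. It is not the actual backend code: the Job class, the stage_idx field, and the prerequisites function are hypothetical names used only for illustration. Only scheduling_type and needs come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical model of a pipeline job, for illustration only.
    name: str
    stage_idx: int                 # position of the job's stage in the pipeline
    scheduling_type: str           # "stage" or "dag"
    needs: list = field(default_factory=list)


def prerequisites(job, all_jobs):
    """Return the names of the jobs whose status decides when `job` can run."""
    if job.scheduling_type == "dag":
        # DAG strategy: only the explicitly listed needs matter.
        return sorted(job.needs)
    # Stage strategy: every job in every earlier stage matters.
    return sorted(j.name for j in all_jobs if j.stage_idx < job.stage_idx)


jobs = [
    Job("build1", 0, "stage"),
    Job("build2", 0, "stage"),
    Job("test1", 1, "stage"),
    Job("test2", 1, "dag", needs=["build2"]),
]
print(prerequisites(jobs[2], jobs))  # stage job: waits for all build jobs
print(prerequisites(jobs[3], jobs))  # dag job: waits only for build2
```

The two branches are the two code paths this issue proposes to collapse into one.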
Intended users
User experience goal
A UX goal here would be to make it easier for people to start taking advantage of needs, because their pipelines would already be using needs behind the scenes. They wouldn't need to rethink the architecture or switch into a different mode (either in the product or in their way of thinking).
Proposal
Given that a stage-based pipeline is also a DAG-based pipeline with some syntactic sugar, we should be able to default to DAG processing for any pipeline.
We would at least need required:false as discussed at #30680 (comment 346587984). What other elements of the DAG are missing to properly model a stage-based pipeline?
One way we could model stages is something like this, where stages (when defined) become special entities:
graph TD;
build1 --> build_stage;
build2 --> build_stage;
build3 --> build_stage;
build4 --> build_stage;
build_stage --> test12;
build_stage --> test12;
build_stage --> test34;
build_stage --> test34;
test12 --> deploy_stage;
test34 --> deploy_stage;
deploy_stage --> deploy_all;
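The graph above can be generated mechanically from an ordered list of stages. The following Python sketch assumes a hypothetical stage_barrier_edges helper and names each synthetic node after the stage it closes (so the node between test and deploy is test_stage here, where the diagram calls it deploy_stage). The naming is an assumption, not a decision.

```python
def stage_barrier_edges(stages):
    """Build DAG edges with one synthetic node between consecutive stages.

    `stages` is an ordered list of (stage_name, [job names]) pairs.
    Returns a list of (from_node, to_node) edges.
    """
    edges = []
    for (name, jobs), (_, next_jobs) in zip(stages, stages[1:]):
        barrier = f"{name}_stage"  # hypothetical naming for the synthetic node
        # Every job of the stage feeds into the barrier node...
        edges += [(job, barrier) for job in jobs]
        # ...and every job of the next stage depends on the barrier node.
        edges += [(barrier, job) for job in next_jobs]
    return edges


pipeline = [
    ("build", ["build1", "build2", "build3", "build4"]),
    ("test", ["test12", "test34"]),
    ("deploy", ["deploy_all"]),
]
for src, dst in stage_barrier_edges(pipeline):
    print(f"{src} --> {dst};")
```

The synthetic node keeps the stage visible as an entity in the graph, instead of expanding it into a direct edge from every job to every job of the following stage.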
This could interact with a pure-DAG set of jobs as follows; in the case of build5 it runs completely independently, and in the case of build6 it joins back up for the deployment:
graph TD;
build5 --> test5;
test5 --> deploy5;
build6 --> test6;
test6 --> deploy_all;
build1 --> build_stage;
build2 --> build_stage;
build3 --> build_stage;
build4 --> build_stage;
build_stage --> test12;
build_stage --> test12;
build_stage --> test34;
build_stage --> test34;
test12 --> deploy_stage;
test34 --> deploy_stage;
deploy_stage --> deploy_all;
One thing we would have to figure out is what to do with stages in a pure DAG, since we still allow them to be defined but they become something more like labels.
The most difficult part, I think, will be making it so that existing .gitlab-ci.yml files that don't use needs can be interpreted implicitly as having needs, without their authors having to change anything.
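As a rough illustration of that implicit interpretation, the Python sketch below fills in needs for every job that does not declare them. The with_implicit_needs function and the plain-dict representation of the parsed configuration are assumptions for the example. It also ignores job status semantics (for instance the required:false case mentioned earlier), which a real implementation would have to handle.

```python
def with_implicit_needs(stages, jobs):
    """Return a copy of `jobs` where every job has an explicit `needs` list.

    `stages` is the ordered list of stage names; `jobs` maps a job name to a
    dict with at least a "stage" key. Jobs that already declare "needs" are
    kept as they are. Hypothetical helper, not the real config processor.
    """
    order = {stage: idx for idx, stage in enumerate(stages)}
    result = {}
    for name, job in jobs.items():
        if "needs" in job:
            needs = list(job["needs"])
        else:
            # Stage semantics: depend on every job in every earlier stage.
            needs = [
                other
                for other, other_job in jobs.items()
                if order[other_job["stage"]] < order[job["stage"]]
            ]
        result[name] = {**job, "needs": needs}
    return result


stages = ["build", "test", "deploy"]
jobs = {
    "build1": {"stage": "build"},
    "build6": {"stage": "build"},
    "test1": {"stage": "test"},
    "test6": {"stage": "test", "needs": ["build6"]},
    "deploy_all": {"stage": "deploy"},
}
for name, job in with_implicit_needs(stages, jobs).items():
    print(name, job["needs"])
```

The input dict is left untouched, which matches the constraint that neither the YAML syntax nor the pipeline behavior should change.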
TBD - introduce minimal changes ensuring that:
- no changes are made to the YAML syntax
- no changes are made to the pipeline behavior
/cc @ayufan