Backend: Always process pipelines using DAG algorithm

Problem

Today we have two ways of processing jobs in a pipeline: stage and DAG. To distinguish between the two types we use scheduling_type: :stage|:dag, and based on that we consider either the status of the previous stages or the status of the jobs listed in needs: for a given job. As a result, pipeline processing has two strategies and likely two divergent behaviors.
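
To make the split concrete, here is a minimal sketch of the two branches. It is not the actual backend code: the `Job` class, the `stage_idx` field and the `prerequisites` helper are hypothetical names chosen for illustration, and `needs` is assumed to be a plain list of job names.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical, simplified stand-in for a pipeline job.
    name: str
    stage_idx: int
    scheduling_type: str = "stage"  # "stage" or "dag"
    needs: list = field(default_factory=list)


def prerequisites(job, jobs):
    """Return the names of the jobs whose status decides whether `job` may start."""
    if job.scheduling_type == "dag":
        # DAG strategy: only the explicitly listed needs matter.
        return sorted(job.needs)
    # Stage strategy: every job in every earlier stage matters.
    return sorted(j.name for j in jobs if j.stage_idx < job.stage_idx)


jobs = [
    Job("build1", 0),
    Job("build2", 0),
    Job("test1", 1),
    Job("test2", 1, scheduling_type="dag", needs=["build1"]),
]
```

Here `prerequisites(jobs[2], jobs)` looks at both build jobs, while `prerequisites(jobs[3], jobs)` looks only at `build1`. Every consumer of pipeline state has to carry a branch like this one, which is where the two behaviors can drift apart.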

A good example of why this is annoying is that you can't use DAG visualizations on a complex stage-based pipeline, even if it would be interesting to see it that way. Instead we will have two tabs for looking at pipelines, and only one of the tabs will work for any given pipeline, which is a bit sad. Over time there will be more and more of these splits, so let's address this one sooner rather than later.

Intended users

Devon (DevOps Engineer)

User experience goal

A UX goal here would be to make it easier for people to start taking advantage of needs because their pipelines are already needs-based behind the scenes. They wouldn't need to rethink the architecture or switch into a different mode (either in the product or in their way of thinking).

Proposal

Given that a stage-based pipeline is also a DAG-based pipeline with some syntactic sugar, we should be able to default to using DAG processing for any pipeline.

We would at least need required:false as discussed at #30680 (comment 346587984). What other elements of the DAG are missing to properly model a stage-based pipeline?

One way we could model stages is something like this, where stages (when defined) become special entities:

graph TD;
  build1 --> build_stage;
  build2 --> build_stage;
  build3 --> build_stage;
  build4 --> build_stage; 
  build_stage --> test12;
  build_stage --> test12;
  build_stage --> test34;
  build_stage --> test34;
  test12 --> deploy_stage;
  test34 --> deploy_stage;
  deploy_stage --> deploy_all;
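
As a rough sketch of how such a graph could be generated, the function below inserts one synthetic barrier node between each pair of consecutive stages. The function name and the input shapes are assumptions made for illustration. Each barrier is named after the stage it closes (`build_stage`, `test_stage`), which differs from the `deploy_stage` label in the diagram; the naming is only a convention.

```python
def stage_barrier_edges(stages, jobs_by_stage):
    """Return (from, to) edges that route stage ordering through barrier nodes.

    `stages` is the ordered list of stage names and `jobs_by_stage` maps each
    stage name to the list of job names in it (both hypothetical inputs).
    """
    edges = []
    for prev, nxt in zip(stages, stages[1:]):
        barrier = f"{prev}_stage"
        # Every job of the earlier stage feeds the barrier...
        edges += [(job, barrier) for job in jobs_by_stage[prev]]
        # ...and every job of the next stage waits on the barrier.
        edges += [(barrier, job) for job in jobs_by_stage[nxt]]
    return edges


stages = ["build", "test", "deploy"]
jobs_by_stage = {
    "build": ["build1", "build2", "build3", "build4"],
    "test": ["test12", "test34"],
    "deploy": ["deploy_all"],
}
edges = stage_barrier_edges(stages, jobs_by_stage)
```

With a barrier, the number of edges between two stages grows with the sum of their job counts rather than the product, which may matter for wide pipelines.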

This could interact with a pure-DAG set of jobs as follows; in the case of build5 it runs completely independently, and in the case of build6 it joins back up for the deployment:

graph TD;
  build5 --> test5;
  test5 --> deploy5;
  build6 --> test6;
  test6 --> deploy_all;
  build1 --> build_stage;
  build2 --> build_stage;
  build3 --> build_stage;
  build4 --> build_stage; 
  build_stage --> test12;
  build_stage --> test12;
  build_stage --> test34;
  build_stage --> test34;
  test12 --> deploy_stage;
  test34 --> deploy_stage;
  deploy_stage --> deploy_all;
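
One way to check this reading of the graph is to compute, for a given job, everything it transitively waits on. The sketch below uses the edges from the graph above with the duplicated ones listed once; `ancestors` is a hypothetical helper, not an existing API.

```python
EDGES = [
    ("build5", "test5"), ("test5", "deploy5"),
    ("build6", "test6"), ("test6", "deploy_all"),
    ("build1", "build_stage"), ("build2", "build_stage"),
    ("build3", "build_stage"), ("build4", "build_stage"),
    ("build_stage", "test12"), ("build_stage", "test34"),
    ("test12", "deploy_stage"), ("test34", "deploy_stage"),
    ("deploy_stage", "deploy_all"),
]


def ancestors(node, edges):
    """Return every node that must finish before `node` can start."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    seen, stack = set(), [node]
    while stack:
        # Walk backwards along the edges, visiting each predecessor once.
        for pred in preds.get(stack.pop(), ()):
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return seen
```

`ancestors("deploy5", EDGES)` contains only `build5` and `test5`, so that chain never waits on a stage barrier, whereas `ancestors("deploy_all", EDGES)` contains both the `build6`/`test6` chain and every stage-based job.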

One thing we would have to figure out is what to do with stages in a pure DAG: we still allow them to be defined, but they become something more like labels.

The most difficult part, I think, will be making it so that existing .gitlab-ci.yml files that don't use needs can be interpreted implicitly as having needs, without their authors having to do anything.
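
A minimal sketch of that implicit interpretation, assuming the configuration has already been parsed into a dictionary and that `needs` entries are plain job names. `with_implicit_needs` is a hypothetical helper, and expanding to every job of every earlier stage is just one possible rule (the barrier nodes above are another).

```python
def with_implicit_needs(stages, jobs):
    """Return a copy of `jobs` in which every job has an explicit needs list.

    `jobs` maps a job name to a dict with a "stage" key and an optional
    "needs" key (an assumed, simplified shape of the parsed config).
    """
    order = {stage: index for index, stage in enumerate(stages)}
    resolved = {}
    for name, cfg in jobs.items():
        if "needs" in cfg:
            # Explicit needs are kept exactly as written.
            needs = list(cfg["needs"])
        else:
            # Implicit needs: every job in every earlier stage.
            needs = [
                other for other, other_cfg in jobs.items()
                if order[other_cfg["stage"]] < order[cfg["stage"]]
            ]
        resolved[name] = {**cfg, "needs": needs}
    return resolved


resolved = with_implicit_needs(
    ["build", "test", "deploy"],
    {
        "build1": {"stage": "build"},
        "build2": {"stage": "build"},
        "test1": {"stage": "test"},
        "test2": {"stage": "test", "needs": ["build1"]},
        "deploy": {"stage": "deploy"},
    },
)
```

After this step every job carries a needs list, so a single DAG algorithm could process the whole pipeline without the author changing the file.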
