The needs keyword reduces cycle time: it ignores stage ordering and runs jobs without waiting for unrelated jobs to complete, which speeds up your pipelines. Previously, needs relationships could only be created between jobs in different stages (a job depending on a job in an earlier stage). In this release, we've removed this limitation, so you can define a needs relationship between any jobs you want. As a result, you can now create a complete CI/CD pipeline without using stages, with implicit needs between jobs, letting you define a less verbose pipeline that runs even faster.
Problem to solve
With the current implementation of the directed acyclic graph, the user has to help the scheduler a bit by defining stages for jobs, and only passing dependencies between stages. That can get complicated for large DAGs. Right now, users can deal with this by topologically sorting the DAG and greedily adding artificial “stage1”, “stage2”, etc. labels (or even one stage name per job).
Intended users
Individual contributor automators
Further details
Proposal
We will allow jobs to depend on other jobs within the same stage, instead of this being prevented by an error.
For now, we are not making stages only a "visualization hint", since they are still part of processing. In the future we are considering making all pipeline processing DAG-based (by default, without needs set, it would behave just like a stage-based pipeline). At that point it may make sense to revisit more broadly what stages mean in GitLab CI.
If a job needs another in the same stage, dependencies should be respected and it should wait (within the stage) to run until the job it needs is done. Then, fetch its dependencies and run itself.
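A minimal sketch of what this would look like in CI YAML (job names and scripts are made up for illustration): both jobs live in the test stage, but report waits for unit-tests to finish, then fetches its artifacts and runs.

```yaml
unit-tests:
  stage: test
  script: make test
  artifacts:
    paths:
      - reports/

report:
  stage: test
  needs: [unit-tests]   # same-stage dependency, previously a validation error
  script: ./summarize.sh reports/
```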
Limitations
Circular references will need to be detected and will still result in an error.
dependencies: will not be updated to support this (at least as part of this issue) separately from the context of needs, since the behavior of non-DAG pipelines having dependencies on something in the same stage is undefined.
We don't yet have a plan to allow needs: to reference jobs in future stages. That is a more far-fetched case that could become genuinely difficult to visualize, so we are not addressing it now.
Permissions and Security
Documentation
Testing
What does success look like, and how can we measure that?
Links / references
It's not that you can't merge YAML list items; GitLab just happens not to, at this point in time. It uses the Rails hash deep-merge method, which can allow you to append.
Sounds like a good proposal for transitioning between staged pipelines and stageless pipelines that only use DAG.
The visualization should definitely show the graph if we use this approach; maybe show the graph only if the pipeline is fully stageless, and use the current visualization for staged pipelines.
With DAG and explicitly declared needs:, stages become only a visualisation hint. Useful, though; I would not abandon this concept.
If a job does not explicitly specify its needs:, it should depend fully on the previous stage. This would be a syntactic shortcut for needs: [ all jobs of previous stage ].
If a job explicitly specifies empty needs: [], it may be assumed there are no dependencies, not even on previous stages, and job can run immediately.
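A hedged illustration of the two shortcuts proposed above (job names are made up): lint declares empty needs and starts immediately, while test omits needs: and implicitly depends on all jobs of the previous stage.

```yaml
build:
  stage: build
  script: make

lint:
  stage: test
  needs: []        # no dependencies at all, not even on previous stages
  script: make lint

test:
  stage: test
  # no needs: specified, so this behaves like
  # needs: [all jobs of previous stage], i.e. needs: [build]
  script: make test
```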
Also, it would be nice to provide a way to explicitly refer to a stage in needs:. This enables use cases like needs: [ *FullBuildStage, SingleTestJob ]
Created #220758 for this, as I'm experiencing the same need.
As far as I understand from the description and discussions, we can go for two directions:
Allowing needs to include jobs in the same stage.
maybe just modifying Gitlab::Ci::YamlProcessor#validate_job_needs! and Gitlab::Ci::Pipeline::Seed::Build#needs_errors
consider: avoiding circular references.
consider: prevent needing builds of further stages.
Completely stage-less structure (I don't know how)
Likewise, we have dependencies. Do we have to consider dependencies in this issue? Can we conclude that we allow needs to include same-stage jobs, but the dependencies keyword behaves the same as before?
Besides backend thinking, I think we also need UX opinion about it. cc: @dimitrieh
Since nobody has been assigned for working on this in this milestone I'm assigning to the %12.9 release for now, but this will be evaluated among other priorities and anything else that misses %12.8 as we get closer to commitment for that release.
Due to missed items in %12.8 there is not enough remaining capacity in %12.9 for this item. Moving to %12.10 where, when we get closer to planning for that release, it will be reassessed in context of the rest of the priorities for that release.
It is very frustrating to have to define stages when I don't want them. In some cases, I want to define my job dependencies entirely with needs:, but I am forced to add stages: and stage:. needs: is almost useless to me because of this.
Any chance to get this in %13.0? According to #30632 (comment 297654503) this issue has priority to be worked on after %12.10 however the version has been released and this issue is still in %Backlog.
It's really painful to define two stages because you need to finish one job before another one - both refer to the same logical "stage". My pipeline UI overflows my screen size. :(
Changes to the CI system risk breaking much important functionality, as it may be difficult to maintain backwards compatibility. For instance, "needs" makes "stages" completely obsolete, but they obviously cannot be removed.
A commercial premium customer is interested in the functionality: https://gitlab.my.salesforce.com/0016100001Eo81O
They have found the current implementation difficult to maintain. The relationships are forced on all jobs and work against the rules functionality.
A Silver customer (https://gitlab.my.salesforce.com/0014M00001h0rhj) is interested in this functionality, citing that currently this is slowing them down and "if this fix was deployed it would accelerate us greatly".
This feature would be really useful to us particularly for workflows where we want to use the .pre and .post stages.
This is particularly important for our use of an instance-wide required CI configuration where we don't know stage names of a pipeline in advance, so we use .post and .pre stages -- However, we have desired workflows of jobs that would depend on one another, which would not be possible without this feature.
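A hedged sketch of that use case (job names and scripts are invented): two jobs in the built-in .pre stage that depend on one another, which only works once same-stage needs is supported.

```yaml
scan:
  stage: .pre
  script: ./scan.sh
  artifacts:
    paths:
      - scan-results/

report-scan:
  stage: .pre
  needs: [scan]    # same-stage dependency inside .pre
  script: ./report.sh scan-results/
```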
They have scenarios where they have different tasks to be executed in different docker containers sequentially within a stage. If they want to leverage templating they need to be able to have job dependencies for defining the sequence.
For a deployment stage, they may not want to put all steps into one job, but instead create different sequential jobs that belong in the deployment stage.
Friendly ping @fabiopitino, or maybe @furkanayhan - the main thing I'm wondering about is how dependencies is impacted. The way I think this should work:
If a job needs another in the same stage, dependencies should be respected and it should wait (within the stage) to run until the job it needs is done. Then, fetch its dependencies and run itself.
So, if you had three jobs a->b->c all in the test stage. And c needs b needs a, then they would run serially (again, within the stage). This seems consistent and predictable but I really don't have a good sense of the back-end complexity.
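The serial case described above could be sketched like this (scripts are placeholders): all three jobs sit in the test stage, and the needs: chain makes them run one after another.

```yaml
a:
  stage: test
  script: echo a

b:
  stage: test
  needs: [a]
  script: echo b

c:
  stage: test
  needs: [b]
  script: echo c
```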
Since GitLab 12.6, you can’t combine the dependencies keyword with needs to control artifact downloads in jobs. dependencies is still valid in jobs that do not use needs.
I've done a refinement pass based on latest conversation and thoughts around this issue. See diff to description above and let me know if you have any questions or concerns.
Just so I understand, this would allow you to use artifacts from jobs in the same stage (i.e. it would guarantee they are present)?
I was talking with a customer who was a bit hung up on this limitation earlier, specifically with the requirement that automatically stopping environments requires deploy_review and stop_review to be in the same stage. They do have a workaround, but this would be a nice improvement for them.
The current DAG has a failsafe mechanism built in: stage: always needs to be defined. We can implement stage: nil (or empty) support and automatically generate stage names to resolve dependencies in order.
Then, the whole pipeline could avoid using stage: at all, just use needs:, and GitLab CI would generate stage names for you. The names could be based on some common names, or just use some other identifier that could be hidden.
Then, this change would really be a part of YAML processing, to ensure that stage is always set, and list of stages is properly configured ;)
Would that proposal still allow needs for jobs in the same stage? We are using stages but have to include .pre jobs to cache certain files for our deployment system (in case the ref gets deleted). An ideal solution for me would be to do this caching in our deploy stage and then have a dependency through needs in the same stage.
@ayufan @furkanayhan @dhershkovitch making stages optional for needs is nice for sure if you happen to not care what the stage is called, and could be its own issue. As @chris579 mentions above, though, there's definitely also a use case for putting things together (in a stage called build, for example) that need to run in a certain order, but that you still want to show up associated with that stage.
A concrete example might be something like ETL: you want all the jobs in a stage called 'database', but then want a job called load to need a job called transform, and that job to need one called extract. If the only way to do this is by giving them a random name, it wouldn't be that nice. We also have another issue that is going to allow people to set a dependency on a stage instead of a job (#220758), and randomized or hidden names would break that for this use case.
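The ETL example above, sketched as CI YAML (assuming same-stage needs is available; script names are invented): all three jobs stay associated with the database stage while running strictly in order.

```yaml
extract:
  stage: database
  script: ./extract.sh

transform:
  stage: database
  needs: [extract]
  script: ./transform.sh

load:
  stage: database
  needs: [transform]
  script: ./load.sh
```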
Maybe there's a way internally to treat the stage as a kind of annotation hidden inside of processing, in cases of needs referring to the same stage, but then at the interaction layer it's treated as a stage. This probably just introduces complexity in some other way, though. Maybe another way would be to auto-modify the stage name for it to always be independent.
This can be quite complex at this moment. We would need to convert all current stages-processing to be executed as DAG internally. Then stages would serve only as annotation, but not really be used for the data processing.
I'm good with making this issue work in a way something like the following:
If a job wants to need a job in the same stage, for example:
```yaml
a:
  # stage: build
  # stage should not be specified along with a needs reference to the
  # same stage, since it must be autogenerated
  needs: b
  image: alpine
  script: ...
b:
  stage: build
  image: alpine
  script: ...
```
Then, what would happen is that a gets put into a new autogenerated stage called something like needs-b (we can come up with something better, maybe), and then the pipeline is processed as normal. We would show an error message if you specify needs and set stage to the same stage as what you need (with a message explaining why you might want to remove stage in this case).
This would also allow for interesting things like this, which would be a DAG that auto generates all stage names:
```yaml
a:
  # stage: test
  script: echo Hello World
  needs: []
b:
  needs: a
  script: echo Hello World
  # stage:
  # potential ones: a common prefix of all jobs that would land in this "stage" based on the similar `needs:`
  # potential ones: b-needs-a
  # potential ones: needs-a
  stage: build # a fixed stage
  stage: # a dynamic stage, that is prefixed with `build <something>`
    prefix: build
```
Step 1 (MVC): Allow stage: nil (aka absence of stage:) and automatically generate names
Step 2: Allow stage: prefix: build to hint what stage names should be generated to make it convenient for users to annotate them
Step 3: Visually present all stages in the same bucket if they are in the form of build::, kind of similarly to how we group similar jobs (aka parallel:)
@furkanayhan @ayufan and everyone, I've created #251020 as a separate issue to implement the above. I hope it solves many of your use cases, but I know it doesn't solve all of them, so I'm keeping this one open as well (but removing it from the milestone), as it does not seem feasible to engineer this as our first iteration. I've put it in as a candidate for %13.7 so we revisit it soon, one way or the other. cc @dhershkovitch
Jobs that define no stage, but have needs could reasonably just run in the .post stage by default, I suppose, or a single stage generated after all stages defined by dependent jobs.
If all dependent jobs also define no stage (sounds much like stageless gitlab), then perhaps it should be the first stage (or .pre). Creating many stages seems like an improper approach and confusing if you expose that in UI/API.
Resolving the dependency and ordering is, fortunately, a mostly solved problem (look to any number of package managers and how they do this). In the UI, you can expose them like a "group" in that single stage and put them in sequential order.
Though, it sounds like a different issue than merely allowing needs to refer to a job in the same stage, which should be the focus of what to do first, in my opinion. Stageless gitlab sounds like a much larger kind of change :-)
@dhershkovitch just spoke to customer who has asked how likely the 13.10 candidacy is? Let us know if you would like to speak to them further on this requirement
@chloe in addition:
We have pretty complex pipelines with quite a number of jobs triggered in different conditions (push and/or merge requests, etc.) and would like to only express their order of execution by means of needs: (DAG), period.
Currently, we have to introduce meaningless intermediate stage:s (stage1, stage2, etc.) to circumvent the fact that needs: (DAG) can't refer to a job in the same stage. This brings more complexity in that, depending on the triggering conditions, some stages aren't applicable. Even giving them meaningful names got us nowhere (tests-that-can-start-now, tests-that-can-start-right-after-now, etc.).
@Thanathros Thank you for explaining that you can live without it, even though it won't help us get the needed attention to make it happen. As said above, stages bring more complexity to DAG-only pipelines.
@rdesgroppes I apologize if it sounded that way. I think, re-reading it now, that it sounded more negative than I wanted it to sound. I wanted to express my support for this.
Forgive me if this is not the right place, but does this Request satisfy my question. I am trying to set up a CI/CD pipeline such as the following:
```
QA --> STAGING --> DC1 --> DC2 --> ETC
                   APP     APP
                   WEB     WEB
```
Basically, I want it to go horizontally, but then also sequentially down. I.e.: deploy to QA, then to staging, then to one datacenter... once it hits the first datacenter, run several stages, one after the other. Once that's done and succeeded, move to the next datacenter, etc. I can't seem to attach a screenshot of what I mean, so forgive the poor ASCII representation above.
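One way the flow above could be expressed with needs: chains, assuming same-stage needs is available (job names, stage names, and scripts below are all invented for illustration):

```yaml
deploy-qa:
  stage: deploy
  script: ./deploy.sh qa

deploy-staging:
  stage: deploy
  needs: [deploy-qa]
  script: ./deploy.sh staging

dc1-app:
  stage: deploy
  needs: [deploy-staging]
  script: ./deploy.sh dc1 app

dc1-web:
  stage: deploy
  needs: [dc1-app]
  script: ./deploy.sh dc1 web

dc2-app:
  stage: deploy
  needs: [dc1-web]    # DC2 starts only after DC1 finishes
  script: ./deploy.sh dc2 app

dc2-web:
  stage: deploy
  needs: [dc2-app]
  script: ./deploy.sh dc2 web
```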
@dhershkovitch Customer https://gitlab.my.salesforce.com/00161000003RIGC is interested in this feature. They can try to use the workaround list in the issue for now, but that creates more maintenance in each of the jobs that needs to wait and makes them harder to reuse when they have such a close dependency. This would help make pipelines simpler by reducing the number of stages that need to be created to properly set dependencies with needs.
Since I'm currently trying to find a concept for our pipelines too and came across this issue.
Finding a more flexible job dependency system is also very interesting for us.
I want to mention that the famous systemd suite also manages its unit dependencies as a DAG. The crucial piece of that architecture regarding this issue is the "targets", which might be seen as the gitlab-ci stages: they are used for grouping units and can be referenced by units or other targets as dependencies, the same way actual units can be. Therefore a unit can depend on another unit having started successfully (or failed), as well as on a target being reached.
systemd gives a lot of possibilities to structure and modify the DAG. Maybe it's worth looking at these concepts...
...just my 2 cents on this topic.
Thanks for all your good work.
@dhershkovitch ~~setting this back to workflow::in dev as I'm done with the db issues 🔚~~ will set this back when I start working on it; other things came up.
@dhershkovitch - quick update on this. I've tackled the concerns from the POC and getting Furkan's opinion on them. I'll start getting Marcel involved in docs tomorrow.
Hi @lauraMon et al. I had some testing concerns when reviewing this issue. I see that in @f_caplette 's MR we're testing that the number of path links between jobs in the same stage is correct, but we are not testing whether they are connected to the appropriate job nodes. If we decide that we want to test that at the integration level or end to end, I think we'll need some way to identify which job nodes the path link is connecting. Currently the path link element HTML is:
<path d="M210,100.5L198,100.5C228,100.5,228,41.5,258,41.5" stroke-width="2" class="gl-fill-transparent gl-transition-duration-slow gl-transition-timing-function-ease gl-stroke-gray-200"></path>
as in this pipeline https://gitlab.com/gitlab-org/gitlab/-/pipelines/309785286.
Note that there is no identifying information in that path element to write automation against that can verify the path is linked to the correct job nodes. Alternatively we could do some visual comparison testing between what we expect the DAG to look like vs. what is rendered, but those kinds of tests are typically very fragile.
@ebanks 👋🏻 I think that this is a particularly tricky one to test. If we try to use the SVG path values to assert the coordinates, I feel we can get into some pretty complex tests, though perhaps it would be worth it (I am still undecided). For example, knowing how SVG paths are generated, we could grab 2 nodes that form a link, say (50, 50) and (100, 50), which would be two nodes in subsequent columns at the same height. Then try to check that the path value contains the origin and destination without asserting the in-between (after all, because it's a bezier curve, we don't want to assert the entire path!). Perhaps this is warranted, but it would have to be done in an E2E test, meaning with a real browser running. In unit tests, we cannot do these tests because we have to mock the DOM coordinates, so any tests on them are meaningless 😅
Alternatively we could do some visual comparison testing between what we expect the DAG to look like vs. what is rendered, but those kinds of tests are typically very fragile.
Agreed, I don't think this is what we need. We already have snapshots tests in the frontend to make sure the DOM doesn't change unexpectedly and I feel screenshots wouldn't help much more than that. The advantage of snapshots is that they can be updated really easily if they are no longer relevant, so they only act as a warning if a developer breaks them, they have to make sure it is intentional.
@f_caplette if we do decide to test these paths at the integration or end-to-end level, then it seems to me that we should be able to add identifying information about which nodes are being connected in the path element. I mean, at some point we're deciding which nodes to connect with a path; can we just add that information to the path element at that point in the code, rather than just the positional coordinates?
@ebanks We could add an ID (or data-qa selector) to each path element that has the source and target as its value. So let's say we generate a link between job1 and job2; the link selector would then be job1-job2. It would then be pretty easy to write a test that ensures that if job1 needs job2, we have a link named job1-job2.
Please consider testing child pipelines too; the last GitLab update broke the pipeline view completely for pipeline runs that contain child pipelines: #333769 (closed)
I've omitted many of the details of our current Jenkins Pipeline for the sake of focusing on "Stage: Test". In this way, the concept of "stage" in GitLab CI becomes a human-meaningful indication of a collection of jobs, while the scheduling of those jobs is handled behind the scenes by the needs functionality. I think you'd want to expand the DAG like this within a stage such that it's more human readable than the solutions in !62032 (merged), but that may require additional rework on the back end to support more than one column of jobs within a stage.
Hey @dhershkovitch Because this changes a fundamental piece of the pipeline processing code, this is still being reviewed/looked at by maintainers. As the code cutoff is next week, I don't think we will be able to enable this FF by default by next week. I will mark this as Needs attention and update you again as soon as I hear back.
@dhershkovitch we will ship this in the next iteration. The testing of enabling the feature flag starts today, so the feature will certainly be enabled by default for self-managed by 14.2
@AndreKR That is correct - the MR is now merged into master, and the next steps are detailed in the rollout issue.
I think @weyert-tapico was asking how to opt-in as a Gitlab.com customer. The feature flag is called 'ci_same_stage_job_needs' but it was not obvious from the issue page how you can self-enable the FF yourself as a GitLab-dotcom customer.
@cunio_hector I think there's no way to enable it as a Gitlab.com customer. They have to enable it from their end, this issue is tracking that progress #328253 (closed)
It looks like it was ramped up gradually last week for a certain percentage of customers and then completely turned off because issues were found. Then the rollout starts over again this week.
Just tested this and it works like a charm. 👍 One thing I noticed is that stage is still mandatory in a job that has needs. Issue #223686 (closed) was marked a duplicate of this one because according to #223686 (comment 414156013) it was planned to make stage optional. Is that still the case? I kind of expected the stage to be "test" by default, as it is with non-needs jobs.
@dhershkovitch - I'm closing this issue as the MR to enable this by default has been merged. stage will also now be optional - it's in Canary/GitLab Next and should soon be in production :)
needs: is similar to dependencies: in that it must use jobs from prior stages, meaning it's impossible to create circular dependencies. Depending on jobs in the current stage is not possible either, but an issue exists.
```yaml
job1:
  script:
    - # script for job1
  when: manual
job2:
  needs:
    - job1
  script:
    - # script for job2
```
In this case, when job1 is done, job2 will automatically run. However, if I re-trigger job1, it will not re-trigger job2 again (as in the first run).
Is it possible to re-trigger job2 when job1 finishes and succeeds (after job1 was re-triggered)?
dependencies: will not be updated to support this (at least as part of this issue) separate from within the context of needs since non-DAG pipelines having dependencies on something in the same stage is undefined.
the plan is not to update dependencies separately to support referring to something in the same stage, without needs.
Is there a separate issue for supporting dependencies within the same stage without needs? I searched and looked through the linked issues and didn't find one.
For context, I'm trying to create a pipeline that allows manually triggering job B, using job A's artifacts, upon failure of job A, while still allowing failure of job A to create an overall pipeline status of ❌ instead of ❗. These are my findings from testing with https://gitlab.com/Codym48/pipeline-rules/-/blob/main/.gitlab-ci.yml:
I would expect dependencies without needs to work differently than this based on the documentation:
The job status does not matter. If a job fails or it’s a manual job that isn’t triggered, no error occurs.
The job status does seem to matter, since failures of a previous stage prevent the dependent stage from running.
And that fourth row, which looks like a silent failure (it respects the dependency, not allowing a manual trigger until the depended-on job finishes, but then it silently doesn't pull the artifacts, see screenshot below with no "Downloading artifacts" section), seems like a bug?
The documentation doesn't mention that dependencies must come from previous stages to allow downloading artifacts. It used to say this in needs,
needs: is similar to dependencies: in that it must use jobs from prior stages, meaning it's impossible to create circular dependencies. Depending on jobs in the current stage is not possible either, but support is planned.
but @marcel.amirault removed it in !68934 (merged) and I haven't found this limitation of dependencies: mentioned in any other documentation.
Can we update the documentation to mention this limitation of dependencies:, if it's real?
Is there a ticket, or can I open one, to remove this limitation and silent failure?
I found a way to do what I'm trying to do via allow_failure, CI_JOB_STATUS, artifacts:reports:dotenv and a watchdog job that forces failure:
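A hedged sketch of that workaround (all job names, stage names, and scripts are invented; only the mechanism — allow_failure, CI_JOB_STATUS, artifacts:reports:dotenv, and a watchdog job — comes from the comment above):

```yaml
job_a:
  stage: test
  allow_failure: true        # keeps the pipeline going so job_b stays triggerable
  script:
    - ./run_tests.sh
  after_script:
    # Record this job's final status in a dotenv artifact
    - echo "JOB_A_STATUS=$CI_JOB_STATUS" > status.env
  artifacts:
    reports:
      dotenv: status.env
    paths:
      - results/
    when: always             # keep artifacts even on failure

job_b:
  stage: debug
  when: manual               # manually triggered, using job_a's artifacts
  needs: [job_a]
  script:
    - ./inspect_failure.sh results/

watchdog:
  stage: debug
  needs: [job_a]
  script:
    # Force an overall ❌ pipeline status even though job_a used allow_failure
    - if [ "$JOB_A_STATUS" != "success" ]; then exit 1; fi
```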
But I still think the two questions remain, because dependencies, without the as-far-as-I-can-tell undocumented limitation mentioned above, could do this with way fewer lines of code and better pipeline readability.