In our CI file, our deployment is a two-stage deployment. If you set the job in the first stage to be manual, the second stage still triggers automatically, despite the documentation stating:
"on_success - execute build only when all builds from prior stages succeed. This is the default."
Therefore I would expect that subsequent stages, which default to on_success, would only trigger once the manual job in the previous stage succeeds.
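For illustration, a minimal two-stage configuration that reproduces this (job names and scripts are made up):

```yaml
stages:
  - build
  - deploy

build:
  stage: build
  when: manual          # manual job in the first stage
  script:
    - ./build.sh

deploy:
  stage: deploy         # defaults to when: on_success, yet it still runs
  script:               # even though the manual build job has not been triggered
    - ./deploy.sh
```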
I had a problem with defining that behaviour. The thing is that the current description of stages doesn't allow us to describe the intended workflow any better.
Still, executing later stages may make sense, because, for example, you may have important cleanups that should always be executed, which is not really a reason to stop the pipeline.
Probably the solution here is to extend when: with a more descriptive constraint, or just change on_success to be executed only when everything before it succeeds.
Or maybe the solution is in defining when: for a manual action. Currently manual doesn't stop the pipeline; maybe we should have when: block, where we block pipeline execution.
That pipeline stop/block would be great. Also, the play buttons for manual jobs should appear/enable only after previous jobs are done (successfully), especially when manual jobs depend on some previous tests.
As discussed with @ayufan, we could benefit from having the ability to configure the behavior of a manual job. We could have some defaults like:
```yaml
when: manual
```
which would behave just like it does right now, but we may want to configure the manual job strategy as well:
```yaml
when:
  manual:
    strategy: block # or strategy: retry
```
Using strategy: block would stop the pipeline, just as @norbnagya described. Using strategy: retry would run jobs from subsequent stages, but when the manual job was executed it would retry the builds located in subsequent stages. I think both strategies may be valuable. /cc @markpundsack
@markpundsack I was thinking about the case where a manual job is optional but may affect the state of subsequent builds.
A pipeline like this can be found in gitlab-qa!37 (builds). It creates a Docker image for subsequent builds in a manual job, so if something has changed inside the Dockerfile we can update the image just by clicking play. If builds in later stages failed because of missing libraries in the image, running the manual job would then retry those builds. This is probably not the most important feature to have now, though.
!6604 (merged) should be considered a regression. This new behaviour has broken our builds. Subsequent stages should only run when all jobs in previous stages are successful. This is the documented behaviour and it fulfils the "principle of least surprise". My EE customer ZenDesk ticket: https://gitlab.zendesk.com/agent/tickets/52742
I agree with @harbottle. !6604 (merged) was implemented without considering the best default behavior and, worse, it changed the then-default behavior. The addition of an option makes sense to me, but not the default implemented in !6604 (merged).
@dblessing Personally I do think that this should be the default behaviour because... the current behaviour is somewhat weird to me. However, if you look at the original issue #22598 (closed): before the fix, manual actions would stop the subsequent stages only if the action was in the first stage and only if there was only one manual job. You could probably argue that we should fix it the other way around, but that's not how it was designed in the first place.
Therefore I am not sure if we should change it again for the default behaviour, especially in a patch release.
I would be cautious about saying that this is a regression; it is more like an implementation choice, supporting the part of the story where you have non-blocking manual actions that can be executed later. The second part is blocking manual actions, which can be considered a full-fledged feature. We should basically finish that second part of the story.
If we change the default, we will basically break a lot of workflows that are supported now: stop environments, to name one :)
@ayufan @godfat If you consider a pipeline, it's a linear thing, and it doesn't really make sense that the pipeline can continue if there is a manual action in the middle or at the beginning. Maybe not a regression per se, but we did break workflows. It was reported as a bug and we made the decision to change behavior. Now, should we add some configuration to make the behavior configurable?
Not trying to be pushy, but just to pick up on this:
> it is more like implementation choice of supporting part of the story where you have non-blocking manual actions that can be executed later
My response to that is: why associate such actions with a stage at all then?
To reiterate: the change in behaviour contradicts the customer documentation and has broken our builds on EE, which were working. Personally, I don't care if it is reversed or if it is replaced with improved functionality, as I am happy to edit our .gitlab-ci.yml files. That said, !6604 (merged) has only been merged for two months, so there will be more pipelines in the wild based on the old functionality than there are that depend on !6604 (merged).
> If you consider a pipeline, it's a linear thing and it doesn't really make sense that the pipeline can continue if there is a manual action in the middle or at the beginning.
I fully agree with the above statement!
In our case, the introduction of !6604 (merged) made our pipeline more ambiguous. Our last step is a manual step to push to prod, and it used to be very easy to see whether the latest changes were in prod by simply checking whether the pipeline had succeeded (the manual step was also triggered) or was still in dev status (the last manual step was not executed and the pipeline is in the skipped state).
Now it is ambiguous, and we must look inside the pipeline to see whether we pushed the latest changes to prod or whether they are still only in dev.
It also becomes harder to implement scenarios with multiple manual steps (with some automated steps in between) in the same pipeline, as automated steps between two manual steps will be executed regardless of manual steps, which is not the expected behavior.
@harbottle
I would say a complex and ugly workaround would be to have manual stages set environment variables that the next stages check.
The problem with that approach is a complex and totally ambiguous pipeline.
Let's say you set up an environment variable in the manual stage, and consecutive stages run their scripts only if that variable is set; otherwise they skip the scripts but exit with success. This is ugly, as the stage will be a success even though its scripts didn't run. It will also introduce a lot of complexity to handle all the scenarios for such a pipeline.
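A minimal sketch of that workaround, assuming a flag file passed via artifacts stands in for the environment variable (all names here are made up):

```yaml
stages:
  - approve
  - deploy

# Manual job that records the approval as an artifact.
approve:
  stage: approve
  when: manual
  script:
    - touch .approved
  artifacts:
    paths:
      - .approved

# Later job does its real work only if the flag exists;
# otherwise it skips the script body but still exits with success.
deploy:
  stage: deploy
  script:
    - if [ -f .approved ]; then ./deploy.sh; else echo "Skipping, not approved"; fi
```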
In other pipelines it will be impossible, as you say.
In any case, manual steps should stop the pipeline; otherwise, why would anyone put in a manual step?
@stormbringerdp The issue you're describing (not knowing if there's a manual step to be executed) sounds like #23935 (closed) to me, which is supposed to be fixed now. (not sure by which merge request though)
@ayufan Manual actions should be blocking. There should be no other option. If you want a cleanup job to run regardless of a manual job running or not, then use when: always.
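For example, such a cleanup job could opt into running unconditionally (a sketch; job and script names are invented):

```yaml
cleanup:
  stage: cleanup
  when: always          # runs whether or not earlier manual jobs were triggered
  script:
    - ./scripts/cleanup.sh
```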
Stop jobs should be placed in their own stage, at the end of the pipeline. That's what I'm converging on for my examples (https://gitlab.com/markpundsack/openshift-example/blob/dry/.gitlab-ci.yml#L13). Nothing else makes sense. If that's a problem, we should take stop jobs out of the pipeline entirely (via smart deploys, for example).
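A rough sketch of that layout, with the stop job isolated in a final stage (environment name and scripts are assumptions, not taken from the linked file):

```yaml
stages:
  - deploy
  - stop

deploy_review:
  stage: deploy
  script:
    - ./deploy.sh
  environment:
    name: review/$CI_BUILD_REF_NAME
    on_stop: stop_review

stop_review:
  stage: stop
  when: manual          # stopping the environment stays optional
  script:
    - ./teardown.sh
  environment:
    name: review/$CI_BUILD_REF_NAME
    action: stop
```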
Triggering a manual action should then enable subsequent stages to run.
There might be some ambiguity about retrying a manual job, but how do we handle retries today? If I retry a normal automatic build, do we then trigger the rest of the pipeline again? If so, do the same for manual jobs. If not, then don't. :)
There is probably a different solution for !6604 (merged). In other issues, we resolved to decouple pipeline state from stage state from job state. So it might be totally reasonable to have a pipeline status of success while the only job in the pipeline is an un-triggered manual action. That doesn't mean the stage has to be considered a success, though.
I just checked, and retrying a previously successful build does not trigger subsequent builds to be run again, so we should not do that with manual actions either, even though I understand the potential use-case for doing so. It would be a cheap way to re-pull external dependencies. Perhaps we should solve that differently though. (e.g. https://gitlab.com/gitlab-org/gitlab-ce/issues/14728)
@markpundsack I think we are still missing some concepts regarding manual jobs.
Currently, one of the most popular use cases is to use a manual job to build a Docker image that is going to be used as the test image for jobs in subsequent stages. The Docker image does not change very often, so we implemented this pattern in the GDK and GitLab QA projects. We have this manual job in the first stage, and https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/6604 was needed to make it work that way.
Usually the Docker image does not need to change, so we skip this first manual job and run the tests that depend on the old image being present in the registry.
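Roughly, the pattern looks like this (the registry path and job names are illustrative, not copied from GDK or GitLab QA):

```yaml
stages:
  - prepare
  - test

# Manual job: only run it when the Dockerfile actually changed.
build image:
  stage: prepare
  when: manual
  script:
    - docker build -t registry.example.com/group/project/test-image:latest .
    - docker push registry.example.com/group/project/test-image:latest

# Test jobs rely on an image that was pushed earlier.
rspec:
  stage: test
  image: registry.example.com/group/project/test-image:latest
  script:
    - bundle exec rspec
```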
Sometimes we need to add some new features / dependencies to the test image. If a change is required for the tests to succeed, the workflow is:
1. push the commit
2. the build image job is skipped
3. tests in later stages are executed
4. all tests fail because of missing dependencies in the test image
5. execute the manual job by clicking the play button
6. the new image is built and pushed to the registry
7. failed test jobs can be retried and should succeed now
At the moment, in such a case, we retry the failed jobs manually, but I would like to have the ability to define the behavior of the manual job, like:
```yaml
build:image:
  script:
    - bin/docker build
    - bin/docker publish
  when:
    manual:
      policy: retry # or policy: blocking
                    # or policy: skip

# or something like

blocking: false
policy: retry
```
One important thing to mention here is that I'm not completely sure whether this pattern is not merely a workaround for the performance / storage problems we now have. Using it leads to a possible discrepancy between the content of the Dockerfile and the actual image we use to run the container we then execute tests in. We do that because building and pushing the image on every Git push is inefficient and consumes a lot of resources like build minutes, storage and CPU. It would be better to build the image on every push, but it costs too much at this moment, when the image does not really change at all. The bottom line here is that I'm not completely convinced we can forget about these issues, especially when we are about to introduce build minutes.
One solution may be to improve caching when building a new Docker image whose Dockerfile did not change (and cut the build time to nearly zero in that case), and to make it possible to replace the old image with the new one to avoid the growing storage problem.
You should find that you can run this on every Git push with only the small time penalty incurred as Docker pulls the existing image layers from the registry. Docker will then build and push only the new layers, because it will have access to the layers from your existing image in its cache.
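A sketch of that approach, assuming a Docker version that supports `--cache-from` (the image path is illustrative):

```yaml
build image:
  stage: prepare
  script:
    # Pull the previous image so its layers can seed the build cache;
    # tolerate failure on the very first run when no image exists yet.
    - docker pull registry.example.com/group/project/test-image:latest || true
    # Unchanged layers are reused from the pulled image; only new layers are rebuilt.
    - docker build --cache-from registry.example.com/group/project/test-image:latest -t registry.example.com/group/project/test-image:latest .
    - docker push registry.example.com/group/project/test-image:latest
```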
@developers
May I politely enquire when we will have working (i.e. blocking) manual tasks back to fix this regression? Our pipelines are still broken in our GitLab Enterprise Edition instance (Zendesk ticket 572742) due to this unexpected change in behaviour.