Enable users to define and validate outputs for jobs. This helps users ensure the expected outputs are generated. Furthermore, users would like to use outputs as inputs for subsequent jobs or for jobs in downstream pipelines.
As a user I want to be able to define and validate outputs for jobs
So that I can ensure the expected outputs are generated and used effectively.
Additional information
Users would like to be able to define outputs for jobs. Those outputs will be used to:

- Fail the job in case:
  - An artifact wasn't created
  - A variable wasn't created
  - An artifact has a different value than expected
  - A variable has a different value than expected
- Use the job output as input for a subsequent job, which could also be in a downstream pipeline
- Jobs could reference the output of a previous job
- Jobs should be able to evaluate the outputs at run time
WIP proposal
```yaml
dast:
  script:
    - echo "status=$(./run-dast)" >> "$GITLAB_OUTPUTS"      # you populate this variable with whatever values you want.
    - echo "report=$(cat report.json)" >> "$GITLAB_OUTPUTS"
  stage: test
  outputs:
    # The runner parses the $GITLAB_OUTPUTS variable and maps the values to the output defined here.
    status: ${{ script.outputs.status }}    # expose the output with a given name (in this case it's the same name)
    summary: ${{ script.outputs.report }}   # expose the output with another name

# The runner returns the parsed outputs to GitLab and we persist them for later use.
# When we serialize data for a subsequent job we also include all outputs from the job's dependencies.
# In this case we also send the outputs from the `dast` job.
security-notification:
  needs: [dast]
  script:
    - echo "${{ needs.dast.outputs.status }}"   # the runner can do the interpolation at runtime, taking the value from the serialized data.
    - ./notify "${{ needs.dast.outputs.summary }}"
```
Outputs from bridge jobs (parent-child or multi-project pipelines)
```yaml
security-scans:
  stage: test
  trigger:
    include: security-scans.yml   # this runs several jobs: `dast`, `secret-detection`, etc.
    strategy: depend
  outputs:
    dast_status: ${{ dast.outputs.status }}   # Once the bridge job completes we can collect outputs from jobs in the child pipeline.
    secret_detection_status: ${{ secret_detection.outputs.status }}

security-notification:
  needs: [security-scans]
  script:
    - echo "DAST result: ${{ needs.security-scans.outputs.dast_status }}"
    - echo "Secret Detection result: ${{ needs.security-scans.outputs.secret_detection_status }}"
```
Based on user research, the type of output could be (see the sketch after this list):

- Artifacts
  - Validate if the artifact exists
  - Validate its name
  - Validate its content
- A variable
  - Validate that the variable was created
  - Validate the variable value
- A string
  - Value of the string
- Standard console logging
- Execution time
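As a purely illustrative sketch (nothing below is existing or proposed syntax; the `validate` keyword, its fields, and the job name are made up), declaring validations for these output types might look something like:

```yaml
release-build:
  stage: build
  script:
    - ./build.sh
    - echo "version=$(cat VERSION)" >> "$GITLAB_OUTPUTS"
  outputs:
    version: ${{ script.outputs.version }}
  # Hypothetical validation block: the job fails if any expectation below is not met.
  validate:
    artifacts:
      - path: dist/release.tar.gz        # fail if this artifact wasn't created
    outputs:
      - name: version                    # fail if the output wasn't set...
        matches: '^\d+\.\d+\.\d+$'       # ...or doesn't match the expected value/pattern
```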
Limits
As we introduce outputs we must add the following limits:
- The number of outputs that a single job can declare
- The size of a single output. Users should not be able to pass large data as outputs; instead they can upload a file as an artifact and set its path as the output, so the dependent job can still fetch the artifact based on the value in the output.
Outputs are completely optional. We have customers today that define outputs in their existing templates, so why shouldn't we provide a built-in experience for what they already do? Why do we need to wait for steps, or limit that experience to steps?
Maybe we can limit the output type to artifacts only for now?
@fabiopitino We included questions about outputs in our Assignment 3 interviews. We heard that not having similar outputs (to what users have now) would be a blocker for adoption. So, it feels like we should make the process very easy to increase the chances that we win over users.
@dhershkovitch implementing outputs for something as generic as a template.yml, where you can really have anything in it without structure, is hard to get right in a way that fits everyone's needs.
A template is simply not structured the way a single job would be. If a component adds multiple jobs to a pipeline that run at different times, what should the output be? And if we consider that the component's configuration could be overridden by other includes or overridden locally, what should the output be?
Introducing more structured components (like jobs and steps) defines a very specific scope and limits what you can do with that component, but in this limitation lies the power of composition. Here we could introduce outputs, allowing another job to reference the output of a component job, or a step to use the output of a previous step.
@fabiopitino, we don't need to fit everyone's needs. We do want to allow users to define outputs for structured components; it won't work for all cases and that's OK if it has limitations, but it would be beneficial for our users to try it out as an experimental feature. Do you believe this is technically feasible?
@dhershkovitch I'm not sure how this should work in practice. Let's say you have a component that adds multi-project pipelines to your pipeline. How should the output of that component work?
@fabiopitino, I assume not all components could produce an output, so this type won't be able to. What if we limit outputs to artifacts only, so it applies only to a component that contains a job that produces an output?
Edit: maybe we should allow users to define outputs at the job level, and fail a job if the output was not created (#22711)
I assume not all components could produce an output, so this type won't be able to
From our perspective all components today are template type and unless we have complex logic to understand the shape of the component we can't prevent someone from defining or using outputs.
what if we limit the outputs to artifacts only, so it will apply only to a component that contains a job that produce an output
What should happen if the component contains multiple jobs producing artifacts? Are they all considered outputs?
It could be that outputs are just part of the job definition and not necessarily a component's feature.
Outputs from normal job
```yaml
dast:
  script:
    - echo "status=$(./run-dast)" >> "$GITLAB_OUTPUTS"      # you populate this variable with whatever values you want.
    - echo "report=$(cat report.json)" >> "$GITLAB_OUTPUTS"
  stage: test
  outputs:
    # The runner parses the $GITLAB_OUTPUTS variable and maps the values to the output defined here.
    status: ${{ script.outputs.status }}    # expose the output with a given name (in this case it's the same name)
    summary: ${{ script.outputs.report }}   # expose the output with another name

# The runner returns the parsed outputs to GitLab and we persist them for later use.
# When we serialize data for a subsequent job we also include all outputs from the job's dependencies.
# In this case we also send the outputs from the `dast` job.
security-notification:
  needs: [dast]
  script:
    - echo "${{ needs.dast.outputs.status }}"   # the runner can do the interpolation at runtime, taking the value from the serialized data.
    - ./notify "${{ needs.dast.outputs.summary }}"
```
Outputs from bridge jobs (parent-child or multi-project pipelines)
```yaml
security-scans:
  stage: test
  trigger:
    include: security-scans.yml   # this runs several jobs: `dast`, `secret-detection`, etc.
    strategy: depend
  outputs:
    dast_status: ${{ dast.outputs.status }}   # Once the bridge job completes we can collect outputs from jobs in the child pipeline.
    secret_detection_status: ${{ secret_detection.outputs.status }}

security-notification:
  needs: [security-scans]
  script:
    - echo "DAST result: ${{ needs.security-scans.outputs.dast_status }}"
    - echo "Secret Detection result: ${{ needs.security-scans.outputs.secret_detection_status }}"
```
@fabiopitino @dhershkovitch I'm struggling to understand how this relates to components/catalog exactly, in the sense of whether or not this is required for Components/Catalog GA. The way it's being explained makes this look like an improvement to CI/CD configuration in general, not specific to components. The examples above, while interesting, could just as easily be part of someone's normal .gitlab-ci.yml config, and not necessarily part of a component. spec:inputs, on the other hand, is a fundamental part of components, for example.
It also seems like some of this is already possible with current config, at least for "normal" jobs. Thinking about the current description, artifacts are already fetched by later jobs by default, for example, and you can pass variables to other jobs with .env reports (and a string would essentially be just a variable anyways). With inputs, it seems like users could replicate a lot of this already. It feels like outputs would just be an easier way to do things in many cases, at least to start.
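For context, passing a variable to a later job with the existing dotenv report mechanism looks roughly like this (job names and values are made up):

```yaml
build:
  stage: build
  script:
    - echo "BUILD_VERSION=1.2.3" >> build.env
  artifacts:
    reports:
      dotenv: build.env

deploy:
  stage: deploy
  needs: [build]
  script:
    # BUILD_VERSION is injected as an environment variable via the dotenv report
    - echo "Deploying version $BUILD_VERSION"
```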
So could you explain a little more why this is necessary for the Components/Catalog GA effort?
@marcel.amirault I think outputs can exist regardless of components (the framework). What's important is that some constructs like jobs or steps can have their own implementation of outputs, while templates cannot.
Also how could we support different types of outputs (e.g. artifacts, or job logs?)
I think we don't need to replace artifacts with outputs. However, outputs can improve the experience of working with artifacts:
```yaml
sast:
  script:
    - echo "status=$(./run-sast)" >> $GITLAB_OUTPUTS
    - echo "report_file=sast-report.json" >> $GITLAB_OUTPUTS
  outputs:
    status: ${{ script.outputs.status }}
    report_file: ${{ script.outputs.report_file }}

security-notification:
  needs: [sast]
  # this downloads automatically any artifacts from the "sast" job, but rather than having to know the name
  # of the artifact file we could rely on the output to expose its file path. We rely on a contract rather than
  # implementation details.
  script:
    - echo cat ${{ deps.sast.outputs.report_file }}
```
As a user I want to define an expected job output and in case it doesn't match I want this job to fail
It feels a bit cumbersome that users would need to create a new job to validate output from previous jobs. I wonder if it's better to check and then fail the first job; maybe that's fine, but we'll need some validation. Also, how would it work with multiple jobs and multiple outputs?
For the artifact scenario, could users check:
- The existence of an artifact?
- The content of an artifact?
Here is a related issue using a different solution, which I wonder whether it makes sense to implement: #22711
@fabiopitino, one fundamental flaw with your suggestion is that the usage of needs: indicates the job execution order; however, users wouldn't want to mix that with the ability to define job outputs and use them in subsequent jobs.
As a user I want to define an expected job output and in case it doesn't match I want this job to fail
@dhershkovitch Oh, I thought you were referring to the scenario of a job failing based on the output of a previous job. The job setting the output also knows what the output value is and can pass or fail (with exit 1) based on that.
Also how would it work with multiple jobs and multiple outputs?
one fundamental flaw with your suggestion is that the usage of needs: indicates the job execution order, however, users wouldn't like to mix that with the ability to define and use job outputs to subsequent jobs
To answer both: I initially used ${{ needs.dast.outputs.status }} but then switched to a more generic ${{ deps.dast.outputs.status }}. The use of deps (short for dependencies) is more generic: job dependencies can be defined with needs:, with dependencies:, or via stages (by default), so this works whether you use needs or stages. If the job name is not in the list of dependencies, it should return an error.
Using deps.* you have access to multiple dependency jobs. With deps.<job-name>.outputs.* you have access to multiple named outputs on a single job.
```yaml
# some other scan jobs to execute before...
security-notification:
  script:
    - echo cat ${{ deps.secret-detection.outputs.status }}
    - echo cat ${{ deps.secret-detection.outputs.report_file }}
    - echo cat ${{ deps.sast.outputs.status }}
    - echo cat ${{ deps.sast.outputs.report_file }}
```
I guess there is more than one scenario for outputs, but we need to make sure we design it in a way that could evolve to cover all of them.
The job setting the output also knows what the output value is and can pass or fail (with exit 1) based on that.
Similar to #215100, users would like to define a job output, and if the output was not generated or does not match, the job needs to fail, e.g. an artifact was not generated or a variable is empty. (We could have more complex requirements, such as an artifact/variable containing a different value than expected, but let's not get ahead of ourselves.)
Another scenario, as mentioned: users would like to use the output of a job as input to another job. I believe this is the suggested syntax.
In addition, users may want to use a job's output to determine whether subsequent jobs should run. I assume this is different from rules:, as the evaluation occurs at runtime rather than when the pipeline is created. (We've discussed in the past using ${{ .. }} for runtime evaluation at the runner vs $[[ .. ]].)
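Purely as a hypothetical sketch (the run_if keyword below doesn't exist and isn't part of any proposal in this issue), a runtime-evaluated condition based on a dependency's output might look like:

```yaml
dast:
  script:
    - echo "status=$(./run-dast)" >> "$GITLAB_OUTPUTS"
  outputs:
    status: ${{ script.outputs.status }}

security-notification:
  needs: [dast]
  # Hypothetical runtime condition, evaluated by the runner from the serialized
  # outputs of `dast` (unlike rules:, which is evaluated at pipeline creation time).
  run_if: ${{ needs.dast.outputs.status == 'failed' }}
  script:
    - ./notify "DAST reported failures"
```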
@fabiopitino Coming back to this, much of this still seems like something that's mostly already possible in our config, though you have to jump through some hoops (artifacts or .env files, for example). So is the idea to just make it easier/more straightforward to do?
@marcel.amirault Yes. Outputs are small pieces of data that can be passed to subsequent jobs. If you have a large amount of data, you should still use artifacts, but you could also set the path of the artifact in an output if you want, so you can check whether the output has been set. It should also make it easier to pass data than using a .env report.
The issue with artifacts and .env files is that you need to know the path of the file uploaded by a different job, or the name of the variable created by a different job. With outputs, which for components could be automatically documented, you depend on the output name without having to know any other internal details of other jobs.
@marcel.amirault Since we know that it'll be a blocker for adoption of CI Components if users aren't able to get similar outputs to what they have now, we should create some documentation around how to do that (once we decide on the approach here).
@enf Hrm, since everything that works right now should still work in components, I feel like users shouldn't be worried about that for components? outputs feels like a potentially easier way to configure things in general, and not really a requirement for components to be functional. I've asked about this above
@enf The design of the feature is still being discussed, and there's no feature in development yet, so there's nothing to document yet. Normally the engineers developing the feature will write the docs as part of that, and we'll merge all the docs and the code at the same time. Dov, Fabio, and others usually ping me on proposals when they need a third (or 4th, 5th) set of eyes, in case I have additional ideas or naming suggestions (or sometimes I just jump in without the ping).
If you mean writing out the proposal/plan in a blueprint doc to share with others, that might be something @fabiopitino can do (and I can give it a review if needed, though blueprints usually don't get TW review as they are not product documentation).
Essentially, I mostly help with reviewing engineer-authored documentation for their "finished" features, or at least experimental/beta features that are fully testable by end users.
@fabiopitino @marcel.amirault Based on our discussion, I've converted this issue to be about defining a job output. Based on the user research @enf conducted, we believe this would be a useful enhancement. Please add your comments.
I already commented on my idea in another topic but I will add it here as well. I would like to see a global "context" where I can freely add environment variables from any job (except triggers) and pull them into any other job executed later, maybe making them available by default as environment variables in all subsequent jobs. @dhershkovitch's idea is good from my side, except that it requires explicitly setting the job name to pick the value from. I guess it could be better to collect named values globally. Users will give proper names themselves to distinguish values from different jobs. Same for artifacts.
I would also suggest adding another environment variable prefix, CI_CTX_, automatically storing these variables in the global context on job completion and making them available in all jobs executed later. So in my pipeline bash code, I just do export CI_CTX_FOO=bar and this variable will be available in all subsequent jobs. The variable prefix will make it clear that the variable is set in preceding jobs if it is missing. The explicit way of defining is good, but if I need to share many variables with many jobs, my pipeline YAML is probably going to become a mess of variable definitions here, there, and everywhere. And I cannot use bash scripts to handle them in a one-liner expression in the YAML script - neither export them all at once, nor easily use them in my scripts. Imagine this:
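(A rough sketch of the idea; the CI_CTX_ prefix and its automatic propagation are entirely hypothetical, and the job names are made up:)

```yaml
configure:
  stage: .pre
  script:
    # Hypothetical: variables exported with the CI_CTX_ prefix would be stored
    # in a global context when the job completes...
    - export CI_CTX_SUBDOMAIN="review-${CI_COMMIT_REF_SLUG}"
    - export CI_CTX_DEPLOY_URL="https://${CI_CTX_SUBDOMAIN}.example.com"
    - export CI_CTX_VERSION="$(cat VERSION)"

deploy:
  stage: deploy
  script:
    # ...and be available as plain environment variables in every later job,
    # without declaring which job each value comes from.
    - ./deploy.sh --url "$CI_CTX_DEPLOY_URL" --version "$CI_CTX_VERSION"
```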
I guess it could be better to collect named values globally. Users will give proper names themselves to distinguish values from different jobs. Same for artifacts.
@alex.pravdin There will be challenges with composability if we follow this pattern. As long as you include templates that you create, it may be OK. But if you use components from a catalog and they define an output as a variable with the same name as a variable that already exists in the pipeline, they can override each other. Even the CI_CTX_ prefix won't be sufficient to prevent name conflicts.
Another design problem is that we will be adding more responsibilities to variables. We already have various types of variables:
GitLab's predefined variables
user defined variables as plaintext (e.g. in YAML)
user defined variables as secrets
user defined variables in dotenv reports
variables inherited from upstream
Using a global context to define variables would be analogous to having methods in a programming language that communicate inputs and outputs via global variables. Global variables are considered an anti-pattern in almost all programming languages. With our proposal, the outputs from a job are passed to the subsequent jobs, and these latter jobs can decide whether and which outputs to use.
Can you describe what use cases you have that the current proposal is not solving?
In my current project, I need to configure the following (not everything is possible now), based on the branch name and project CI/CD environment variables, and share it with multiple jobs:
Pipeline name
Deployment subdomain name
Deployment URL
Auto-delete interval
Version
Production and debug docker images
I want to define this in a single block or script to have a single place of responsibility and not spread it across jobs.
In the future, I would like to set up more configuration to make the deployment more flexible, configuring more settings for the downstream project. And I want to do that in a single job and then reuse it in other places.
The reason I'm worried about explicit definitions is that I'm working with Terraform, and about 50% of TF code is variable definitions, assignments, and passing values to submodules. It's a real headache, and I would not be happy to have something similar in GitLab. An explicit way is still better than nothing, but if it could be done in a more convenient way, that would be awesome.
While you're right about the global variables anti-pattern, GitLab CI/CD YAML is not a classical programming language; it's a declarative schema with some scripts. From my point of view, global variables are totally fine in declarative tools, so I personally think it's better to focus on the goal of solving this issue. But it's up to you, of course, which approach to choose.
@fabiopitino yes. Jobs provide outputs. Components provide inputs. So you get the same kind of composability as Steps. Should look like this (your example, but slightly modified to show steps):
```yaml
sast:
  run:
    - name: run_sast
      step: ./run-sast   # outputs of step are `status` and `report`
  outputs:
    status:
      type: string
      value: ${{ steps.run_sast.outputs.status }}
    report_file:
      type: file
      value: ${{ steps.run_sast.report_file }}

security-notification:
  needs: [sast]   # this is redundant and can be inferred from the expression below
  run:
    - script: echo cat ${{ jobs.sast.outputs.report_file }}
```
```yaml
spec:
  outputs:
    status:
      type: string
    report_file:
      type: file   # not implemented yet
  # (Usually this is an empty spec because we don't have Job inputs)
---
run:
  - name: run_sast
    step: ./run-sast   # outputs of step are `status` and `report`
outputs:
  status: ${{ steps.run_sast.outputs.status }}
  report_file: ${{ steps.run_sast.report_file }}
```
And to get the outputs of the job, GitLab Runner just needs to call the FollowResults API which will include a single StepResult with outputs. The outputs expressions will already be evaluated by step-runner because it views a Job as just a single Step.
So most of this already "just works"! We would just need to define the input keyword on jobs and plumb it through. GitLab Runner just needs to do a little transformation to make the spec and outputs in the step. And then just retrieve the results.
We want to put StepResults back into the GitLab database (for many reasons). So this would be a good chance to plumb that through. GitLab can provide previous job outputs by reading that top-level result from the database for each previous job.