Enable users to define and validate outputs for jobs. This helps users ensure the expected outputs are generated. Furthermore, users would like to use outputs as inputs for subsequent jobs or for jobs in downstream pipelines.
As a user I want to be able to define and validate outputs for jobs
So that I can ensure the expected outputs are generated and used effectively.
Additional information
Users would like to be able to define outputs for jobs. Those outputs will be used to:

- Fail the job in case:
  - An artifact wasn't created
  - A variable wasn't created
  - An artifact has a different value than expected
  - A variable has a different value than expected
- Use the job output as input for a subsequent job, which could also be in a downstream pipeline
- Jobs could reference the output of a previous job
- Jobs should be able to evaluate the outputs at run time
WIP proposal
```yaml
dast:
  script:
    - echo "status=$(./run-dast)" >> "$GITLAB_OUTPUTS"      # you populate this variable with whatever values you want.
    - echo "report=$(cat report.json)" >> "$GITLAB_OUTPUTS"
  stage: test
  outputs:
    # The runner parses the $GITLAB_OUTPUTS variable and maps the values to the output defined here.
    status: ${{ script.outputs.status }}    # expose the output with a given name (in this case it's the same name)
    summary: ${{ script.outputs.report }}   # expose the output with another name

# The runner returns the parsed outputs to GitLab and we persist them for later use.
# When we serialize data for a subsequent job we also include all outputs from the job's dependencies.
# In this case we also send the outputs from the `dast` job.
security-notification:
  needs: [dast]
  script:
    - echo "${{ needs.dast.outputs.status }}"   # the runner can do the interpolation at runtime, taking the value from the serialized data.
    - ./notify "${{ needs.dast.outputs.summary }}"
```
Outputs from bridge jobs (parent-child or multi-project pipelines)
```yaml
security-scans:
  stage: test
  trigger:
    include: security-scans.yml   # this runs several jobs: `dast`, `secret-detection`, etc.
    strategy: depend
  outputs:
    dast_status: ${{ dast.outputs.status }}   # Once the bridge job completes we can collect outputs from jobs in the child pipeline.
    secret_detection_status: ${{ secret_detection.outputs.status }}

security-notification:
  needs: [security-scans]
  script:
    - echo "DAST result: ${{ needs.security-scans.outputs.dast_status }}"
    - echo "Secret Detection result: ${{ needs.security-scans.outputs.secret_detection_status }}"
```
Based on user research, the type of output could be (see the sketch after this list):

- Artifacts
  - Validate if the artifact exists
  - Validate its name
  - Validate its content
- A variable
  - Validate that the variable was created
  - Validate the variable value
- A string
  - Value of the string
- Standard console logging
- Execution time
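As a purely illustrative sketch (nothing below is existing or proposed syntax; the `validate` keyword, its fields, and the job name are made up), declaring validations for these output types might look something like:

```yaml
release-build:
  stage: build
  script:
    - ./build.sh
    - echo "version=$(cat VERSION)" >> "$GITLAB_OUTPUTS"
  outputs:
    version: ${{ script.outputs.version }}
  # Hypothetical validation block: the job fails if any expectation below is not met.
  validate:
    artifacts:
      - path: dist/release.tar.gz        # fail if this artifact wasn't created
    outputs:
      - name: version                    # fail if the output wasn't set...
        matches: '^\d+\.\d+\.\d+$'       # ...or doesn't match the expected value/pattern
```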
Limits
As we introduce outputs we must add the following limits:
- The number of outputs that a single job can declare
- The size of a single output. Users should not be able to pass large data as outputs; instead they can upload a file as an artifact and set its path as the output, so the dependent job can still fetch the artifact based on the value in the output.
Outputs are completely optional. We have customers today that define outputs in their existing templates, so why shouldn't we provide a built-in experience for what they already do? Why do we need to wait for steps, or limit that experience to steps?
Maybe we can limit the output type to artifacts only for now?
@fabiopitino We included questions about outputs in our Assignment 3 interviews. We heard that not having similar outputs (to what users have now) would be a blocker for adoption. So, it feels like we should make the process very easy to increase the chances that we win over users.
@dhershkovitch implementing outputs for something as generic as a template.yml, where you can really have anything in it without structure, is hard to get right in a way that fits everyone's needs.
A template is simply not structured the way a single job would be. If a component adds multiple jobs to a pipeline that run at different times, what should the output be? And if we consider that the component's configuration could be overridden by other includes or overridden locally, what should the output be?
Introducing more structured components (like jobs and steps) defines a very specific scope and limits what you can do with that component, but in this limitation lies the power of composition. Here we could introduce outputs, allowing another job to reference the output of a component job, or a step to use the output of a previous step.
@fabiopitino, we don't need to fit everyone's needs. We do want to allow users to define outputs for structured components; it won't work for all cases and that's OK if it has limitations, but it would be beneficial for our users to try it out as an experimental feature. Do you believe this is technically feasible?
@dhershkovitch I'm not sure how this should work in practice. Let's say you have a component that adds multi-project pipelines to your pipeline. How should the output of that component work?
@fabiopitino, I assume not all components could produce an output, so this type won't be able to. What if we limit outputs to artifacts only, so it applies only to a component that contains a job that produces an output?
Edit: maybe we should allow users to define outputs at the job level, and fail a job if the output was not created (#22711)
I assume not all components could produce an output, so this type won't be able to
From our perspective all components today are template type and unless we have complex logic to understand the shape of the component we can't prevent someone from defining or using outputs.
what if we limit the outputs to artifacts only, so it will apply only to a component that contains a job that produce an output
What should happen if the component contains multiple jobs producing artifacts? Are they all considered outputs?
It could be that outputs are just part of the job definition and not necessarily a component's feature.
Outputs from normal job
```yaml
dast:
  script:
    - echo "status=$(./run-dast)" >> "$GITLAB_OUTPUTS"      # you populate this variable with whatever values you want.
    - echo "report=$(cat report.json)" >> "$GITLAB_OUTPUTS"
  stage: test
  outputs:
    # The runner parses the $GITLAB_OUTPUTS variable and maps the values to the output defined here.
    status: ${{ script.outputs.status }}    # expose the output with a given name (in this case it's the same name)
    summary: ${{ script.outputs.report }}   # expose the output with another name

# The runner returns the parsed outputs to GitLab and we persist them for later use.
# When we serialize data for a subsequent job we also include all outputs from the job's dependencies.
# In this case we also send the outputs from the `dast` job.
security-notification:
  needs: [dast]
  script:
    - echo "${{ needs.dast.outputs.status }}"   # the runner can do the interpolation at runtime, taking the value from the serialized data.
    - ./notify "${{ needs.dast.outputs.summary }}"
```
Outputs from bridge jobs (parent-child or multi-project pipelines)
```yaml
security-scans:
  stage: test
  trigger:
    include: security-scans.yml   # this runs several jobs: `dast`, `secret-detection`, etc.
    strategy: depend
  outputs:
    dast_status: ${{ dast.outputs.status }}   # Once the bridge job completes we can collect outputs from jobs in the child pipeline.
    secret_detection_status: ${{ secret_detection.outputs.status }}

security-notification:
  needs: [security-scans]
  script:
    - echo "DAST result: ${{ needs.security-scans.outputs.dast_status }}"
    - echo "Secret Detection result: ${{ needs.security-scans.outputs.secret_detection_status }}"
```
@fabiopitino @dhershkovitch I'm struggling to understand how this relates to components/catalog exactly, in the sense of whether or not this is required for Components/Catalog GA. The way it's being explained makes this look like an improvement to CI/CD configuration in general, not specific to components. The examples above, while interesting, could just as easily be part of someone's normal .gitlab-ci.yml config, and not necessarily part of a component. spec:inputs, on the other hand, is a fundamental part of components, for example.
It also seems like some of this is already possible with current config, at least for "normal" jobs. Thinking about the current description, artifacts are already fetched by later jobs by default, for example, and you can pass variables to other jobs with .env reports (and a string would essentially be just a variable anyways). With inputs, it seems like users could replicate a lot of this already. It feels like outputs would just be an easier way to do things in many cases, at least to start.
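For context, passing a variable to a later job with the existing dotenv report mechanism looks roughly like this (job names and values are made up):

```yaml
build:
  stage: build
  script:
    - echo "BUILD_VERSION=1.2.3" >> build.env
  artifacts:
    reports:
      dotenv: build.env

deploy:
  stage: deploy
  needs: [build]
  script:
    # BUILD_VERSION is injected as an environment variable via the dotenv report
    - echo "Deploying version $BUILD_VERSION"
```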
So could you explain a little more why this is necessary for the Components/Catalog GA effort?
@marcel.amirault I think outputs can exist regardless of components (the framework). What's important is that some constructs like jobs or steps can have their own implementation of outputs, while templates cannot.
Also how could we support different types of outputs (e.g. artifacts, or job logs?)
I think we don't need to replace artifacts with outputs. However, outputs can improve the experience of working with artifacts:
```yaml
sast:
  script:
    - echo "status=$(./run-sast)" >> $GITLAB_OUTPUTS
    - echo "report_file=sast-report.json" >> $GITLAB_OUTPUTS
  outputs:
    status: ${{ script.outputs.status }}
    report_file: ${{ script.outputs.report_file }}

security-notification:
  needs: [sast]
  # this downloads automatically any artifacts from the "sast" job, but rather than having to know the name
  # of the artifact file we could rely on the output to expose its file path. We rely on a contract rather than
  # implementation details.
  script:
    - echo cat ${{ deps.sast.outputs.report_file }}
```
As a user I want to define an expected job output and in case it doesn't match I want this job to fail
It feels a bit cumbersome that users would need to create a new job to validate output from previous jobs. I wonder if it's better to check and then fail the first job; maybe that's fine, but we'll need some validation. Also, how would it work with multiple jobs and multiple outputs?
For the artifact scenario, could users check:
- The existence of an artifact?
- The content of an artifact?
Here is a related issue using a different solution, which I wonder whether it makes sense to implement: #22711
@fabiopitino, one fundamental flaw with your suggestion is that the usage of needs: indicates the job execution order; however, users wouldn't want to mix that with the ability to define job outputs and use them in subsequent jobs.
As a user I want to define an expected job output and in case it doesn't match I want this job to fail
@dhershkovitch Oh, I thought you were referring to the scenario of a job failing based on the output of a previous job. The job setting the output also knows what the output value is and can pass or fail (with exit 1) based on that.
Also how would it work with multiple jobs and multiple outputs?
one fundamental flaw with your suggestion is that the usage of needs: indicates the job execution order, however, users wouldn't like to mix that with the ability to define and use job outputs to subsequent jobs
To answer both: I initially used ${{ needs.dast.outputs.status }} but then switched to a more generic ${{ deps.dast.outputs.status }}. The use of deps (short for dependencies) is more generic: job dependencies can be defined with needs:, with dependencies:, or via stages (by default), so this works whether you use needs or stages. If the job name is not in the list of dependencies, it should return an error.
Using deps.* you have access to multiple dependency jobs. With deps.<job-name>.outputs.* you have access to multiple named outputs on a single job.
```yaml
# some other scan jobs to execute before...
security-notification:
  script:
    - echo cat ${{ deps.secret-detection.outputs.status }}
    - echo cat ${{ deps.secret-detection.outputs.report_file }}
    - echo cat ${{ deps.sast.outputs.status }}
    - echo cat ${{ deps.sast.outputs.report_file }}
```
I guess there is more than one scenario for outputs, but we need to make sure we design it in a way that could evolve to cover all of them.
The job setting the output also knows what the output value is and can pass or fail (with exit 1) based on that.
Similar to #215100, users would like to define a job output, and if the output was not generated or does not match, the job needs to fail, e.g. an artifact was not generated or a variable is empty. (We could have more complex requirements, such as an artifact/variable containing a different value than expected, but let's not get ahead of ourselves.)
Another scenario, as mentioned: users would like to use the output of a job as input to another job. I believe this is the suggested syntax.
In addition, users may want to use a job's output to determine whether subsequent jobs should run. I assume this is different from rules:, as the evaluation occurs at runtime rather than when the pipeline is created. (We've discussed in the past using ${{ .. }} for runtime evaluation at the runner vs $[[ .. ]].)
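Purely as a hypothetical sketch (the run_if keyword below doesn't exist and isn't part of any proposal in this issue), a runtime-evaluated condition based on a dependency's output might look like:

```yaml
dast:
  script:
    - echo "status=$(./run-dast)" >> "$GITLAB_OUTPUTS"
  outputs:
    status: ${{ script.outputs.status }}

security-notification:
  needs: [dast]
  # Hypothetical runtime condition, evaluated by the runner from the serialized
  # outputs of `dast` (unlike rules:, which is evaluated at pipeline creation time).
  run_if: ${{ needs.dast.outputs.status == 'failed' }}
  script:
    - ./notify "DAST reported failures"
```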
@fabiopitino Coming back to this, much of this still seems like something that's mostly already possible in our config, though you have to jump through some hoops (artifacts or .env files, for example). So is the idea to just make it easier/more straightforward to do?
@marcel.amirault Yes. Outputs are small pieces of data that can be passed to subsequent jobs. If you have a large amount of data, you should still use artifacts, but you could also set the path of the artifact in an output if you want, so you can check whether the output has been set. It should also make it easier to pass data than using a .env report.
The issue with artifacts and .env files is that you need to know the path of the file uploaded by a different job, or the name of the variable created by a different job. With outputs, which for components could be automatically documented, you depend on the output name without having to know any other internal details of other jobs.
@marcel.amirault Since we know that it'll be a blocker for adoption of CI Components if users aren't able to get similar outputs to what they have now, we should create some documentation around how to do that (once we decide on the approach here).
@enf Hrm, since everything that works right now should still work in components, I feel like users shouldn't be worried about that for components? outputs feels like a potentially easier way to configure things in general, and not really a requirement for components to be functional. I've asked about this above
@enf The design of the feature is still being discussed, and there's no feature in development yet, so there's nothing to document yet. Normally the engineers developing the feature will write the docs as part of that, and we'll merge all the docs and the code at the same time. Dov, Fabio, and others usually ping me on proposals when they need a third (or 4th, 5th) set of eyes, in case I have additional ideas or naming suggestions (or sometimes I just jump in without the ping).
If you mean writing out the proposal/plan in a blueprint doc to share with others, that might be something @fabiopitino can do (and I can give it a review if needed, though blueprints usually don't get TW review as they are not product documentation).
Essentially, I mostly help with reviewing engineer-authored documentation for their "finished" features, or at least experimental/beta features that are fully testable by end users.
@fabiopitino @marcel.amirault Based on our discussion, I've converted this issue to be about defining a job output. Based on the user research @enf conducted, we believe this would be a useful enhancement. Please add your comments.
I already commented on my idea in another topic but I will add it here as well. I would like to see a global "context" where I can freely add environment variables from any job (except triggers) and pull them into any other job executed later, maybe making them available by default as environment variables in all subsequent jobs. @dhershkovitch's idea is good from my side, except that it requires explicitly setting the job name to pick the value from. I guess it could be better to collect named values globally. Users will give proper names themselves to distinguish values from different jobs. Same for artifacts.
I would also suggest adding another environment variable prefix, CI_CTX_, automatically storing these variables in the global context on job completion and making them available in all jobs executed later. So in my pipeline bash code, I just do export CI_CTX_FOO=bar and this variable will be available in all subsequent jobs. The variable prefix will make it clear that the variable is set in preceding jobs if it is missing. The explicit way of defining is good, but if I need to share many variables with many jobs, my pipeline YAML is probably going to become a mess of variable definitions here, there, and everywhere. And I cannot use bash scripts to handle them in a one-liner expression in the YAML script - neither export them all at once, nor easily use them in my scripts. Imagine this:
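(A rough sketch of the idea; the CI_CTX_ prefix and its automatic propagation are entirely hypothetical, and the job names are made up:)

```yaml
configure:
  stage: .pre
  script:
    # Hypothetical: variables exported with the CI_CTX_ prefix would be stored
    # in a global context when the job completes...
    - export CI_CTX_SUBDOMAIN="review-${CI_COMMIT_REF_SLUG}"
    - export CI_CTX_DEPLOY_URL="https://${CI_CTX_SUBDOMAIN}.example.com"
    - export CI_CTX_VERSION="$(cat VERSION)"

deploy:
  stage: deploy
  script:
    # ...and be available as plain environment variables in every later job,
    # without declaring which job each value comes from.
    - ./deploy.sh --url "$CI_CTX_DEPLOY_URL" --version "$CI_CTX_VERSION"
```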
I guess it could be better to collect named values globally. Users will give proper names themselves to distinguish values from different jobs. Same for artifacts.
@alex.pravdin There will be challenges with composability if we follow this pattern. As long as you include templates that you create, it may be OK. But if you use components from a catalog and they define an output as a variable with the same name as a variable that already exists in the pipeline, they can override each other. Even the CI_CTX_ prefix won't be sufficient to prevent name conflicts.
Another design problem is that we will be adding more responsibilities to variables. We already have various types of variables:
GitLab's predefined variables
user defined variables as plaintext (e.g. in YAML)
user defined variables as secrets
user defined variables in dotenv reports
variables inherited from upstream
Using a global context to define variables would be analogous to having methods in a programming language that communicate inputs and outputs via global variables. Global variables are considered an anti-pattern in almost all programming languages. With our proposal, the outputs from a job are passed to the subsequent jobs, and these latter jobs can decide whether and which outputs to use.
Can you describe what use cases you have that the current proposal is not solving?
In my current project, I need to configure the following (not everything is possible now), based on the branch name and project CI/CD environment variables, and share it with multiple jobs:
Pipeline name
Deployment subdomain name
Deployment URL
Auto-delete interval
Version
Production and debug docker images
I want to define this in a single block or script to have a single place of responsibility and not spread it across jobs.
In the future, I would like to set up more configuration to make the deployment more flexible, configuring more settings for the downstream project. And I want to do that in a single job and then reuse it in other places.
The reason I'm worried about explicit definitions is that I'm working with Terraform, and about 50% of TF code is variable definitions, assignments, and passing values to submodules. It's a real headache, and I would not be happy to have something similar in GitLab. An explicit way is still better than nothing, but if it could be done in a more convenient way, that would be awesome.
While you're right about the global variables anti-pattern, GitLab CI/CD YAML is not a classical programming language; it's a declarative schema with some scripts. From my point of view, global variables are totally fine in declarative tools, so I personally think it's better to focus on the goal of solving this issue. But it's up to you, of course, which approach to choose.
@fabiopitino yes. Jobs provide outputs. Components provide inputs. So you get the same kind of composability as Steps. Should look like this (your example, but slightly modified to show steps):
```yaml
sast:
  run:
    - name: run_sast
      step: ./run-sast   # outputs of step are `status` and `report`
  outputs:
    status:
      type: string
      value: ${{ steps.run_sast.outputs.status }}
    report_file:
      type: file
      value: ${{ steps.run_sast.report_file }}

security-notification:
  needs: [sast]   # this is redundant and can be inferred from the expression below
  run:
    - script: echo cat ${{ jobs.sast.outputs.report_file }}
```
```yaml
spec:
  outputs:
    status:
      type: string
    report_file:
      type: file   # not implemented yet
  # (Usually this is an empty spec because we don't have Job inputs)
---
run:
  - name: run_sast
    step: ./run-sast   # outputs of step are `status` and `report`
outputs:
  status: ${{ steps.run_sast.outputs.status }}
  report_file: ${{ steps.run_sast.report_file }}
```
And to get the outputs of the job, GitLab Runner just needs to call the FollowResults API which will include a single StepResult with outputs. The outputs expressions will already be evaluated by step-runner because it views a Job as just a single Step.
So most of this already "just works"! We would just need to define the input keyword on jobs and plumb it through. GitLab Runner just needs to do a little transformation to make the spec and outputs in the step. And then just retrieve the results.
We want to put StepResults back into the GitLab database (for many reasons). So this would be a good chance to plumb that through. GitLab can provide previous job outputs by reading that top-level result from the database for each previous job.