Introduce job components

changed milestone to %Backlog

added Category:Component Catalog, Category:Pipeline Composition, UX, auto updated, devops::verify, group::pipeline authoring, section::ops, type::feature labels

changed the description

mentioned in merge request !113952 (merged)

mentioned in merge request !113211 (merged)

mentioned in issue #398129 (closed)

mentioned in issue #27896

marked this issue as related to #27896

@fabiopitino I wonder if we should continue with includes instead of extends

Also, can we support calling a job from within a component?

api-scan:
  includes:
    - job: gitlab.com/gitlab-org/dast/api-scan@1.0 # job within a component


include:
  - component: gitlab.com/gitlab-org/dast@main # a component

@dhershkovitch Using include within a job can be ambiguous. What does it do differently from extends, other than where the content comes from? How does it work if you have both job:include and job:extends? Like below:

.base-job:
  interruptible: true
  
api-scan:
  include:
    - job: gitlab.com/gitlab-org/dast/api-scan@1.0
  extends:
    - .base-job

If the api-scan job also defines interruptible:, which one takes precedence?
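
For comparison, extends on its own already has a defined merge order in GitLab CI today: later entries in the list override earlier ones, and keys written directly on the job override anything inherited. A minimal sketch with local hidden jobs only (`.component-like-job` is a hypothetical stand-in for whatever a job component would contribute; the proposed job:include has no such defined order):

```yaml
.component-like-job:        # hypothetical stand-in for a job component's content
  interruptible: false
  script:
    - echo "scan"

.base-job:
  interruptible: true

api-scan:
  extends:
    - .component-like-job
    - .base-job             # listed last, so its keys win over .component-like-job
  # Effective result: interruptible is true, script comes from .component-like-job.
```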

Using the same keyword to extend a job seems pretty clear:

.base-job:
  interruptible: true

api-scan:
  extends:
    - job: gitlab.com/gitlab-org/dast/api-scan@1.0  # job component
      inputs:
        website: http://example.com
    - .base-job  # explicit precedence. This takes precedence over the above.
  stage: test  # no need to use `stage` as input parameter
  # needs: [...]   # this can be specified instead of `stage`
  allow_failure: true   # any explicit attributes override the extensions.

include:
  - component: gitlab.com/gitlab-org/dast@main # a component

This would be very ambiguous for users since they don't know what's being included. Is it a job or a configuration?

The great advantage of the extends:job syntax is that the author of the .gitlab-ci.yml is responsible for arranging the jobs in the pipeline according to their needs.

You don't need to pass a stage parameter to tell the component where to add the job, nor is the job name defined in the component itself. You define the job name (which already takes care of naming conflicts and removes the need to pass a prefix input). You define when the job should run (stage: or needs:).

What the job does in detail comes from the job component. That is the part you should delegate to the component so it can do its magic.

/cc @furkanayhan

I think we should not do it. It will be confusing once we add steps, and this can already be achieved via templated components and extends.

This would be a valid idea if we had not planned steps. The job component will not be very useful since it requires a proper job specification, like needs and artifacts.

@ayufan Many users in our interviews organize their templates as single jobs. Using template components is good when you want to add multiple jobs, change the pipeline shape, or modify workflow:rules. When you want to add a single job it's not the best UX:

With include: all components are defined at the top of the config. It's not clear what the pipeline will look like if you add many components each adding a single job.
You will need to pass the stage as an input parameter, and perhaps other inputs to override job-level configurations, via include.
You will need to know the name of the job that will be added to the pipeline if you want to override other aspects (e.g. timeout, interruptible, etc.).

include:
  - component: gitlab.com/gitlab-org/dast@1.0
    inputs:
      stage: test
      website: http://example.com
      job_prefix: "my-prefix"

my-prefix-dast:
  allow_failure: true # I need to know the final job name if I want to override an attribute.
                      # To the reader it's not clear where `my-prefix-dast` is defined. This could
                      # be hundreds of lines down the .gitlab-ci.yml file.
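
To make that coupling concrete, this is roughly what a template component adding the single job could look like. It is only a sketch: the real contents of the dast component are not shown in this thread, so the job body and the choice of mandatory inputs are assumptions. The `spec:inputs` header and the `$[[ inputs.* ]]` interpolation are the existing component syntax.

```yaml
spec:
  inputs:
    stage:
      default: test
    website:        # no default, so the caller must provide it
    job_prefix:     # no default, so the caller must provide it
---
# The component, not the pipeline author, decides the final job name.
"$[[ inputs.job_prefix ]]-dast":
  stage: $[[ inputs.stage ]]
  script:
    - echo "Scanning $[[ inputs.website ]]"   # placeholder for the real scan
```

Anything the component author did not expose as an input can only be changed by redefining the generated job by name in the including file, which is the situation shown above.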

Using job components would provide a much better UX:

You have a clear picture of what the pipeline looks like since you mainly define its structure.
You define the job name, stage, needs, etc. since these are pipeline-specific (not component-specific) configurations.
You can define or override anything else inline, right where the component is used.

my-prefix-dast:
  stage: test # this is pipeline specific config
  extends:
    - component: gitlab.com/gitlab-org/dast@1.0
      inputs:
        website: http://example.com # this is the true component-specific input!
  allow_failure: true
  needs: [build, other-dependency]  # this is pipeline-specific config

Job components can still be composed of step components.

I think there is value in doing it, and I believe it might not be complicated to implement. We can explore steps first since they may cover some use cases with jobs composed of multiple steps or composite steps (made of other steps).

@fabiopitino

When you want to add a single job it's not the best UX

Yes, because they don't have an alternative.

If you had steps that can be composed of other steps and configured with inputs, job components would lose their relevance.

Then it gets confusing when to use a template, a job, or a step.

@ayufan with job components you can have job-level keywords that are predefined to work well in the majority of cases. E.g. you can have job:variables, release, interruptible, cache, parallel, etc. defined with defaults that work out of the box.

With job components you get the advantage of defining components that are small enough that you can very easily override them locally and decide where they run in the pipeline (stage, needs), but at the same time big enough that you don't need to fill in too many details every time (e.g. exporting artifacts, rules, variable passing, etc.).

With steps we can control the job execution (all the *script* keywords, image and services) but we cannot control aspects that are at a higher level of abstraction for a job (e.g. stage, needs, dependencies).

A job component can also be a bridge job that triggers a child pipeline. So you can hide a lot of complexity and isolation behind a single job:

compliance:
  stage: test
  extends:
    - component: gitlab.com/gitlab-org/compliance-pipeline@1.0 # makes it a bridge job

I'm supportive of the template type of components: they serve their purpose by letting existing customers move their templates to the CI catalog and easing the migration, while we develop a framework for sharing and contributing to components in general.

In the long term I think using templates only (for higher level pipeline composition) will make the pipeline structure opaque and a frustrating experience. A typical example is our current gitlab-org/gitlab pipeline: https://gitlab.com/gitlab-org/gitlab/-/blob/cceddf7cfeff1926d5a9eb760d98103bca0dcffd/.gitlab-ci.yml#L205. You have no idea what the pipeline structure looks like at a high level. You need to check each file and understand what it does, and only then can you start to understand where certain jobs come from.

/cc @dhershkovitch

With steps we can control the job execution (all the *script* keywords, image and services) but we cannot control aspects that are at a higher level of abstraction for a job (e.g. stage, needs, dependencies).

Correct, but things like stage, needs, and dependencies are already specific to the local pipeline. It does not make sense to outsource them to a job specification, or to have a different syntax for them.

You can have job:variables, release, interruptible, cache, parallel, etc. defined with defaults that work out of the box.

job:variables is for the most part replaced by spec:inputs.
release:, interruptible:, and cache: are to be replaced by dedicated steps.

How to handle parallel: is something to consider.

In the long term I think using templates only (for higher level pipeline composition) will make the pipeline structure opaque and a frustrating experience.

Yes, this is why we should push people to use templates for parent-child pipelines, not for inline includes.
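
A sketch of what that could look like with the existing parent-child pipeline syntax (the file path is hypothetical). The parent pipeline keeps one visible bridge job, and everything the template adds stays inside the child pipeline:

```yaml
compliance:
  stage: test
  trigger:
    include:
      - local: ci/compliance-pipeline.yml   # hypothetical path to the template
    strategy: depend    # the bridge job's status follows the child pipeline's result
```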

As for the rest, this still holds true: we think that we need jobs because we don't have steps. The 90% of cases you mention would be solved by steps in a significantly better way.

If we introduce 3 or 4 types of components it will be super confusing.

@fabiopitino

Let's think about this problem from the other side:

we focus on getting steps now
for every use case where we might consider introducing job components, what would the equivalent be with steps?
then we could evaluate whether job components are truly needed
and then consider what we would have to add to steps to not need jobs

@ayufan - What is "steps"? :) Do you have a link to a relevant epic or docs where I can read more?

@MalteMagnussen https://docs.gitlab.com/ee/architecture/blueprints/gitlab_steps/

mentioned in issue #410087

mentioned in epic &7462

mentioned in epic &10969 (closed)

changed epic to &11566

mentioned in epic gitlab-org#7462

added section::ci label and removed section::ops label

@josephburnett I mentioned this today in our call. This proposal is for users that want to create/use components that contain a single job (script execution + job metadata), since templates in general can contain multiple jobs.

I think CI Steps would already solve part of this today since you can compose steps with steps and you can abstract away all the script part. This is just an idea, and we need to see whether it still makes sense after having Steps.

@fabiopitino thanks for the pointer! I think we should also consider job outputs. And how to move them around.

@josephburnett Oh job outputs! Another pointer then: Define CI Job Output (#410087)
