Possibility to enforce the execution order of jobs using resource_group

Problem to solve

resource_groups from #15536 (closed) currently give no guarantee about the order in which jobs work with the same resource_group. The only guarantee is that they can't work on it at the same time. It's first-come-first-served, which allows jobs from older pipelines to run after jobs from newer pipelines have finished.

Alternative solution

We provided an alternative solution for this issue in https://docs.gitlab.com/ee/ci/yaml/#pipeline-level-concurrency-control-with-cross-projectparent-child-pipelines

Intended users

Further details

Only allowing forward deployments, as intended with #25276 (closed), might not be suitable for all possible use cases.

For example, when only or except is used, two pipelines might run different jobs against the same resource_group, depending on what was changed.

Or when some jobs are present only in the older pipeline, they would otherwise be skipped and never run.

See #15536 (comment 281839557) for more detailed use cases.

Proposal

Process Flow (Today)

When a job is about to be pending or has finished:
- The system picks the first job from the list of jobs in the waiting_for_resource status and attempts to allocate a resource for it.
  - If the allocation succeeds, the job transitions to pending.
  - If the allocation fails, the job stays in the waiting_for_resource status.

Process Flow (Tomorrow)

When a job is about to be pending or has finished:
- The system picks the first job from the list of jobs in the waiting_for_resource or created status and attempts to allocate a resource for it.
  - If the allocation succeeds, the job transitions to pending.
  - If the allocation fails, the job stays in the waiting_for_resource or created status.
  - If the picked job is in the created status, the allocation always fails.
- The process order can be specified per resource group.
  - If the process order is asc_by_pipeline_id, the list is sorted by pipeline ID in ascending order.
  - If the process order is desc_by_pipeline_id, the list is sorted by pipeline ID in descending order.
- The process scope can be specified per resource group.
  - If the process scope is waiting_for_resource, the list includes only jobs in the waiting_for_resource status (i.e. the current behavior).
  - If the process scope is created_or_waiting_for_resource, the list includes jobs in the created or waiting_for_resource status.
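The selection rules above can be sketched in a few lines of Python. This is only an illustration of the proposed flow, not GitLab's implementation: the `Job` class and the function names `pick_candidate` and `try_allocate` are hypothetical, and the resource is modeled as a single boolean.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    pipeline_id: int
    status: str  # e.g. "created", "waiting_for_resource", "pending"


# Statuses included in the candidate list for each proposed process scope.
SCOPES = {
    "waiting_for_resource": {"waiting_for_resource"},
    "created_or_waiting_for_resource": {"created", "waiting_for_resource"},
}


def pick_candidate(jobs: List[Job], process_scope: str, process_order: str) -> Optional[Job]:
    """Return the first job of the scoped list, sorted by pipeline ID."""
    candidates = [job for job in jobs if job.status in SCOPES[process_scope]]
    if not candidates:
        return None
    descending = process_order == "desc_by_pipeline_id"
    return sorted(candidates, key=lambda job: job.pipeline_id, reverse=descending)[0]


def try_allocate(jobs: List[Job], resource_free: bool,
                 process_scope: str, process_order: str) -> Optional[Job]:
    """Attempt one allocation; return the job that became pending, if any."""
    candidate = pick_candidate(jobs, process_scope, process_order)
    if candidate is None or not resource_free:
        return None
    if candidate.status == "created":
        # A job that is not ready yet blocks the queue: the allocation fails.
        return None
    candidate.status = "pending"
    return candidate
```

With a created deploy job in pipeline 1 and a waiting deploy job in pipeline 2, `try_allocate` returns nothing under `created_or_waiting_for_resource` with `asc_by_pipeline_id`, because the head of the list is the job that is still in the created status.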

Examples

Example 1: Simple pipeline

Given that a user created three pipelines at once:

Pipeline 3: build -> test -> deploy 3
Pipeline 2: build -> test -> deploy 2
Pipeline 1: build -> test -> deploy 1

and the deploy job requires a resource from a resource group.

When the resource group property is `process_scope: waiting_for_resource` and `process_order: asc_by_pipeline_id`.

The deployment jobs are executed in an unpredictable order (because we don't know which deploy job becomes ready to run first).

When the resource group property is `process_scope: created_or_waiting_for_resource` and `process_order: asc_by_pipeline_id`.

The deployment jobs are executed in the order deploy 1 => deploy 2 => deploy 3. Even if the deploy 2 job tries to run first, the system picks deploy 1 as the allocation candidate (which is not yet ready for execution), so the deploy 2 job stays in the waiting_for_resource status.

When the resource group property is `process_scope: created_or_waiting_for_resource` and `process_order: desc_by_pipeline_id`.

The deployment jobs are executed in the order deploy 3 => deploy 2 => deploy 1. Even if the deploy 2 job tries to run first, the system picks deploy 3 as the allocation candidate (which is not yet ready for execution), so the deploy 2 job stays in the waiting_for_resource status.
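To make the three outcomes of Example 1 concrete, here is a small simulation sketch. It rests on simplifying assumptions that are not part of the proposal: each pipeline has exactly one deploy job, a job finishes as soon as it gets the resource, and the hypothetical `execution_order` function receives the pipeline IDs in the order in which their deploy jobs become ready.

```python
from typing import Dict, List


def execution_order(ready_order: List[int], process_scope: str, process_order: str) -> List[int]:
    """Return pipeline IDs in the order their deploy jobs actually run."""
    status: Dict[int, str] = {pid: "created" for pid in ready_order}
    executed: List[int] = []
    if process_scope == "waiting_for_resource":
        in_scope = {"waiting_for_resource"}
    else:  # created_or_waiting_for_resource
        in_scope = {"created", "waiting_for_resource"}
    descending = process_order == "desc_by_pipeline_id"

    def drain() -> None:
        # Keep allocating to the head of the sorted list until it is empty
        # or the head is a job that is still in the created status.
        while True:
            candidates = sorted(
                (pid for pid, state in status.items() if state in in_scope),
                reverse=descending,
            )
            if not candidates or status[candidates[0]] == "created":
                return
            status[candidates[0]] = "done"  # the job runs and finishes at once
            executed.append(candidates[0])

    for pid in ready_order:
        status[pid] = "waiting_for_resource"  # the deploy job became ready
        drain()
    return executed


# Deploy jobs become ready in the order pipeline 2, pipeline 3, pipeline 1.
print(execution_order([2, 3, 1], "waiting_for_resource", "asc_by_pipeline_id"))
print(execution_order([2, 3, 1], "created_or_waiting_for_resource", "asc_by_pipeline_id"))
print(execution_order([2, 3, 1], "created_or_waiting_for_resource", "desc_by_pipeline_id"))
```

In this model the first call simply follows the readiness order, while the other two calls yield ascending and descending pipeline order regardless of which job became ready first.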

Example 2: The same resource group is used in multiple jobs in the same pipeline

Given that a user created three pipelines at once:

Pipeline 3: build -> test -> deploy 5,6
Pipeline 2: build -> test -> deploy 3,4
Pipeline 1: build -> test -> deploy 1,2

and the deploy jobs require a resource from a resource group.

When the resource group property is `process_scope: waiting_for_resource` and `process_order: asc_by_pipeline_id`.

The deployment jobs are executed in a random order.

When the resource group property is `process_scope: created_or_waiting_for_resource` and `process_order: asc_by_pipeline_id`.

The deployment jobs are executed in the following order:

deploy 1 => deploy 2 OR deploy 2 => deploy 1 (The inner order is not guaranteed)
deploy 3 => deploy 4 OR deploy 4 => deploy 3 (The inner order is not guaranteed)
deploy 5 => deploy 6 OR deploy 6 => deploy 5 (The inner order is not guaranteed)

When the resource group property is `process_scope: created_or_waiting_for_resource` and `process_order: desc_by_pipeline_id`.

The deployment jobs are executed in the following order:

deploy 5 => deploy 6 OR deploy 6 => deploy 5 (The inner order is not guaranteed)
deploy 3 => deploy 4 OR deploy 4 => deploy 3 (The inner order is not guaranteed)
deploy 1 => deploy 2 OR deploy 2 => deploy 1 (The inner order is not guaranteed)

Example 3: Similar to Example 2 but deploy jobs are spread across different stages

Given that a user created three pipelines at once:

Pipeline 3: deploy 5 -> build -> test -> deploy 6
Pipeline 2: deploy 3 -> build -> test -> deploy 4
Pipeline 1: deploy 1 -> build -> test -> deploy 2

and the deploy jobs require a resource from a resource group.

When the resource group property is `process_scope: waiting_for_resource` and `process_order: asc_by_pipeline_id`.

The deployment jobs are executed in a random order.

When the resource group property is `process_scope: created_or_waiting_for_resource` and `process_order: asc_by_pipeline_id`.

The deployment jobs are executed in the following order:

deploy 1 => deploy 2
deploy 3 => deploy 4
deploy 5 => deploy 6

When the resource group property is `process_scope: created_or_waiting_for_resource` and `process_order: desc_by_pipeline_id`.

The deployment jobs are executed in the following order:

deploy 5 => deploy 6
deploy 3 => deploy 4
deploy 1 => deploy 2

Interface

Users can change the properties of the resource group via the public v4 API. We will likely introduce the following path:

PUT projects/:project_id/resource_groups/:key
- options: process_mode is one of unordered, oldest_first or newest_first
The default process mode is unordered.

(UI or .gitlab-ci.yml support is left for a future iteration.)
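As a rough sketch of how a client might call the proposed endpoint, the following Python builds the PUT request without sending it. The host `gitlab.example.com`, the project ID, the resource group key and the token are placeholders, and the `/api/v4` prefix and `PRIVATE-TOKEN` header are assumed from the usual GitLab v4 API conventions rather than stated in this proposal. The helper name `build_update_request` is hypothetical.

```python
import urllib.parse
import urllib.request

# Process modes listed in the proposal above.
VALID_MODES = ("unordered", "oldest_first", "newest_first")


def build_update_request(base_url: str, project_id: int, key: str,
                         process_mode: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that sets a process mode."""
    if process_mode not in VALID_MODES:
        raise ValueError(f"unknown process_mode: {process_mode!r}")
    # The resource group key is URL-encoded because it may contain slashes.
    path = "/api/v4/projects/{}/resource_groups/{}".format(
        project_id, urllib.parse.quote(key, safe="")
    )
    body = urllib.parse.urlencode({"process_mode": process_mode}).encode()
    request = urllib.request.Request(base_url.rstrip("/") + path, data=body, method="PUT")
    request.add_header("PRIVATE-TOKEN", token)  # assumed authentication header
    return request


request = build_update_request(
    "https://gitlab.example.com", 42, "production", "oldest_first", "<your-token>"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here so the sketch has no side effects.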

Previous proposal

Proposal

Introduce an option in the .gitlab-ci.yml to configure when the lock for a resource_group is obtained. Obtaining it means that a waiting queue is maintained for each resource_group, which determines who is allowed to work with it next (a queueing mechanism should already exist for resource_group as part of the semaphore primitive that was implemented).

Currently the lock is requested and waited for right before the job starts, which ensures that no jobs run at the same time for the same resource. But the moment at which a job is ready to start (and therefore requests the lock) depends on the non-deterministic runtimes of previous jobs and on the concurrent scheduling of jobs in general.

An alternative option that this feature should introduce is to obtain the lock when the job is created (which happens when the pipeline is created). This means that jobs wanting to work on the same resource_group are forced to work in the order in which they were created. This ensures that older jobs always run before newer jobs for one particular resource_group. The downside is that jobs might have to wait longer for a resource to become free in some situations. Jobs for other resource_groups, or jobs without one, should not be affected by this and will still benefit from concurrent execution.
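The difference between the two lock points can be illustrated with a minimal sketch. It assumes that a job joins the FIFO queue of its resource_group at the moment it requests the lock, so the queue position is decided either by the creation time or by the time the job becomes ready to start. The `QueuedJob` class and the `run_order` function are hypothetical, and the `lock_at` values match the setting proposed below.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedJob:
    name: str
    created_at: int  # when the pipeline (and therefore the job) was created
    ready_at: int    # when all previous stages finished and the job could start


def run_order(jobs: List[QueuedJob], lock_at: str) -> List[str]:
    """Order in which jobs of one resource_group run for a given lock point."""
    if lock_at == "create":
        # The queue position is fixed at creation time.
        ordered = sorted(jobs, key=lambda job: job.created_at)
    elif lock_at == "start":
        # The queue position is fixed when the job is ready to start.
        ordered = sorted(jobs, key=lambda job: job.ready_at)
    else:
        raise ValueError(f"unknown lock_at value: {lock_at!r}")
    return [job.name for job in ordered]


jobs = [
    QueuedJob("deploy (pipeline 1)", created_at=1, ready_at=30),  # slow build
    QueuedJob("deploy (pipeline 2)", created_at=2, ready_at=10),  # fast build
]
print(run_order(jobs, "start"))   # the newer pipeline deploys first
print(run_order(jobs, "create"))  # creation order is preserved
```

With `create`, the deploy job of pipeline 2 has to wait for pipeline 1 even though it was ready earlier, which is the longer wait mentioned above.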

Sample Configuration

Assumption: resource_group can optionally be defined as a dictionary, using a property named resource to hold the string value from the non-dictionary mode.

This feature shall add a new property named lock_at (or obtain_at, obtain_lock_at, *_on, etc.) with the two possible values create and start.

In my opinion, the default value for this setting should be create. It should result in fewer issues caused by race conditions and wrongly ordered deployments for users who are unaware of the possible problems. But this is debatable, as the current behavior of resource_group corresponds to start. I can understand that you'd not want to change it now that resource_group is released.

# pipeline-level
resource_group:
  resource: "Resource 1"
  lock_at: create

job_a:
  script: "echo a"
  resource_group:
    resource: 'Resource 2'
    lock_at: create

job_b:
  script: "echo b"
  resource_group:
    resource: 'Resource 2'
    lock_at: start
    # I don't know if mixing this setting for the same resource is useful for anything, but we shouldn't restrict it.

job_c:
  script: "echo c"
  resource_group:
    resource: 'Resource 3'
    lock_at: start
    # But I can think of situations where mixing it in the same pipeline for different resources might be useful.

A create lock for jobs should happen after the create lock on the pipeline-level. Pipeline start can only happen after the creation of the pipeline and all the jobs for it. A job at the first stage that has a start value should lock the resource directly after the start lock of the pipeline.

Documentation

Extend the documentation for resource_group here: GitLab CI/CD Pipeline Configuration Reference

Availability & Testing

What risks does this change pose to our availability? It might make pipeline jobs wait longer for resources to become available, if the default is changed or the new option is used manually.

How might it affect the quality of the product?

It should improve the reliability of concurrent pipelines that access shared resource_groups
Users that define their CI/CD Pipeline Configuration have a new option to fine-tune it for their use-case.

What additional test coverage or changes to tests will be needed?
It should be tested that this feature introduces no deadlocks. If I'm not mistaken, there shouldn't be any for this change in theory. But depending on how it's implemented and how exactly GitLab works in the background, this might be possible.

Will it require cross-browser testing?
No

What does success look like, and how can we measure that?

Success

Users are using the new feature and have fast concurrent pipelines with deployments they can rely on.
Users who are not aware of the possible risks of execution order should have a safer default when using resource_group or environment (which will use resource_group implicitly in the future, and should be added by #199048).

Measure
Count how often the new feature is used in the .gitlab-ci.yml files of public projects.

What is the type of buyer?

Core/Free, because it's a fundamental primitive required in conjunction with using resource_group. It's extending a feature that is already in Core/Free.

Edited Dec 18, 2021 by Robin C. Ladiges
