Limit pipeline job concurrency by resource_group
### Problem to Solve

Some pipelines and/or jobs use unique resources or are in some way destructive to an environment. Being able to limit concurrency for them would give users control over scenarios where there should only be one deploy at a time for a given:

- Environment
- Entire Project
- Job (perhaps due to shared testing infrastructure in a testing lab)

### Solution

Implementing a generic semaphore for pipeline jobs would be our way to go. We will use `Resource groups`: each resource group is essentially a slot (currently limited to one). When multiple jobs are scheduled to run, the first job locks this "slot" and the rest of the jobs need to wait for the lock to be released. The resource group will be managed by the GitLab server; no development is needed from the runner team. The user will need to set the runner's concurrency to 1 for the lock to be complete.

There can be multiple `Resource groups` per project, but each one runs with a concurrency of 1. A good example is physical devices: each device would be a resource group, and only one job can run at any given time per device.

### Sample Configuration

This example moves the lock to a job. Multiple pipelines can run simultaneously, but `jobA` will only ever run one at a time, across all pipelines in the project.

```
stages:
  - build

jobA:
  resource_group: jobA
  stage: build
  script:
    - echo HelloA

jobB:
  stage: build
  script:
    - echo HelloB
```

There are some useful patterns for `Resource Group`.

* `resource_group: $CI_ENVIRONMENT_NAME` ... Limit per environment
* `resource_group: $CI_JOB_NAME` ... Limit per job
* `resource_group: $CI_COMMIT_REF_NAME:$CI_JOB_NAME` ... Limit per job per branch
* `resource_group: $CI_COMMIT_REF_NAME:$CI_ENVIRONMENT_NAME` ... Limit per environment per branch (e.g. review apps)

## Proposal

Implementing a generic semaphore for pipeline jobs would be our way to go. We will use `Resource groups` to define the lock.
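
The single-slot behavior described under Solution can be sketched in a few lines of Python. This is only an illustrative in-memory model, not GitLab's implementation: the `ResourceGroup` class, its method names, and the first-come-first-served hand-off to waiting jobs are all assumptions made for the sketch (this iteration does not promise any particular ordering).

```python
from collections import deque


class ResourceGroup:
    """Hypothetical model of a resource group with exactly one slot."""

    def __init__(self, key):
        self.key = key
        self.running = None     # job currently holding the slot, if any
        self.waiting = deque()  # jobs waiting for the slot (FIFO is an assumption)

    def request(self, job):
        """Try to take the slot; return True if the job may run now."""
        if self.running is None:
            self.running = job
            return True
        self.waiting.append(job)
        return False

    def release(self, job):
        """Free the slot held by `job` and hand it to a waiting job, if any."""
        if self.running != job:
            raise ValueError(f"{job} does not hold resource group {self.key}")
        self.running = self.waiting.popleft() if self.waiting else None
        return self.running


group = ResourceGroup("production")
print(group.request("deploy-1"))  # True: the slot was free
print(group.request("deploy-2"))  # False: deploy-2 has to wait
print(group.release("deploy-1"))  # deploy-2: the waiting job now holds the slot
```

Keeping one such object per resource group key (for example, per device or per environment) gives the "multiple resource groups per project, each with concurrency 1" behavior.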
In addition, the user needs to set the runner configuration to concurrency 1 for the solution to be complete.

Only one job can run on a `Resource group` at any given time. Other jobs must wait for the `Resource group` to be unlocked before running. The entire logic will be managed by the GitLab server; runners will not need to change.

The concurrency is fixed at 1 for this iteration and cannot be configured. Making it configurable is left for a later iteration.

What will not be included in this iteration: limiting forward deployments. We will not check the sequence of the pipelines, so job B may run before job A even if job B depends on job A. This will be handled in https://gitlab.com/gitlab-org/gitlab/issues/25276

## UX Proposal

Proposed changes:

- When a job is waiting for a resource group, display an icon indicating this status wherever pipeline graphs are shown. The list can be seen [here](https://gitlab.com/gitlab-org/gitlab/issues/27927#note_215731494).
- Hovering over this icon should display a tooltip showing the following information:
  - Job name
  - Job status as waiting for resource

![MVC](/uploads/e2c096a4601f74813f0dcbca34a572db/MVC.png)

## Future Improvements

### Implicit locking for environments

Because environments are, more often than not, the kind of place where you'd want only one deployment to run at once, and always in the correct order, we will include implicit locking wherever `environment:` is used, using a semaphore with the name of the environment.

1. When `environment:` is used, it implies `Resource Group:`, so you don't need to specify both `Resource Group:` and `environment:`.
1. When `environment:` is used, you can use `Resource Group: some-name` to create a lock across all environment deployments.
1. When the implicit lock is used, you can define `Resource Group: nil` to disable locking and run with the full concurrency limit.
1. The implicit lock for the environment comes from the assumption that deployments, by design, do not work well when executed concurrently.

### Different concurrency behaviors

At the moment, all this will do is wait for a semaphore to free up. You could imagine more possibilities:

```
concurrency:
  parallel: Default current value; the job is launched even if another is in progress
  cancel: Cancel the job if it is launched in parallel with another
  wait: Wait for the previous job to finish before launching the current one
  skip: Skip the job if the lock is already acquired
```

### Pipeline-level lock

This example will only ever run one of the project's pipelines at once. The pipeline itself will run as normal, with all jobs running in parallel in the build stage.

```
Resource Group: $CI_PROJECT_NAME
# Resource Group: $CI_ENVIRONMENT_NAME for example would give you a way to run one entire pipeline per environment

stages:
  - build

jobA:
  stage: build
  script:
    - echo HelloA

jobB:
  stage: build
  script:
    - echo HelloB
```

### Future UX Considerations

In addition to seeing that a job is waiting, a user may also want:

- Resource group it is waiting for
- Current job running in the resource group
- An indicator on the job that is currently using the resource
- Position of the job in the queue
- Linking between the jobs to allow a user to navigate to them

These may be accomplished with additional icons, changes to the tooltip, and/or adding this information to the job detail section.

## Links

* https://buildkite.com/docs/pipelines/controlling-concurrency
* https://jenkins.io/blog/2016/10/16/stage-lock-milestone/

## Technical proposal

TBD

<!-- In this section, describe technical proposal for the problem to solve. It should give enough contexts to be able to be reviewed by domain/performance/security experts. -->

## Feature Flag

This feature is implemented behind the `ci_resource_group` feature flag and is disabled by default.
Once we've confirmed the feature is stable, we will remove the feature flag to publish the feature as GA.

<!-- Read more [Feature flags in development of GitLab](https://docs.gitlab.com/ee/development/feature_flags/) -->

## Planned MRs

### Backend

- [x] PoC https://gitlab.com/gitlab-org/gitlab/merge_requests/20450
- [x] [CI Resource Group models and parser](https://gitlab.com/gitlab-org/gitlab/merge_requests/20950)
- [ ] [CI Resource Groups Status Transition](https://gitlab.com/gitlab-org/gitlab/merge_requests/20903)

### General

- [x] [Write a feature spec to test frontend and backend changes together](MR link if it already exists)
- [ ] [Remove the feature flag and update documentation](https://gitlab.com/gitlab-org/gitlab/merge_requests/21617) # i.e. publish the feature