"Skip outdated deployment jobs" should evaluate the job execution when deployment starts
Release notes
Previously, in some cases, outdated jobs could still be manually executed or retried even when the skip outdated deployment jobs setting was enabled. We have updated the logic for this setting to check a deployment when it starts; if it is outdated due to a more recent deployment, it will not proceed. This ensures that outdated deployments are not accidentally executed, overwriting more recent code changes in production.
Problem
This is one of the biggest issues of the Skip outdated deployment jobs feature.
Currently, skip evaluation happens when a deployment finishes, meaning old deployments can still be manually executed or retried after that point. We have to evaluate it when a deployment starts, and if it's outdated, it should transition to the `failed` state. (Keep in mind that technically we should transition to the `skipped` state.)
The current logic was introduced in the issue three years ago, and its shape was not good enough to actually be used in customers' projects.
Proposal
We revisit the core logic of this feature. Provided that the feature is enabled in a project:
- The check requires a Git operation (i.e. a Gitaly call). We need to make sure the process is cheap enough that it doesn't cause performance degradation in GitLab CI.
- When a deployment job is about to run, we check if the deployment's SHA is behind the latest/current deployment's SHA.
  - If it's behind, we fail the job, e.g. a `running` => `failed` state transition.
  - If it's not behind, we don't do anything.
- If possible, we extend `state_machine` in `Ci::Build` as the SSOT. If this is not acceptable, we extend the service classes to trigger the state transition, such as `Ci::ProcessBuildService` (for auto execution), `Ci::PlayBuildService` (for manual execution) and `Ci::RetryJobService` (for auto/manual retry).
- The current logic will be superseded by the new core logic. We should gradually roll out the new logic behind a feature flag and phase out the old logic, i.e. `DropOlderDeploymentsWorker`.
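To make the proposed check concrete, here is a minimal, self-contained sketch of evaluating a job when it starts. It is not the actual `Ci::Build` code: `DeploymentJob`, `behind?`, `PARENTS` and the `:outdated_deployment` reason are hypothetical names, and the in-memory commit graph stands in for what would be a Gitaly call in GitLab.

```ruby
# Toy commit graph: each SHA maps to its parent SHA (nil for the root).
# Stand-in for the Git/Gitaly ancestry lookup; linear history assumed.
PARENTS = { "c3" => "c2", "c2" => "c1", "c1" => nil }.freeze

# Returns true when `sha` is a strict ancestor of `other`, i.e. `sha` is behind.
def behind?(sha, other)
  return false if sha == other

  cursor = PARENTS[other]
  until cursor.nil?
    return true if cursor == sha

    cursor = PARENTS[cursor]
  end
  false
end

# Hypothetical job model, not the real Ci::Build.
class DeploymentJob
  attr_reader :sha, :status, :failure_reason

  def initialize(sha)
    @sha = sha
    @status = :pending
  end

  # Called when the job is about to run. `latest_sha` is the SHA of the
  # latest/current deployment (nil if nothing has been deployed yet).
  def run!(latest_sha)
    @status = :running
    return self unless latest_sha && behind?(sha, latest_sha)

    # Outdated: running => failed, as described in the proposal.
    @status = :failed
    @failure_reason = :outdated_deployment
    self
  end
end

old_job = DeploymentJob.new("c1").run!("c3") # c1 is behind c3
new_job = DeploymentJob.new("c3").run!("c2") # c3 is ahead of c2
```

In this sketch `old_job` ends up `:failed` while `new_job` stays `:running`. Because the check lives in the start transition, it applies equally to automatic runs, manual plays and retries.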
See the PoC MR for the proposed code changes.
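If extending the state machine turns out not to be acceptable, the service-class fallback from the proposal could look roughly like the sketch below: one shared guard called from every entry point, gated by a feature flag so the rollout can be gradual. All names here (`OutdatedDeploymentGuard`, `start`, `flag_enabled`, the `:outdated` key) are illustrative assumptions, not GitLab APIs.

```ruby
# Hypothetical guard shared by every code path that can start a job.
module OutdatedDeploymentGuard
  # `flag_enabled` stands in for a feature-flag lookup;
  # `job[:outdated]` stands in for the result of the SHA check.
  def self.call(job, flag_enabled:)
    return :proceed unless flag_enabled
    return :proceed unless job[:outdated]

    job[:status] = :failed
    :rejected
  end
end

# Stand-ins for the auto execution, manual execution and retry paths.
ENTRY_POINTS = %i[process play retry].freeze

def start(job, via:, flag_enabled:)
  raise ArgumentError, "unknown entry point: #{via}" unless ENTRY_POINTS.include?(via)

  result = OutdatedDeploymentGuard.call(job, flag_enabled: flag_enabled)
  job[:status] = :running if result == :proceed
  job
end

# With the flag on, an outdated job fails no matter how it was started.
statuses = ENTRY_POINTS.map do |via|
  start({ outdated: true, status: :pending }, via: via, flag_enabled: true)[:status]
end

# With the flag off, the new check is bypassed and the job runs as before.
legacy = start({ outdated: true, status: :pending }, via: :play, flag_enabled: false)
```

The drawback of this shape, compared with a single state-machine hook, is that every new entry point has to remember to call the guard.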