Start pipeline in after_commit callback when retrying jobs
What does this MR do and why?
This fixes an issue where concurrent runners would not pick up retried builds due to the runner tick value not being invalidated.
Previously, when a job failed, Ci::RetryJobService would clone the failed job, add its own after_commit hooks, and immediately attempt to start the pipeline. However, starting a pipeline loads its own instances of Ci::Build, and for builds that are updated, the state changes add their own after_commit hooks.
For a given transaction, Rails runs the after_commit hooks of only one instance of a given record (see https://github.com/rails/rails/pull/45280), so the after_commit hooks added when starting the build would be discarded. As a result, BuildQueueWorker was never executed for the build, leaving the runner tick value in a stale state. Other run_after_commit blocks, such as build hooks, were not executed either.
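The effect can be reproduced with a toy model in plain Ruby. This is a simulation under stated assumptions, not ActiveRecord or GitLab code: ToyTransaction, ToyBuild, and simulate_retry are hypothetical names, and the rule that the first enlisted instance wins is an assumption drawn from the description above.

```ruby
# Toy model only: none of these classes exist in Rails or GitLab.
class ToyTransaction
  def initialize
    @enlisted = {} # record id => the one instance whose hooks will run
  end

  # Assumption: for a given record id, only the first instance enlisted
  # in the transaction is remembered; later instances are ignored.
  def enlist(record)
    @enlisted[record.id] ||= record
  end

  # On commit, run the hooks of the remembered instances only.
  def commit
    @enlisted.each_value(&:run_commit_hooks)
  end
end

class ToyBuild
  attr_reader :id

  def initialize(id)
    @id = id
    @hooks = []
  end

  # Queue a block to run once the surrounding transaction commits.
  def run_after_commit(&block)
    @hooks << block
  end

  def run_commit_hooks
    @hooks.each(&:call)
  end
end

def simulate_retry
  log = []
  txn = ToyTransaction.new

  # The retry service clones the job and registers its own hook.
  cloned = ToyBuild.new(1)
  cloned.run_after_commit { log << :retry_service_hook }
  txn.enlist(cloned)

  # Starting the pipeline loads a second instance of the same row,
  # and the state change registers a hook on that second instance.
  reloaded = ToyBuild.new(1)
  reloaded.run_after_commit { log << :enqueue_build_queue_worker }
  txn.enlist(reloaded)

  txn.commit
  log
end
```

In this model simulate_retry returns [:retry_service_hook]: the hook standing in for BuildQueueWorker is registered on the second instance and never runs.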
To avoid this double after_commit business, move the starting of the pipeline into the run_after_commit block of the cloned job. This slightly delays the starting of the pipeline and the job, but it also avoids starting a Sidekiq worker inside a transaction (#398229 (closed)).
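The ordering change can be sketched in plain Ruby. This is an illustration only, assuming a simplified job object: SketchJob and retry_job_sketch are hypothetical names, and the start_after_commit argument merely stands in for the feature flag.

```ruby
# Hypothetical stand-in for a job; not GitLab's Ci::Build.
class SketchJob
  def initialize
    @after_commit = []
  end

  # Queue a block to run after the transaction commits.
  def run_after_commit(&block)
    @after_commit << block
  end

  def run_commit_hooks
    @after_commit.each(&:call)
  end
end

# Records the order of events for a retry, with the pipeline started
# either inside the transaction (old path) or after commit (new path).
def retry_job_sketch(start_after_commit:)
  events = []
  job = SketchJob.new

  events << :transaction_begin
  events << :job_cloned

  if start_after_commit
    # New path: defer starting the pipeline to the cloned job's hook.
    job.run_after_commit { events << :pipeline_started }
  else
    # Old path: start the pipeline while the transaction is still open.
    events << :pipeline_started
  end

  events << :transaction_commit
  job.run_commit_hooks
  events
end
```

With start_after_commit: true, :pipeline_started is recorded after :transaction_commit; with false, it is recorded before it, which corresponds to starting work inside the transaction.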
Relates to #387775 (closed)
How to set up and validate locally
- Set up a runner with concurrent = 2.
- Set up a project with a .gitlab-ci.yml:

      image: busybox:latest
      stages:
        - test
      test:
        stage: test
        script:
          - exit 2
        retry: 1

- Run a pipeline, watch it fail. The second job should remain in pending.
- Enable the feature flag: Feature.enable(:retry_job_start_pipeline_after_commit).
- Run the pipeline again, and the job should be retried.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.