[15.10] Fix automatically-retried jobs stuck in pending state

Stan Hu requested to merge sh-fix-auto-retry-builds-15-10 into 15-10-stable-ee

What does this MR do and why?

This backports !116480 (merged) and !117275 (merged) to the 15-10-stable-ee branch.

This fixes an issue where concurrent runners would not pick up retried builds due to the runner tick value not being invalidated.

Previously, when a job failed, Ci::RetryJobService would clone the failed job, add its own after_commit hooks, and immediately attempt to start the pipeline. However, starting a pipeline loads its own instances of Ci::Build, and for builds that are updated, the state changes add their own after_commit hooks.

For a given transaction, Rails only runs one after_commit hook for a specific model, so the after_commit hooks registered while starting the build would be discarded. As a result, BuildQueueWorker was never executed for the build, causing the runner tick value to be left in a stale state. Other run_after_commit blocks, such as build hooks, were not executed either.

To avoid this double after_commit business, move the starting of the pipeline into the run_after_commit block of the cloned job. This slightly delays the starting of the pipeline and the job, but it also avoids starting a Sidekiq worker inside a transaction (#398229 (closed)).
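
The difference between the old and new ordering can be illustrated with a plain-Ruby toy model. This is only a sketch under stated assumptions: ToyTransaction, ToyBuild, retry_inside_transaction, and retry_after_commit are hypothetical names invented for illustration, and the "first instance wins" rule is a simplification of the Rails behavior described above, not Rails' or GitLab's actual implementation.

```ruby
# Toy model only: NOT the real ActiveRecord or GitLab code.
# Assumption: on commit, a transaction runs the callbacks of only the
# first enrolled instance for each (class, id) pair.
class ToyTransaction
  def initialize
    @instances = []
  end

  # Record that this object instance took part in the transaction.
  def enroll(instance)
    @instances << instance
  end

  # Run callbacks once per logical record; later instances are dropped.
  def commit
    @instances.uniq { |i| [i.class, i.id] }.each(&:run_callbacks)
  end
end

# Hypothetical stand-in for Ci::Build.
class ToyBuild
  attr_reader :id

  def initialize(id)
    @id = id
    @callbacks = []
  end

  def run_after_commit(&block)
    @callbacks << block
  end

  def run_callbacks
    @callbacks.each(&:call)
  end
end

# Old ordering: the pipeline is started inside the same transaction, so a
# second instance of the same build registers hooks that are never run.
def retry_inside_transaction(log)
  txn = ToyTransaction.new

  cloned = ToyBuild.new(1)
  cloned.run_after_commit { log << :retry_service_hook }
  txn.enroll(cloned)

  reloaded = ToyBuild.new(1) # same record, separate in-memory instance
  reloaded.run_after_commit { log << :build_queue_worker }
  txn.enroll(reloaded)

  txn.commit
  log
end

# New ordering: the pipeline is started from the cloned job's
# run_after_commit block, i.e. after the outer transaction has committed.
def retry_after_commit(log)
  txn = ToyTransaction.new

  cloned = ToyBuild.new(1)
  cloned.run_after_commit do
    log << :retry_service_hook

    inner = ToyTransaction.new
    reloaded = ToyBuild.new(1)
    reloaded.run_after_commit { log << :build_queue_worker }
    inner.enroll(reloaded)
    inner.commit
  end
  txn.enroll(cloned)

  txn.commit
  log
end
```

In this toy model, retry_inside_transaction([]) returns [:retry_service_hook], so the queue hook is lost, while retry_after_commit([]) returns [:retry_service_hook, :build_queue_worker].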

Relates to #387775 (closed)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

  • This MR is backporting a bug fix, documentation update, or spec fix, previously merged in the default branch.
  • The original MR has been deployed to (not applicable for documentation or spec changes).
  • This MR has a severity label assigned (if applicable).
  • Ensure the e2e:package-and-test-ee job has either succeeded or been approved by a Software Engineer in Test.

Note to the merge request author and maintainer

The process of backporting bug fixes into stable branches is tracked as part of an internal pilot. If you have questions about this process, please:

