Create Deployment in separate transaction

Problem

Originally this problem was discovered in #341100 (closed).

Currently, when a job is auto-retried in a pipeline, two steps run in a single transaction:

  1. Mark the current job as failed.
  2. Create a new job by copying the attributes of the previous job, i.e. Ci::RetryBuildService.

Reference: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/ci/build.rb#L360-370

      after_transition any => [:failed] do |build|
        next unless build.project

        if build.auto_retry_allowed?
          begin
            Ci::Build.retry(build, build.user)
          rescue Gitlab::Access::AccessDeniedError => ex
            Gitlab::AppLogger.error "Unable to auto-retry job #{build.id}: #{ex}"
          end
        end
      end

This is intended to keep PipelineProcessWorker from accidentally marking the pipeline as failed. However, Ci::RetryBuildService has become complicated and issues a large number of queries to PostgreSQL and Redis. According to Kibana, we see around 100 PostgreSQL queries and 35 Redis calls, all happening in one transaction. This is prone to lock contention, which can slow down reads and writes in PostgreSQL or, in the worst case, cause a deadlock. For example, we currently perform DB transaction -> Exclusive Lock -> DB transaction.

Sub-transaction

This issue effectively causes a sub-transaction in Seed::Environment#to_resource, which was previously discussed in #341100 (closed). We should fix this root cause instead of patching the retry service.

Sharding-blocker

This issue also causes the sharding blocker issue.

Proposal

  1. Move Ci::RetryBuildService outside of the transaction, ideally into a Sidekiq job.
  2. Adjust PipelineProcessWorker to take auto-retry into account.
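
A minimal sketch of the first proposal item, in plain Ruby with no Rails or Sidekiq dependency. Every name here (FakeTransaction, FakeRetryQueue, FakeBuild, fail_build, process_retries) is hypothetical and only stands in for the real transaction, Sidekiq queue and Ci::Build; it is not how GitLab implements this. The idea it illustrates is that the transaction only marks the job as failed and registers a callback, while the expensive retry runs later, after the transaction has closed.

```ruby
# Hypothetical stand-in for a DB transaction with "after commit" callbacks.
class FakeTransaction
  def initialize
    @open = false
    @after_commit = []
  end

  def open?
    @open
  end

  # Registers work that must only run once the transaction has committed.
  def after_commit(&block)
    @after_commit << block
  end

  # Runs the block "inside" the transaction, then fires deferred callbacks.
  def run
    @open = true
    yield
    @open = false
    @after_commit.shift.call until @after_commit.empty?
  rescue StandardError
    @after_commit.clear # simulated rollback: drop the deferred work
    raise
  ensure
    @open = false
  end
end

# Hypothetical in-memory queue standing in for a Sidekiq worker queue.
class FakeRetryQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(build_id)
    @jobs << build_id
  end
end

# Hypothetical, heavily simplified build record.
FakeBuild = Struct.new(:id, :status, :auto_retry_allowed, :retried_from)

# Step 1 stays in the transaction; step 2 is only scheduled, not executed.
def fail_build(build, transaction:, queue:)
  transaction.run do
    build.status = :failed
    if build.auto_retry_allowed
      transaction.after_commit { queue.enqueue(build.id) }
    end
  end
end

# Simulates the background worker: creates the retried builds outside of
# the transaction that marked the original build as failed.
def process_retries(queue, transaction:)
  queue.jobs.shift(queue.jobs.size).map do |build_id|
    raise "retry must not run inside the failing transaction" if transaction.open?

    FakeBuild.new(nil, :pending, true, build_id)
  end
end

# Usage
tx = FakeTransaction.new
queue = FakeRetryQueue.new
build = FakeBuild.new(1, :running, true, nil)

fail_build(build, transaction: tx, queue: queue)
retried = process_retries(queue, transaction: tx)
puts "build #{build.id} is #{build.status}, retried builds: #{retried.size}"
```

Between fail_build and process_retries the job is failed but its retry does not exist yet. That window is why the second proposal item is needed: PipelineProcessWorker would have to recognize a pending auto-retry so that it does not mark the pipeline as failed in the meantime.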
Edited by Shinya Maeda