Create Deployment in separate transaction
Problem
Originally this problem was discovered in #341100 (closed).
Currently, when auto-retry happens in a pipeline, it executes two processes in a single transaction:
- Update the current job to the failed status.
- Create a new job by copying the attributes from the previous job, i.e. Ci::RetryBuildService.
Reference: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/ci/build.rb#L360-370
    after_transition any => [:failed] do |build|
      next unless build.project

      if build.auto_retry_allowed?
        begin
          Ci::Build.retry(build, build.user)
        rescue Gitlab::Access::AccessDeniedError => ex
          Gitlab::AppLogger.error "Unable to auto-retry job #{build.id}: #{ex}"
        end
      end
    end
This is intended to keep PipelineProcessWorker from accidentally marking the pipeline status as failed. However, RetryBuildService has become complicated and issues a large number of queries to PostgreSQL and Redis. According to Kibana, we see around 100 PG queries and 35 Redis calls, all happening in one transaction. This is prone to lock contention, which can slow down reads and writes on PostgreSQL or, in the worst case, cause a deadlock. For example, we currently perform DB transaction -> Exclusive Lock -> DB transaction.
Sub-transaction
This issue effectively causes a sub-transaction in Seed::Environment#to_resource, which was previously discussed in #341100 (closed). We should fix this root cause instead of patching the retry service.
Sharding-blocker
This issue is also a sharding blocker.
Proposal
- We should move Ci::RetryBuildService outside of the transaction, ideally into a Sidekiq job.
- Adjust PipelineProcessWorker to take auto-retry into account.
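The first item of the proposal could look roughly like the following plain-Ruby sketch, which defers the retry until after the status update has committed. It uses no Rails or Sidekiq; FakeTransaction, FakeRetryWorker, and fail_job are hypothetical stand-ins for the database transaction, a Sidekiq worker, and the state transition. It is not GitLab's implementation.

```ruby
# Hypothetical stand-in for a DB transaction with after-commit callbacks.
class FakeTransaction
  def initialize
    @after_commit = []
  end

  # Register work that must only run once the transaction has committed.
  def after_commit(&block)
    @after_commit << block
  end

  def run_callbacks
    @after_commit.each(&:call)
  end

  def self.run(log)
    tx = new
    log << 'BEGIN'
    yield tx
    log << 'COMMIT'
    tx.run_callbacks
  end
end

# Hypothetical stand-in for a Sidekiq worker; it only records the enqueue.
class FakeRetryWorker
  def self.perform_async(log, job_id)
    log << "enqueue retry for job #{job_id}"
  end
end

# Mark the job as failed inside the transaction, and schedule the retry
# so that it is enqueued only after COMMIT.
def fail_job(job_id, auto_retry_allowed:, log: [])
  FakeTransaction.run(log) do |tx|
    log << "UPDATE job #{job_id} SET status = failed"
    if auto_retry_allowed
      tx.after_commit { FakeRetryWorker.perform_async(log, job_id) }
    end
  end
  log
end

puts fail_job(42, auto_retry_allowed: true)
```

With this ordering the transaction contains only the status update. The trade-off is a window in which the job is failed but its retry does not exist yet, which is why PipelineProcessWorker would need to take the pending auto-retry into account.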