Fix Forward Deployment Worker causes deadlock
What does this MR do?
This MR fixes that ForwardDeploymentWorker
sometimes causes a deadlock. When a job status is updated, the associated deployment's status will also be updated. The order of UPDATE
must be executed in the right order, such as UPDATE ci_builds
=> UPDATE deployments
, in order to avoid deadlock. However, currently it looks like the following:
When a build is failed:
[51] pry(main)> b.drop!
(0.3ms) BEGIN
Deployment Update (0.5ms) UPDATE "deployments" SET "updated_at" = '2020-12-15 18:41:15.967273', "status" = 3,
"finished_at" = '2020-12-15 18:41:15.964140' WHERE "deployments"."id" = 34
Ci::Build Update (0.5ms) UPDATE "ci_builds" SET "status" = 'failed', "finished_at" = '2020-12-15 18:41:15.960797',
"updated_at" = '2020-12-15 18:41:15.970301', "processed" = FALSE, "lock_version" = 4 WHERE "ci_builds"."id" = 14
AND "ci_builds"."lock_version" = 3
(0.7ms) COMMIT
while when the build is skipped/canceled, it instead does:
[63] pry(main)> b.skip
(0.2ms) BEGIN
Ci::Build Update (0.7ms) UPDATE "ci_builds" SET "status" = 'skipped', "updated_at" = '2020-12-15 19:12:03.676005',
"processed" = FALSE, "lock_version" = 8 WHERE "ci_builds"."id" = 14 AND "ci_builds"."lock_version" = 7
Deployment Update (0.4ms) UPDATE "deployments" SET "updated_at" = '2020-12-15 19:12:03.680070', "status" = 4,
"finished_at" = '2020-12-15 19:12:03.679611' WHERE "deployments"."id" = 34
(0.7ms) COMMIT
To illustrate the problem:
job A fails -> lock CI record -> lock deployment record -> commit
|--deadlock--|
commit <- lock CI record <- lock deployment record <- job B gets skipped <- Job A fails
This MR fixes the wrong hook in Ci::Build
to execute the UPDATE deployments
after the Update builds
.
Huge thanks to @pbair for the initial investigation on the issue.
Related #297496 (closed) #297496 (closed)
Screenshots (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?-
I have included a changelog entry. -
I have not included a changelog entry because _____.
-
-
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team