Skip to content

Fix Forward Deployment Worker causes deadlock

Shinya Maeda requested to merge fix-forward-deployment-worker-dead-lock into master

What does this MR do?

This MR fixes that ForwardDeploymentWorker sometimes causes a deadlock. When a job status is updated, the associated deployment's status will also be updated. The order of UPDATE must be executed in the right order, such as UPDATE ci_builds => UPDATE deployments, in order to avoid deadlock. However, currently it looks like the following:

When a build is failed:

[51] pry(main)> b.drop!
   (0.3ms)  BEGIN
  Deployment Update (0.5ms)  UPDATE "deployments" SET "updated_at" = '2020-12-15 18:41:15.967273', "status" = 3, 
         "finished_at" = '2020-12-15 18:41:15.964140' WHERE "deployments"."id" = 34
  Ci::Build Update (0.5ms)  UPDATE "ci_builds" SET "status" = 'failed', "finished_at" = '2020-12-15 18:41:15.960797', 
         "updated_at" = '2020-12-15 18:41:15.970301', "processed" = FALSE, "lock_version" = 4 WHERE "ci_builds"."id" = 14 
         AND "ci_builds"."lock_version" = 3
   (0.7ms)  COMMIT

while when the build is skipped/canceled, it instead does:

[63] pry(main)> b.skip
   (0.2ms)  BEGIN
  Ci::Build Update (0.7ms)  UPDATE "ci_builds" SET "status" = 'skipped', "updated_at" = '2020-12-15 19:12:03.676005', 
          "processed" = FALSE, "lock_version" = 8 WHERE "ci_builds"."id" = 14 AND "ci_builds"."lock_version" = 7
  Deployment Update (0.4ms)  UPDATE "deployments" SET "updated_at" = '2020-12-15 19:12:03.680070', "status" = 4, 
          "finished_at" = '2020-12-15 19:12:03.679611' WHERE "deployments"."id" = 34
   (0.7ms)  COMMIT

To illustrate the problem:

job A fails -> lock CI record -> lock deployment record -> commit
                       |--deadlock--|
     commit <- lock CI record <- lock deployment record <- job B gets skipped <- Job A fails

This MR fixes the wrong hook in Ci::Build to execute the UPDATE deployments after the Update builds.

Huge thanks to @pbair for the initial investigation on the issue.

Related #297496 (closed) #297496 (closed)

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Shinya Maeda

Merge request reports