Skip to content

Fix invalid cross-DB transaction when deleting project builds

Context originally from #339813 (closed) (and notes the comments #339813 (comment 669072215)

Summary

Super high level problem detail.

When deleting a project we:

  1. start a transaction
  2. delete the ci_builds rows
  3. delete the projects row
  4. commit the transaction

We cannot allow such use of transactions when ci_* tables are moved to a separate database because the queries against the other database won't actually be in a transaction. This means that it could give the impression you are going to have rollback support etc. but it won't necessarily work and we don't believe it's a good idea to implement a 2 phase commit transaction approach across the 2 databases.

This is primarily caused by the use of dependent: :destroy in:

Project has_many :builds, class_name: 'Ci::Build', inverse_of: :project, dependent: :destroy

Possible Solution

Read more at https://docs.gitlab.com/ee/development/database/multiple_databases.html#removing-cross-database-transactions

There is already re-architecture work that relates to this deletion code path in https://gitlab.com/gitlab-org/gitlab/-/issues/24644 . It's possible the solution to that may also solve this problem.

Other discussed solution

We recommend you find a different way that allows for deleting the builds in a robust way. Recommendation so far:

  1. Make FastDestroyAll accept an option to skip the delete_all step and therefore allow other things to take care of the deletion (e.g. real foreign key or "loose foreign key")
  2. Make FastDestroyAll more robust so that we can use it for Project -> builds (can you expand on what is needed here to make it robust)
  3. Convert ci_builds.project_id to a "loose foreign key" and then use FastDestroyAll for Project -> ci_builds

There may also be simpler options that just push most of ci_builds cleanup work to Sidekiq or even just move the delete_all logic after the transaction completes.

How to validate the problem is solved

We introduced a tool for raising errors in tests when these kinds of transaction problems are detected !67213 (merged) . You can wrap some piece of code in with_cross_database_modification_prevented in a test to detect if that isolated piece of code has a problem. Alternatively you can use the ci_data_modification_check tag on a spec to do the same thing but it covers the whole spec so it may detect extra things in factories or let statements that make it harder to isolate your specific fix.

Edited by Dylan Griffith