Fix invalid cross-DB transaction when deleting project builds
Context originally from #339813 (closed) (and notes the comments #339813 (comment 669072215)
Summary
Super high level problem detail.
When deleting a project we:
- start a transaction
- delete the
ci_builds
rows - delete the
projects
row - commit the transaction
We cannot allow such use of transactions when ci_*
tables are moved to a separate database because the queries against the other database won't actually be in a transaction. This means that it could give the impression you are going to have rollback support etc. but it won't necessarily work and we don't believe it's a good idea to implement a 2 phase commit transaction approach across the 2 databases.
This is primarily caused by the use of dependent: :destroy
in:
Project has_many :builds, class_name: 'Ci::Build', inverse_of: :project, dependent: :destroy
Possible Solution
Read more at https://docs.gitlab.com/ee/development/database/multiple_databases.html#removing-cross-database-transactions
There is already re-architecture work that relates to this deletion code path in https://gitlab.com/gitlab-org/gitlab/-/issues/24644 . It's possible the solution to that may also solve this problem.
Other discussed solution
We recommend you find a different way that allows for deleting the builds in a robust way. Recommendation so far:
- Make
FastDestroyAll
accept an option to skip thedelete_all
step and therefore allow other things to take care of the deletion (e.g. real foreign key or "loose foreign key") - Make
FastDestroyAll
more robust so that we can use it forProject -> builds
(can you expand on what is needed here to make it robust) - Convert
ci_builds.project_id
to a "loose foreign key" and then useFastDestroyAll
forProject -> ci_builds
There may also be simpler options that just push most of ci_builds
cleanup work to Sidekiq or even just move the delete_all
logic after the transaction completes.
How to validate the problem is solved
We introduced a tool for raising errors in tests when these kinds of transaction problems are detected !67213 (merged) . You can wrap some piece of code in with_cross_database_modification_prevented
in a test to detect if that isolated piece of code has a problem. Alternatively you can use the ci_data_modification_check
tag on a spec to do the same thing but it covers the whole spec so it may detect extra things in factories or let
statements that make it harder to isolate your specific fix.